...

At any given time, the state of the endpoint can be one of the following:

| State | Description |
| --- | --- |
| Active | Endpoint is running and handling requests. |
| Timeout | Endpoint encountered an error but can still send and receive messages. If it continues to encounter errors, it will be suspended. |
| Suspended | Endpoint encountered errors and cannot send or receive messages. Incoming messages to a suspended endpoint result in a fault. |
| OFF | Endpoint is not active. To put an endpoint into the OFF state, or to move it from OFF to Active, you must use JMX. |

Active

When WSO2 Enterprise Service Bus starts, endpoints are in the "Active" state and ready to handle messages. If the user does not put the endpoint into the OFF state, it will be in the "Active" state until an error occurs.

...

When an endpoint is in the "Timeout" state, it will continue to attempt to receive messages until one message succeeds or the maximum retry setting has been reached. If the maximum is reached, the endpoint is marked as "Suspended." If one message succeeds, the endpoint is marked as "Active."

...

Code Block
<address uri="endpoint address" [format="soap11|soap12|pox|get"]
    [optimize="mtom|swa"] [encoding="charset encoding"]
    [statistics="enable|disable"] [trace="enable|disable"]>
    <enableRM [policy="key"]/>?
    <enableSec [policy="key"]/>?
    <enableAddressing [version="final|submission"] [separateListener="true|false"]/>?

    <timeout>
        <duration>timeout duration in milliseconds</duration>
        <responseAction>discard|fault|never</responseAction>
    </timeout>?

    <markForSuspension>
        [<errorCodes>xxx,yyy</errorCodes>]
        <retriesBeforeSuspension>m</retriesBeforeSuspension>
        <retryDelay>d</retryDelay>
    </markForSuspension>

    <suspendOnFailure>
        [<errorCodes>xxx,yyy</errorCodes>]
        <initialDuration>n</initialDuration>
        <progressionFactor>r</progressionFactor>
        <maximumDuration>l</maximumDuration>
    </suspendOnFailure>
</address>
"Timeout" settings

| Name | Values | Default | Description |
| --- | --- | --- | --- |
| duration | Milliseconds | 60000 | Connection timeout interval. If the remote endpoint does not respond in this time, it will be marked as "Timeout." |
| responseAction | discard, fault, never | never | When a response comes to a timed out request, specifies whether to discard it or invoke the fault handler. If you select "never", the endpoint remains in the "Active" state. |

"MarkForSuspension" settings

| Name | Values | Default | Description |
| --- | --- | --- | --- |
| errorCodes | Comma-separated list of error codes | 101504, 101505 | Errors that put the endpoint into the "Timeout" state. If no error codes are specified, the "HTTP Connection Closed" and "HTTP Connection Timeout" errors are considered "Timeout" errors, and all other errors put the endpoint into the "Suspended" state. |
| retriesBeforeSuspension | Integer | 0 | While the endpoint is in the "Timeout" state, this number of requests minus one can be tried and fail before the endpoint is marked as "Suspended". This setting is per endpoint, not per message, so several messages can be tried in parallel and fail, each reducing the remaining retries for that endpoint. |
| retryDelay | Milliseconds | 0 | The time to wait between the last retry attempt and the next retry. |
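The interaction between errorCodes and retriesBeforeSuspension can be sketched as a small state model. This is an illustrative toy, not the ESB's implementation: the class name EndpointModel and its methods are hypothetical, and it assumes one reading of the description above, namely that the countdown starts when the endpoint enters "Timeout" and that the endpoint is suspended when the counter reaches 0.

```python
from dataclasses import dataclass

ACTIVE, TIMEOUT, SUSPENDED = "Active", "Timeout", "Suspended"


@dataclass
class EndpointModel:
    """Hypothetical toy model of a single endpoint's state."""
    # Codes listed under markForSuspension/errorCodes (defaults from the table).
    timeout_codes: frozenset = frozenset({101504, 101505})
    retries_before_suspension: int = 3
    state: str = ACTIVE
    remaining: int = 0  # shared by every message sent to this endpoint

    def on_failure(self, error_code: int) -> str:
        if self.state == SUSPENDED:
            return self.state
        if error_code not in self.timeout_codes:
            # All other errors suspend the endpoint directly.
            self.state = SUSPENDED
        elif self.state == ACTIVE:
            # First "Timeout" error: start the per-endpoint countdown.
            self.state = TIMEOUT
            self.remaining = self.retries_before_suspension
        else:
            # Already in Timeout: any failing message uses up one retry.
            self.remaining -= 1
            if self.remaining <= 0:
                self.state = SUSPENDED
        return self.state

    def on_success(self) -> str:
        # One successful message returns a timed-out endpoint to Active.
        if self.state == TIMEOUT:
            self.state = ACTIVE
        return self.state


ep = EndpointModel(retries_before_suspension=3)
print(ep.on_failure(101504))  # Timeout (countdown starts at 3)
print(ep.on_failure(101505))  # Timeout (2 left)
print(ep.on_failure(101504))  # Timeout (1 left)
print(ep.on_failure(101504))  # Suspended
```

Because the counter lives on the endpoint rather than on a message, failures from different messages arriving in parallel all draw from the same budget.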

"suspendOnFailure" settings

| Name | Values | Default | Description |
| --- | --- | --- | --- |
| errorCodes | Comma-separated list of error codes | All the errors except the errors specified in markForSuspension | Errors that send the endpoint into the "Suspended" state. |
| initialDuration | Milliseconds | 30000 | After an endpoint gets "Suspended," it will wait for this amount of time before trying to send the messages coming to it. All the messages coming during this time period will result in fault sequence activation. |
| progressionFactor | Integer | 1 | The endpoint will try to send the messages after the initialDuration. If it still fails, the next duration is calculated as: Min(current suspension duration * progressionFactor, maximumDuration) |
| maximumDuration | Milliseconds | Long.MAX_VALUE | Upper bound of retry duration. |

Sample Configuration:

Code Block
<endpoint name="Sample_First" statistics="enable" >
    <address uri="http://localhost/myendpoint" statistics="enable" trace="disable">
        <timeout>
            <duration>60000</duration>
        </timeout>

        <markForSuspension>
            <errorCodes>101504, 101505</errorCodes>
            <retriesBeforeSuspension>3</retriesBeforeSuspension>
            <retryDelay>1</retryDelay>
        </markForSuspension>

        <suspendOnFailure>
            <errorCodes>101500, 101501, 101506, 101507, 101508</errorCodes>
            <initialDuration>1000</initialDuration>
            <progressionFactor>2</progressionFactor>
            <maximumDuration>60000</maximumDuration>
        </suspendOnFailure>

    </address>
</endpoint>
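
To see how the suspendOnFailure values in this sample play out, the following sketch applies the formula Min(current suspension duration * progressionFactor, maximumDuration) repeatedly. The function name suspension_durations is hypothetical and exists only for this illustration; it assumes the first suspension lasts exactly initialDuration.

```python
def suspension_durations(initial_ms, factor, maximum_ms, attempts):
    """Return the first `attempts` suspension durations in milliseconds."""
    durations = []
    current = initial_ms
    for _ in range(attempts):
        durations.append(current)
        # Next duration grows by the progression factor, capped at the maximum.
        current = min(current * factor, maximum_ms)
    return durations


# Values from the sample: initialDuration=1000, progressionFactor=2,
# maximumDuration=60000.
print(suspension_durations(1000, 2, 60000, 8))
# [1000, 2000, 4000, 8000, 16000, 32000, 60000, 60000]
```

With a progressionFactor of 1 (the default), every suspension would last initialDuration; a factor of 2 doubles the wait until maximumDuration caps it.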

...

For more information about error codes, see Error Codes.



...

Disabling endpoint suspension

If you do not want the endpoint to be suspended at all, you can configure the Timeout, MarkForSuspension and suspendOnFailure settings as shown in the following example.

Code Block
<endpoint name="NoSuspendEndpoint">
    <address uri="http://localhost:9000/services/SimpleStockQuoteService">
        <timeout>
            <duration>30000</duration>
            <responseAction>fault</responseAction>
        </timeout>
        <suspendOnFailure>
            <errorCodes>-1</errorCodes>
            <initialDuration>0</initialDuration>
            <progressionFactor>1.0</progressionFactor>
            <maximumDuration>0</maximumDuration>
        </suspendOnFailure>
        <markForSuspension>
            <errorCodes>-1</errorCodes>
        </markForSuspension>
    </address>
</endpoint>


Configuring retry

You can configure the ESB to enable or disable retry for an endpoint when a specific error code occurs. For example:

Code Block
<endpoint>
  <address uri="http://localhost:9001/services/LBService1">
    <retryConfig>
      <disabledErrorCodes>101503</disabledErrorCodes>
    </retryConfig>
  </address>
</endpoint>
<endpoint>
  <address uri="http://localhost:9002/services/LBService1">
    <retryConfig>
      <enabledErrorCodes>101503</enabledErrorCodes>
    </retryConfig>
  </address>
</endpoint>

In this example, if the error code 101503 occurs when trying to connect to the first endpoint, the endpoint is not retried, whereas in the second endpoint, the endpoint is always retried if error code 101503 occurs. You can specify enabled or disabled error codes (but not both) for a given endpoint.
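
The retry decision described above can be sketched as a small function. The name should_retry is hypothetical, and two behaviors are assumptions not stated in this section: that codes absent from a disabledErrorCodes list remain retryable, and that codes absent from an enabledErrorCodes list are not retried.

```python
def should_retry(error_code, enabled_codes=None, disabled_codes=None):
    """Decide whether an error code is eligible for retry on one endpoint."""
    if enabled_codes and disabled_codes:
        # The configuration allows one list or the other, not both.
        raise ValueError("specify enabledErrorCodes or disabledErrorCodes, not both")
    if disabled_codes:
        # Assumption: everything except the listed codes is retried.
        return error_code not in disabled_codes
    if enabled_codes:
        # Assumption: only the listed codes are retried.
        return error_code in enabled_codes
    # Assumption: with no retryConfig, any error is eligible for retry.
    return True


print(should_retry(101503, disabled_codes={101503}))  # False (first endpoint)
print(should_retry(101503, enabled_codes={101503}))   # True (second endpoint)
```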

...

With leaf endpoints, if an error occurs during message transmission, that message is lost and is not retried. Such errors are rare, but they can still occur. Some applications can tolerate these losses; if even rare message failures are unacceptable, use a failover endpoint.

Here is the configuration for failover endpoints. At the configuration level, a failover is a logical grouping of one or more leaf endpoints.

...

Some errors put the endpoint into Timeout and some keep the endpoint in the Active state. In these cases, the retry can happen using the same endpoint. If the failure occurs with the first endpoint within the failover group and this error does not put the endpoint into Suspended state, the retry will happen using the same endpoint.

...

Code Block
<endpoint name="SampleFailover">
    <failover>
        <endpoint name="Sample_First" statistics="enable" >
            <address uri="http://localhost/myendpoint" statistics="enable" trace="disable">
                <timeout>
                    <duration>60000</duration>
                </timeout>

                <markForSuspension>
                    <errorCodes>101504, 101505, 101500</errorCodes>
                    <retriesBeforeSuspension>3</retriesBeforeSuspension>
                    <retryDelay>1</retryDelay>
                </markForSuspension>

                <suspendOnFailure>
                    <initialDuration>1000</initialDuration>
                    <progressionFactor>2</progressionFactor>
                    <maximumDuration>64000</maximumDuration>
                </suspendOnFailure>

            </address>
        </endpoint>
    </failover>
</endpoint>

Here, the Sample_First endpoint is marked as Timeout if a connection times out, closes, or encounters an IO error while sending (error codes 101504, 101505, and 101500). For all other errors, it is marked as Suspended. When one of these Timeout errors occurs, the failover retries using the first non-suspended endpoint, which in this case is the same endpoint (Sample_First). It retries until the retry count reaches 0. Retries happen in parallel: because messages arrive at this endpoint on many threads, the same message is not necessarily retried three times, since another failing message can also reduce the retry count.

Info

The retry count is per endpoint, not per message.

In this configuration, we assume that these errors are rare and if they happen once in a while, it is OK to retry again. If they happen frequently and continuously, it means that it requires immediate attention to get it back to normal state.
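
The rule that a failover retries on the first non-suspended endpoint can be sketched as follows. The function pick_endpoint and the (name, state) tuples are hypothetical, used only to illustrate the selection order; Sample_Second is an invented second endpoint that does not appear in the configuration above.

```python
def pick_endpoint(endpoints):
    """Return the name of the first endpoint that is not Suspended.

    `endpoints` is a list of (name, state) pairs in configured order.
    Returns None when every endpoint in the group is suspended.
    """
    for name, state in endpoints:
        if state != "Suspended":
            return name
    return None


# A Timeout endpoint is still eligible, so the retry stays on Sample_First.
print(pick_endpoint([("Sample_First", "Timeout")]))  # Sample_First

# Once it is suspended, the next endpoint in the group (if any) is used.
print(pick_endpoint([("Sample_First", "Suspended"), ("Sample_Second", "Active")]))  # Sample_Second
```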

...

| Error code | Description |
| --- | --- |
| 101000 | Receiver IO error sending |
| 101001 | Receiver IO error receiving |
| 101500 | Sender IO error sending |
| 101501 | Sender IO error receiving |
| 101503 | Connection failed |
| 101504 | Connection timed out |
| 101505 | Connection closed |
| 101506 | HTTP protocol violation |
| 101507 | Connect cancel |
| 101508 | Connect timeout |
| 101509 | ... |

