...

The last step of message processing inside WSO2 Enterprise Service Bus is to send the message to a service provider (see also Mediating Messages) through a listening service endpoint. During this process, transport errors can occur. For example, the connection might time out, or it might be closed by the actual service. Therefore, endpoint error handling is a key part of any successful Enterprise Integrator deployment.

Messages can fail or be lost for various reasons in a real TCP network. When an error occurs and the Enterprise Integrator is not configured to accept the error, it marks the endpoint as failed, which leads to a message failure. By default, the endpoint stays marked as failed for quite a long time, so subsequent messages can also be lost.

...

For information on general error handling and error codes in the Enterprise Integrator, see Error Handling.

Endpoint states

At any given time, the state of the endpoint can be one of the following:

State	Description
Active	Endpoint is running and handling requests.
Timeout	Endpoint encountered an error but can still send and receive messages. If it continues to encounter errors, it will be suspended.
Suspended	Endpoint encountered errors and cannot send or receive messages. Incoming messages to a suspended endpoint result in a fault.
OFF	Endpoint is not active. To put an endpoint into the OFF state, or to move it from OFF to Active, you must use JMX.

Active

When WSO2 Enterprise Service Bus starts, endpoints are in the "Active" state and ready to handle messages. If the user does not put the endpoint into the OFF state, it will be in the "Active" state until an error occurs.

...

For example, let's assume the number of retries is set to 3. When an error occurs and the endpoint is set to the "Timeout" state, the Enterprise Integrator can try to send up to three more messages to the endpoint. If the next three messages sent to this endpoint result in an error, the endpoint is put in the "Suspended" state. If one of the messages succeeds before the retry maximum is met, the endpoint is marked as "Active."
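The retry example above can be modelled as a small state machine. This is only a sketch of the behaviour described in this paragraph, not the product's implementation: the `EndpointModel` class and its `record` method are hypothetical names, and the exact counting used by the real endpoint may differ from this model.

```python
from dataclasses import dataclass


@dataclass
class EndpointModel:
    """Toy model of the Active -> Timeout -> Suspended transitions (hypothetical)."""
    retries_before_suspension: int
    state: str = "Active"
    remaining_retries: int = 0

    def record(self, success: bool) -> str:
        """Update the state after one send attempt and return the new state."""
        if self.state == "Suspended":
            # Recovery from suspension is time-based and is not modelled here.
            return self.state
        if success:
            self.state = "Active"
        elif self.state == "Active":
            # First error: move to Timeout and start counting retries.
            self.state = "Timeout"
            self.remaining_retries = self.retries_before_suspension
        else:
            # Already in Timeout: each further failure uses up one retry.
            self.remaining_retries -= 1
            if self.remaining_retries <= 0:
                self.state = "Suspended"
        return self.state


endpoint = EndpointModel(retries_before_suspension=3)
states = [endpoint.record(success=False) for _ in range(4)]
print(states)  # ['Timeout', 'Timeout', 'Timeout', 'Suspended']
```

The first failure moves the endpoint to "Timeout"; the three failures that follow exhaust the retries and suspend it. A success at any point before that returns the model to "Active."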

...

A "Suspended" endpoint cannot send or receive messages. When an endpoint is put into this state, the Enterprise Integrator waits until an initial duration has elapsed (30 seconds by default) before attempting to send messages to this endpoint again. If the message succeeds, the endpoint is marked as "Active." If the next message fails, the endpoint is marked as "Suspended" or "Timeout" depending on the error, and the Enterprise Integrator waits before retrying messages using the following formula:

...

Code Block

<address uri="endpoint address" [format="soap11|soap12|pox|get"]
    [optimize="mtom|swa"] [encoding="charset encoding"]
    [statistics="enable|disable"] [trace="enable|disable"]>
    <enableRM [policy="key"]/>?
    <enableSec [policy="key"]/>?
    <enableAddressing [version="final|submission"] [separateListener="true|false"]/>?

    <timeout>
        <duration>timeout duration in milliseconds</duration>
        <responseAction>discard|fault|never</responseAction>
    </timeout>?

    <markForSuspension>
        [<errorCodes>xxx,yyy</errorCodes>]
        <retriesBeforeSuspension>m</retriesBeforeSuspension>
        <retryDelay>d</retryDelay>
    </markForSuspension>

    <suspendOnFailure>
        [<errorCodes>xxx,yyy</errorCodes>]
        <initialDuration>n</initialDuration>
        <progressionFactor>r</progressionFactor>
        <maximumDuration>l</maximumDuration>
    </suspendOnFailure>
</address>

"Timeout" settings

Name	Values	Default	Description
duration	Milliseconds / XPath expression	60000	Connection timeout interval. If the remote endpoint does not respond within this time, it is marked as "Timeout." This can be defined as a static value or as a dynamic value.
responseAction	discard, fault, never	never	When a response arrives for a timed-out request, specifies whether to discard it or invoke the fault handler. If you select "never", the endpoint remains in the "Active" state.

"MarkForSuspension" settings

Name	Values	Default	Description
errorCodes	Comma separated list of error codes	101504, 101505	Errors that put the endpoint into the "Timeout" state. If no error codes are specified, the "HTTP Connection Closed" and "HTTP Connection Timeout" errors are considered "Timeout" errors, and all other errors put the endpoint into the "Suspended" state.
retriesBeforeSuspension	Integer	0	In the "Timeout" state this number of requests minus one can be tried and fail before the endpoint is marked as "Suspended". This setting is per endpoint, not per message, so several messages can be tried in parallel and fail and the remaining retries for that endpoint will be reduced.
retryDelay	milliseconds	0	The time to wait between the last retry attempt and the next retry.

"suspendOnFailure" settings

Name	Values	Default	Description
errorCodes	Comma separated list of error codes	All the errors except the errors specified in `markForSuspension`	Errors that send the endpoint into the "Suspended" state.
initialDuration	milliseconds	30000	After an endpoint gets "Suspended," it will wait for this amount of time before trying to send the messages coming to it. All the messages coming during this time period will result in fault sequence activation.
progressionFactor	Integer	1	The endpoint will try to send the messages after the `initialDuration`. If it still fails, the next duration is calculated as: `Min(current suspension duration * progressionFactor, maximumDuration)`
maximumDuration	milliseconds	Long.MAX_VALUE	Upper bound of retry duration.
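To make the interplay between the two `errorCodes` lists concrete, the following sketch classifies an error code according to the tables above. The function name `classify_error` is hypothetical, and the final branch (a code that appears in neither list leaves the state unchanged) is an assumption rather than something the tables state.

```python
# Default "Timeout" error codes taken from the markForSuspension table.
DEFAULT_TIMEOUT_CODES = frozenset({101504, 101505})


def classify_error(code, timeout_codes=DEFAULT_TIMEOUT_CODES, suspend_codes=None):
    """Return the state an error code pushes the endpoint towards.

    suspend_codes=None means no suspendOnFailure error codes are configured,
    so every error that is not a "Timeout" error suspends the endpoint.
    """
    if code in timeout_codes:
        return "Timeout"
    if suspend_codes is None or code in suspend_codes:
        return "Suspended"
    # Assumption: a code listed in neither set does not change the state.
    return "unchanged"


print(classify_error(101504))  # Timeout
print(classify_error(101503))  # Suspended
print(classify_error(101503, suspend_codes={101500, 101501}))  # unchanged
```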

Sample Configuration:

Code Block

<endpoint name="Sample_First" statistics="enable" >
    <address uri="http://localhost/myendpoint" statistics="enable" trace="disable">
        <timeout>
            <duration>60000</duration>
        </timeout>

        <markForSuspension>
            <errorCodes>101504, 101505</errorCodes>
            <retriesBeforeSuspension>3</retriesBeforeSuspension>
            <retryDelay>1</retryDelay>
        </markForSuspension>

        <suspendOnFailure>
            <errorCodes>101500, 101501, 101506, 101507, 101508</errorCodes>
            <initialDuration>1000</initialDuration>
            <progressionFactor>2</progressionFactor>
            <maximumDuration>60000</maximumDuration>
        </suspendOnFailure>

    </address>
</endpoint>
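As a worked example of the `progressionFactor` formula, the sketch below computes the successive suspension durations for the values in this sample (`initialDuration` 1000, `progressionFactor` 2, `maximumDuration` 60000). The function name `suspension_durations` is hypothetical, and the sketch assumes every retry after a suspension fails.

```python
def suspension_durations(initial_duration, progression_factor, maximum_duration, failures):
    """Return the suspension duration (ms) applied after each consecutive failure."""
    durations = []
    current = initial_duration
    for _ in range(failures):
        durations.append(current)
        # Min(current suspension duration * progressionFactor, maximumDuration)
        current = min(current * progression_factor, maximum_duration)
    return durations


print(suspension_durations(1000, 2, 60000, 8))
# [1000, 2000, 4000, 8000, 16000, 32000, 60000, 60000]
```

The duration doubles until the next value would exceed `maximumDuration`, after which it stays capped at 60000 milliseconds.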

...

Configuring retry

You can configure the Enterprise Integrator to enable or disable retry for an endpoint when a specific error code occurs. For example:

...

In this example, if the error code 101503 occurs when trying to connect to the first endpoint, the endpoint is not retried, whereas in the second endpoint, the endpoint is always retried if error code 101503 occurs. You can specify enabled or disabled error codes (but not both) for a given endpoint.

Configuring a failover endpoint

With leaf endpoints, if an error occurs while a message is being transmitted, that message is lost and is not retried. Such errors are rare, but they do occur. Some applications can tolerate these losses; if even rare message failures are not acceptable, use a failover endpoint.

Here is the configuration for failover endpoints. At the configuration level, a failover is a logical grouping of one or more leaf endpoints.

Code Block
<failover> <endpoint .../>+ </failover>

When a message arrives at a failover endpoint, the failover goes through its list of endpoints and picks the first one in the Active or Timeout state, then sends the message using that endpoint. If an error occurs while sending the message, the failover goes through the endpoint list again from the beginning and tries to send the message using the first endpoint that is in the Active or Timeout state.

Some errors put the endpoint into Timeout and some keep the endpoint in the Active state. In these cases, the retry can happen using the same endpoint. If the failure occurs with the first endpoint within the failover group and this error does not put the endpoint into Suspended state, the retry will happen using the same endpoint.

Failover gives priority to the first endpoint that is not in the Suspended state. So it will send the message through the first endpoint in the failover group, as long as it is not suspended. When the first endpoint is suspended, it will send the requests using the second endpoint. When the first endpoint becomes ready to send again, it will try again on the first endpoint, even though the second endpoint is still active.
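The selection rule described above can be sketched as follows. This is an illustration only: the function `pick_endpoint` and the endpoint names `first` and `second` are hypothetical, and endpoints are represented simply as (name, state) pairs in configuration order.

```python
from typing import Optional, Sequence, Tuple


def pick_endpoint(endpoints: Sequence[Tuple[str, str]]) -> Optional[str]:
    """Return the name of the first endpoint in the Active or Timeout state."""
    for name, state in endpoints:
        if state in ("Active", "Timeout"):
            return name
    return None  # every endpoint in the group is unavailable


# The first endpoint is suspended, so requests go to the second one.
print(pick_endpoint([("first", "Suspended"), ("second", "Active")]))  # second

# Once the first endpoint is usable again, it takes priority over the second.
print(pick_endpoint([("first", "Timeout"), ("second", "Active")]))  # first
```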

If there is only one service endpoint and message failure is not tolerable, you can still configure a failover with a single endpoint.

A sample failover with one address endpoint:

Code Block

<endpoint name="SampleFailover">
    <failover>
        <endpoint name="Sample_First" statistics="enable" >
            <address uri="http://localhost/myendpoint" statistics="enable" trace="disable">
                <timeout>
                    <duration>60000</duration>
                </timeout>

                <markForSuspension>
                    <errorCodes>101504, 101505, 101500</errorCodes>
                    <retriesBeforeSuspension>3</retriesBeforeSuspension>
                    <retryDelay>1</retryDelay>
                </markForSuspension>

                <suspendOnFailure>
                    <initialDuration>1000</initialDuration>
                    <progressionFactor>2</progressionFactor>
                    <maximumDuration>64000</maximumDuration>
                </suspendOnFailure>

            </address>
        </endpoint>
    </failover>
</endpoint>

Here the Sample_First endpoint is marked as Timeout if a connection times out, is closed, or hits an I/O error (error codes 101504, 101505, and 101500). For all other errors, it is marked as Suspended. When one of these errors occurs, the failover retries using the first non-suspended endpoint, which in this case is the same endpoint (Sample_First). It retries until the retry count reaches 0. Retries happen in parallel: because messages reach this endpoint on many threads, the same message may not be retried three times, since another failing message can also reduce the retry count.

Info
The retry count is per endpoint, not per message.

In this configuration, we assume that these errors are rare and that, if they happen once in a while, it is OK to retry. If they happen frequently and continuously, the endpoint requires immediate attention to get it back to its normal state.

For information on configuring a failover endpoint to handle errors, see Configuring Failover Endpoints.
