Automatic Failover

WSO2 ELB lies between the client and back-end services. When a request comes to the ELB from a client, it examines the request headers and routes the message to the appropriate service cluster, without processing the message body.

To achieve high availability in your system, you can front one or more service clusters with WSO2 ELB. However, a single ELB instance is itself a single point of failure. To avoid the risk of system downtime due to failure of the ELB, you can implement a fail-proof ELB deployment. This section explains how to implement failover using two identical ELB setups, one running in active mode and the other in passive mode.

High-Availability, Fail-Proof ELB Deployment

The following diagram depicts:

Two identical ELB setups running in active and passive modes, each fronting one or more service clusters.
The open source Keepalived (http://www.keepalived.org) package installed on both hosts running the ELB instances, with both hosts fronted by a single virtual IP address.
A health-check service checking the availability of each ELB.

Figure: Fail-proof ELB deployment

Follow the instructions below to implement this sample setup.

Note

The setup instructions below are common to all Linux-based operating systems.

Setting up the ELB

1. First, set up WSO2 ELB in a clustered environment according to deployment pattern 3, which is described in Worker/Manager Clustering Patterns.

2. Select a method to check the availability of the ELB in your setup. Given below are a few options.

Check the PID of the running Java process.
Send a request to a hosted service through the ELB.
Connect to an open socket exposed by the ELB.
Host a proxy service inside the ELB and ping the service.

While you can use any method familiar to you, this guide demonstrates the last one: hosting a proxy service inside the ELB and pinging that service to check whether the ELB is functioning well. The main reasons for picking the last option over the others are:

In the event of an ELB malfunction, the process could still be running or frozen, so a PID check alone would still pass.
Sending a request to the back-end service could fail due to a problem in the service rather than the ELB.
You might be able to still connect to a socket even though the ELB isn’t functioning properly.
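
For comparison, the first and third options could be sketched in shell roughly as follows. This is only an illustration: check_pid and check_socket are hypothetical helper names, the PID file location depends on your installation, and the socket check assumes bash (for its /dev/tcp redirection) and the coreutils timeout command. Both checks can pass while the ELB is unable to route requests, which is the weakness described above.

```shell
#!/bin/bash
# Hypothetical helpers illustrating two of the simpler checks listed above.

# Option 1: succeed if the PID stored in the given file belongs to a live process.
check_pid() {
    local pid_file="$1" pid
    [ -r "$pid_file" ] || return 1
    pid=$(cat "$pid_file")
    [ -n "$pid" ] || return 1
    # kill -0 sends no signal; it only tests whether the process can be signalled.
    kill -0 "$pid" 2>/dev/null
}

# Option 3: succeed if a TCP connection to host:port can be opened within 2 seconds.
check_socket() {
    local host="$1" port="$2"
    # /dev/tcp/<host>/<port> is a bash redirection feature, not a real file.
    timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}
```

For example, check_socket localhost 8280 && echo "port is open" reports only that something is listening on the port, not that requests are being routed.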

3. Next, host a proxy service inside the ELB by creating an XML file (e.g. EchoProxy.xml) with the following sample content in the <ELB_HOME>/repository/deployment/server/synapse-configs/default/proxy-services/ directory.

<?xml version="1.0" encoding="UTF-8"?>
<proxy xmlns="http://ws.apache.org/ns/synapse" name="EchoProxy" transports="https http" startOnLoad="true" trace="disable">
    <target>
       <inSequence>
          <header name="To" action="remove"/>
          <property name="NO_ENTITY_BODY" scope="axis2" action="remove"/>
          <property name="RESPONSE" value="true" scope="default" type="STRING"/>
          <send/>
       </inSequence>
    </target>
</proxy>

4. Start the ELB and send a request to the service using a REST client such as cURL. Shown below is a sample command you can use to ping your hosted service, along with the output.

~$ curl -v http://localhost:8280/services/EchoProxy
* About to connect() to localhost port 8280 (#0)
* Trying 127.0.0.1... connected
* Connected to localhost (127.0.0.1) port 8280 (#0)
> GET /services/EchoProxy HTTP/1.1
> User-Agent: curl/7.21.0 (x86_64-pc-linux-gnu) libcurl/7.21.0 OpenSSL/0.9.8o zlib/1.2.3.4 libidn/1.15 libssh2/1.2.6
> Host: localhost:8280
> Accept: */*
> 
< HTTP/1.1 200 OK
< Content-Type: application/x-www-form-urlencoded
< Accept: */*
< Host: localhost:8280
< Date: Mon, 24 Sep 2012 16:10:43 GMT
< Server: Synapse-HttpComponents-NIO
< Transfer-Encoding: chunked
< 
* Connection #0 to host localhost left intact
* Closing connection #0

5. Check whether the response contains an HTTP/1.1 200 OK status line. If so, the ping is successful and the ELB is functioning properly.

Info

For a list of HTTP status codes, refer to http://en.wikipedia.org/wiki/List_of_HTTP_status_codes#2xx_Success.
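
If you prefer not to read the verbose output by eye, cURL can print just the status code. The following is a minimal sketch, assuming cURL is installed on the ELB host; http_status, is_ok, and report are hypothetical helper names, and the URL is the EchoProxy address used in this guide.

```shell
#!/bin/bash
# Hypothetical helper: print only the HTTP status code returned by a URL.
http_status() {
    # -s silences progress output, -o /dev/null discards the body,
    # -w '%{http_code}' prints the numeric status code (000 if no response arrived).
    curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1"
}

# Succeed only when the given status code is exactly 200.
is_ok() {
    [ "$1" = "200" ]
}

# Report on the EchoProxy service from this guide, or on the URL given as $1.
report() {
    local code
    code=$(http_status "${1:-http://localhost:8280/services/EchoProxy}")
    if is_ok "$code"; then
        echo "ELB is up (HTTP $code)"
    else
        echo "ELB check failed (HTTP $code)"
    fi
}
```

Calling report with no arguments pings the local EchoProxy service and prints a one-line verdict.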

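6. For convenience, we recommend that you put this check in a script, as follows, on both the active and passive hosts. You can use this script when configuring keepalived. In this example, the script is placed in the /opt/bin folder as check_elb.sh, which is the path the keepalived configuration later in this section refers to. Make sure the script is executable.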

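#!/bin/bash
# Ping the EchoProxy service and extract the HTTP status code from the response headers.
# -O /dev/null discards the response body instead of saving a file on every run.
CODE=$(wget --server-response -O /dev/null http://localhost:8280/services/EchoProxy 2>&1 | awk '/^  HTTP/{print $2}')
# Quoting CODE keeps the test valid even when no status code was captured.
[ "${CODE}" = "200" ] && exit 0 || exit 1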

The above script returns 0 when the HTTP response code is 200; otherwise, it returns 1.
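
One edge case worth knowing about: if the pinged URL ever answers with a redirect, wget prints one status line per response, so the filter captures more than one code and the comparison with 200 fails. The sketch below shows one possible variant of the filter that keeps only the last status line. The function names are hypothetical, and the sample input is canned text for illustration, not real ELB output.

```shell
#!/bin/bash
# Hypothetical variant of the filter used in the health-check script:
# read "wget --server-response" output on stdin and print the last status code seen.
last_status() {
    awk '/^  HTTP/ { code = $2 } END { print code }'
}

# Canned sample input (not real ELB output): a redirect followed by a success.
sample_output() {
    printf '  HTTP/1.1 302 Found\n  Location: /services/EchoProxy/\n  HTTP/1.1 200 OK\n'
}

sample_output | last_status   # prints 200
```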

You are now ready to set up keepalived.

Setting up keepalived

7. Install keepalived in your system.

8. Once keepalived is installed, create a keepalived.conf file in the /etc/keepalived folder with the following contents.

In this configuration, a virtual IP address is assigned to the active ELB. This IP address is switched to the passive ELB when the active host fails, that is, when the health-check script on the active host returns 1.

global_defs {
     notification_email {
          admin@domain.com
          user@domain.com
     }
     notification_email_from automatic-failover@domain.com
     smtp_server xx.xx.xx.xx
     smtp_connect_timeout 30
     router_id VRRP-director1
}
vrrp_script check_elb {
     script "/opt/bin/check_elb.sh"
     interval 2
     weight 2
}
vrrp_instance VRRP-director1 {
     virtual_router_id 51
     advert_int 1
     priority 101
     interface eth0
     state MASTER
     smtp_alert
     track_script {
          check_elb
     }
     virtual_ipaddress {
          192.168.4.250/24
     }
}

Note the following regarding the configuration above:

We have defined the SMTP details at the global_defs level. This way you can get alerts when the state of the host changes from MASTER to BACKUP.
Inside the vrrp_script block, you have the health-check script, the interval (in seconds) at which it runs, and the weight by which the script's result adjusts the priority of the instances that track it.
Inside the vrrp_instance block you have the following:
- Virtual Router Redundancy Protocol (VRRP) virtual router ID that the instance belongs to.
- Advertisement interval (in seconds) from this router instance to other keepalived nodes.
- Instance priority of the VRRP router.
- Network interface for the instance to run on.
- Instance state, which indicates whether this is a MASTER or a BACKUP host.
- Activation of SMTP alerts.
- The track script and the virtual IP address for the router instance.
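
The interplay between priority and weight decides which host holds the virtual IP address. The sketch below is only an illustration in shell, assuming keepalived's documented rule that a positive weight is added to the priority while the tracked script succeeds and a negative weight is subtracted while it fails. effective_priority is a hypothetical helper, not part of keepalived; the value 100 is the BACKUP priority used later in this guide.

```shell
#!/bin/bash
# Sketch of how a tracked script's weight changes the effective VRRP priority.
# Arguments: base priority, weight, script exit status (0 = success).
effective_priority() {
    local base="$1" weight="$2" status="$3"
    if [ "$weight" -gt 0 ] && [ "$status" -eq 0 ]; then
        echo $(( base + weight ))   # positive weight: added while the script succeeds
    elif [ "$weight" -lt 0 ] && [ "$status" -ne 0 ]; then
        echo $(( base + weight ))   # negative weight: subtracted while the script fails
    else
        echo "$base"
    fi
}

echo "MASTER, ELB healthy: $(effective_priority 101 2 0)"   # 103
echo "MASTER, ELB failed:  $(effective_priority 101 2 1)"   # 101
echo "BACKUP, ELB healthy: $(effective_priority 100 2 0)"   # 102
```

Under these assumptions, a failed ELB on the MASTER host leaves it at 101, below the healthy BACKUP host at 102, so the virtual IP address moves. Once the MASTER recovers, its priority returns to 103 and it takes the address back.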

9. Set up keepalived on the other node in the same manner. You only have to change lines 19 (priority) and 21 (state) of the configuration given in the previous step. Ensure that you set a lower value as the priority, because this node is going to be the BACKUP host. For example, the vrrp_instance block looks like this.

vrrp_instance VRRP-director1 {
     virtual_router_id 51
     advert_int 1
     priority 100
     interface eth0
     state BACKUP
     smtp_alert
     track_script {
          check_elb
     }
     virtual_ipaddress {
          192.168.4.250/24
     }
}
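
If you would rather derive the BACKUP configuration from the MASTER one than edit it by hand, a small sed filter can make the two substitutions. This is a sketch that assumes the exact values used in this guide (priority 101 and state MASTER); to_backup_conf is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: read the MASTER keepalived.conf on stdin and print a BACKUP variant.
# Assumes the exact values used in this guide (priority 101 -> 100, MASTER -> BACKUP).
to_backup_conf() {
    sed -e 's/^\([[:space:]]*priority[[:space:]]\{1,\}\)101[[:space:]]*$/\1100/' \
        -e 's/^\([[:space:]]*state[[:space:]]\{1,\}\)MASTER[[:space:]]*$/\1BACKUP/'
}
```

For example, to_backup_conf < keepalived.conf > keepalived.backup.conf writes the modified copy to a new file, which you can review before placing it on the BACKUP host.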

10. Once all configurations are done, start keepalived like any other Linux service (/etc/init.d/keepalived start). On the MASTER host, you will see that there are two IP addresses bound to the eth0 interface, as follows.

~# ip addr show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
 link/ether 00:50:56:b7:2f:fc brd ff:ff:ff:ff:ff:ff
 inet 192.168.4.102/24 brd 192.168.7.255 scope global eth0
 inet 192.168.4.250/24 scope global secondary eth0
 inet6 fe80::250:56ff:feb7:2ffc/64 scope link 
 valid_lft forever preferred_lft forever

One IP address is bound to the physical network interface and the other one is the virtual IP address mentioned in keepalived.conf.
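
To check programmatically which host currently holds the virtual IP address, you can filter the same output. The sketch below uses a hypothetical has_vip helper that reads the output of ip addr show on standard input; the address and interface are the ones from this example.

```shell
#!/bin/bash
# Hypothetical helper: read "ip addr show" output on stdin and succeed
# if the given IPv4 address (without prefix length) is bound.
has_vip() {
    local vip="$1"
    # -F treats the pattern as a fixed string, so the dots are not wildcards.
    grep -qF "inet ${vip}/"
}

# Example:
#   ip addr show eth0 | has_vip 192.168.4.250 && echo "this host holds the virtual IP"
```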

11. Stop the ELB in the MASTER host and note the following.

The health-check script fails, and keepalived on the MASTER host signals the change to the BACKUP host.
The virtual IP address gets assigned to the BACKUP host.

When the MASTER host comes back online, the IP will be assigned to the MASTER host again.
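
To watch the failover from a client machine, you can poll the EchoProxy service through the virtual IP address while you stop and restart the ELB on the MASTER host. The following is a rough sketch, assuming cURL is available on the client and that the ELB listens on port 8280 as in the earlier steps; probe and poll_vip are hypothetical helper names.

```shell
#!/bin/bash
# Hypothetical probe: print the HTTP status code answered at the virtual IP address.
probe() {
    curl -s -o /dev/null -w '%{http_code}' --max-time 2 \
        "http://192.168.4.250:8280/services/EchoProxy"
}

# Call probe COUNT times, INTERVAL seconds apart, printing one timestamped line per attempt.
poll_vip() {
    local count="$1" interval="$2" i
    for (( i = 1; i <= count; i++ )); do
        echo "$(date +%T) $(probe)"
        # Sleep between attempts, but not after the last one.
        [ "$i" -lt "$count" ] && sleep "$interval"
    done
    return 0
}
```

For example, poll_vip 30 1 prints one line per second for 30 seconds. A short run of codes other than 200 around the moment of the switch is what you would expect to see.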

You can extend this implementation to other features as well, such as monitoring and alerting if there’s something wrong in the system.