Configuring HA Using Pacemaker and Heartbeat

The following sections cover the information required to configure high availability (HA) for PPaaS using Pacemaker/Heartbeat:

What is Pacemaker?

Pacemaker is a cluster resource manager (CRM). It achieves maximum availability for your cluster services (resources) by detecting and recovering from node-level and resource-level failures, using the messaging and membership capabilities provided by your preferred cluster infrastructure (Corosync or Heartbeat).

Refer to the Pacemaker documentation for more information.

What is Heartbeat?

Heartbeat is a daemon that provides cluster infrastructure (communication and membership) services to its clients. This allows clients to know about the presence (or disappearance!) of peer processes on other machines and to easily exchange messages with them.

In order to be useful to users, the Heartbeat daemon needs to be combined with a CRM, which has the task of starting and stopping the services (IP addresses, web servers, etc.) making clusters highly available. Pacemaker is the preferred cluster resource manager for clusters based on Heartbeat.

Prerequisites

Two physical or virtual hosts running Ubuntu 12.04 (64-bit).
Pacemaker 1.1.6
Heartbeat 3.0.5

Configuring Pacemaker/Heartbeat for PPaaS

SSH into each host and switch to the root user:
```
sudo su
```
Install Pacemaker and Heartbeat:
```
apt-get install pacemaker heartbeat
```

Create the Heartbeat configuration file at the following location: /etc/ha.d/ha.cf

# enable pacemaker, without stonith
crm             yes
# define log file
logfile /var/log/ha-log
# log a warning when a heartbeat is late by this many seconds
warntime        10
# declare a host (the other node) dead after:
deadtime        20
# dead time on boot (could take some time until net is up)
initdead        120
# time between heartbeats
keepalive       2
# the nodes
node node1 # set node1 hostname
node node2 # set node2 hostname
# heartbeats, over dedicated replication interface
ucast           eth1 10.186.175.16 # set node1 network-interface and ip address
ucast           eth1 54.211.110.217 # set node2 network-interface and ip address
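
Heartbeat expects each name in a node directive to match the output of uname -n on the corresponding host. The following is a minimal sketch of a check for that, assuming a POSIX shell with sed and awk; check_node_listed is a hypothetical helper name, not part of Heartbeat.

```shell
# Sketch: succeed if a host name appears in a "node" directive of ha.cf.
# check_node_listed is a hypothetical helper, not a Heartbeat command.
check_node_listed() {
    conf_file="$1"                 # path to ha.cf
    host_name="${2:-$(uname -n)}"  # defaults to this host's name
    # Strip trailing comments, then scan every name on each "node" line.
    sed -e 's/#.*//' "$conf_file" | awk -v h="$host_name" '
        $1 == "node" { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
        END { exit !found }'
}
```

Running check_node_listed /etc/ha.d/ha.cf on each host returns a non-zero exit status when the local host name is missing from the node list.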

Create the authentication key file and set its permissions on one of the hosts:

( echo -ne "auth 1\n1 sha1 "; \
dd if=/dev/urandom bs=512 count=1 | openssl md5 ) \
> /etc/ha.d/authkeys

chmod 0600 /etc/ha.d/authkeys
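
Some OpenSSL versions prefix the digest with a label such as "(stdin)= ", which would then end up inside the key line. As a hedged alternative, the sketch below builds the key from 16 random bytes rendered as hex, using only head, od and tr. write_authkeys is a hypothetical helper, and the target path is passed as an argument so that it can be tried on a scratch file first.

```shell
# Sketch: write an authkeys file whose key is 32 random hex characters.
# write_authkeys is a hypothetical helper, not part of Heartbeat.
write_authkeys() {
    target="$1"
    # 16 random bytes, printed as hex with spaces and newlines removed.
    key=$(head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n')
    old_umask=$(umask)
    umask 077                       # new file is readable by the owner only
    printf 'auth 1\n1 sha1 %s\n' "$key" > "$target"
    umask "$old_umask"
    chmod 0600 "$target"            # enforce 0600 even if the file existed
}
```

For example, write_authkeys /etc/ha.d/authkeys produces a two-line file in the same layout as the command above.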

Copy the authkeys file to /etc/ha.d/authkeys on each host.
Restart the Heartbeat service:
```
service heartbeat restart
```

Check the status of the Pacemaker cluster using CRM:

All nodes in the cluster should be in the online state. Recheck the Heartbeat configuration if a node is in the offline state.

crm status
============
Last updated: Wed Oct 15 11:25:05 2014
Last change: Wed Oct 15 11:21:51 2014 via crmd on ip-10-186-175-16
Stack: Heartbeat
Current DC: ip-10-186-175-16 (d16ccc5c-2641-42b6-b46a-57a0b32fddc9) - partition with quorum
Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
2 Nodes configured, unknown expected votes
0 Resources configured.
============
Online: [ ip-10-186-175-16 ip-10-153-165-178 ]
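
The online check can be scripted. The sketch below assumes the "Online: [ ... ]" line format shown in the output above and reads the status text from standard input; nodes_online is a hypothetical helper name, not a Pacemaker command.

```shell
# Sketch: succeed only if every expected node is on the "Online:" line
# of "crm status" output read from standard input.
# nodes_online is a hypothetical helper, not a Pacemaker command.
nodes_online() {
    # Keep only the text between the square brackets of the Online line.
    online=$(sed -n 's/^Online: *\[\(.*\)\]/\1/p')
    for node in "$@"; do
        case " $online " in
            *" $node "*) ;;   # node is listed as online
            *) echo "not online: $node" >&2; return 1 ;;
        esac
    done
    return 0
}
```

A possible invocation is crm status | nodes_online ip-10-186-175-16 ip-10-153-165-178, using the node names reported by your own cluster.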

Disable STONITH:

crm configure property stonith-enabled=false

Create a Failover IP resource to manage the virtual IP address:

crm configure primitive FAILOVER-IP ocf:heartbeat:IPaddr params ip=192.168.10.20 cidr_netmask="255.255.255.0" op monitor interval=10s
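
Once the resource is started, the virtual IP address should be configured on exactly one node. The sketch below is one way to check a node, assuming the iproute2 ip command is available and that its one-line output format puts "address/prefix" in the fourth field; has_ipv4 is a hypothetical helper name.

```shell
# Sketch: succeed if an IPv4 address appears in "ip -o -4 addr show"
# output read from standard input. has_ipv4 is a hypothetical helper.
has_ipv4() {
    addr="$1"
    # Field 4 of the one-line format is "address/prefix".
    awk -v a="$addr" '{ split($4, p, "/"); if (p[1] == a) found = 1 }
        END { exit !found }'
}
```

For example, ip -o -4 addr show | has_ipv4 192.168.10.20 succeeds only on the node that currently holds the failover address.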

Copy the Java and PPaaS packages to each host using SCP and extract them under the /opt folder.

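Create an init.d script for PPaaS at /etc/init.d/ppaas (the script that the lsb resource created later refers to) with the following content.
Update the values of the USER, JAVA_HOME and PRODUCT_HOME variables to match your environment.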

#!/bin/sh
### BEGIN INIT INFO
# Provides:          ppaas
# Required-Start:    $local_fs $remote_fs $network $syslog $named
# Required-Stop:     $local_fs $remote_fs $network $syslog $named
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# X-Interactive:     true
# Short-Description: Start/stop ppaas server
### END INIT INFO

USER="vagrant"
PRODUCT_NAME="ppaas"
JAVA_HOME="/opt/jdk1.7.0_60"
PRODUCT_HOME="/opt/ppaas_4.1.0"
PID_FILE="${PRODUCT_HOME}/wso2carbon.pid"
CMD="${PRODUCT_HOME}/bin/wso2server.sh"

# LSB exit codes:
# ftp://ftp.nomadlinux.com/nomad-2/dist/heartbeat-1.2.5/include/clplumbing/lsb_exitcodes.h

LSB_EXIT_OK=0
LSB_EXIT_GENERIC=1
LSB_EXIT_EINVAL=2
LSB_EXIT_ENOTSUPPORTED=3
LSB_EXIT_EPERM=4
LSB_EXIT_NOTINSTALLED=5
LSB_EXIT_NOTCONFIGED=6
LSB_EXIT_NOTRUNNING=7

is_service_running() {
	if [ -e ${PID_FILE} ]; then
		PID=$(cat "${PID_FILE}")
		# discard the ps output; only its exit status matters
		if ps -p "$PID" > /dev/null 2>&1; then
			# service is running
			return 0
		else
			# service is stopped
			return 1
	    fi
	else
		# pid file was not found; the server may not have been started yet
		return 1
	fi
}

# Status the service
status() {
	is_service_running
	service_status=$?
		
	if [ "${service_status}" -eq 0 ]; then
		echo "${PRODUCT_NAME} service is running"
		return ${LSB_EXIT_OK}
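	elif [ "${service_status}" -eq 1 ]; then
		echo "${PRODUCT_NAME} service is stopped"
		# LSB status action: exit code 3 means the program is not running
		return 3
	else
		echo "${PRODUCT_NAME} service status is unknown"
		# LSB status action: exit code 4 means the status is unknown
		return 4
	fi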
}

# Start the service
start() {
	if is_service_running; then
		echo "${PRODUCT_NAME} service is already running"
		return ${LSB_EXIT_OK}
	fi
	
	echo "starting ${PRODUCT_NAME} service..."  
	su - ${USER} -c "export JAVA_HOME=${JAVA_HOME}; ${CMD} start"
	
	is_service_running
	service_status=$?
	while [ "$service_status" -ne "0" ]
	do
		sleep 1;
		is_service_running
		service_status=$?
	done
	
	echo "${PRODUCT_NAME} service started"
	return ${LSB_EXIT_OK}
}

# Restart the service
restart() {
	echo "restarting ${PRODUCT_NAME} service..."
	su - ${USER} -c "export JAVA_HOME=${JAVA_HOME}; ${CMD} restart"
    echo "${PRODUCT_NAME} service restarted"
	return ${LSB_EXIT_OK}
}

# Stop the service
stop() {
	if ! is_service_running; then
		echo "${PRODUCT_NAME} service is already stopped"
		return ${LSB_EXIT_OK}
	fi
	
	echo "stopping ${PRODUCT_NAME} service..."
	su - ${USER} -c "export JAVA_HOME=${JAVA_HOME}; ${CMD} stop"
	
	is_service_running
	service_status=$?
	while [ "$service_status" -eq "0" ]
	do
		sleep 1;
		is_service_running
		service_status=$?
	done
	
	echo "${PRODUCT_NAME} service stopped"
	return ${LSB_EXIT_OK}
}
### main logic ###
case "$1" in
start)
    start
    ;;
stop|graceful-stop)
    stop
    ;;
status)
    status
    ;;
restart|reload|force-reload)
    restart
    ;;
*)
   echo "usage: $0 {start|stop|graceful-stop|restart|reload|force-reload|status}"
   exit 1
esac
exit $?

Create a CRM resource for PPaaS:

crm configure primitive PPAAS lsb::ppaas op monitor interval=15s
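
Pacemaker judges an lsb resource by the exit code of the script's status action: under the LSB convention, 0 means running and 3 means not running. The following is a minimal sketch for checking an init script against that convention before handing it to the cluster. check_lsb_status is a hypothetical helper; it starts and stops the service, so run it only while the resource is not managed by Pacemaker.

```shell
# Sketch: check the "status" exit codes that an lsb monitor depends on.
# check_lsb_status is a hypothetical helper, not a Pacemaker command.
check_lsb_status() {
    script="$1"
    "$script" start >/dev/null 2>&1 || { echo "start failed"; return 1; }
    "$script" status >/dev/null 2>&1 ||
        { echo "status must return 0 while running"; return 1; }
    "$script" stop >/dev/null 2>&1 || { echo "stop failed"; return 1; }
    "$script" status >/dev/null 2>&1
    rc=$?
    [ "$rc" -eq 3 ] ||
        { echo "status must return 3 when stopped, got $rc"; return 1; }
    echo "status exit codes follow the LSB convention"
}
```

A possible invocation is check_lsb_status /etc/init.d/ppaas, run as root on one host at a time.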

Create a CRM resource group and add FAILOVER-IP and PPAAS resources:

crm configure group FAILOVER-IP-RESOURCE-GROUP FAILOVER-IP PPAAS

Configure a colocation dependency between FAILOVER-IP and PPAAS.

This ensures that the FAILOVER-IP and PPAAS resources always run on the same host.
```
 crm configure colocation FAILOVER-IP-RESOURCE-GROUP-COLOCATION inf: FAILOVER-IP PPAAS
```

Deleting a resource

Use the following to delete a resource:

crm_resource -D -r my_first_ip -t primitive

Deleting a resource group

Use the following to delete a resource group:
```
crm_resource -D -r my_first_group -t group
```

References

For more information on configuring HA using Pacemaker/Heartbeat, refer to the following:

[1] https://www.zivtech.com/blog/setting-ip-failover-heartbeat-and-pacemaker-ubuntu-lucid

[2] http://www.linux-ha.org/doc/users-guide/_creating_an_initial_heartbeat_configuration.html

[3] http://foaa.de/old-blog/2010/10/intro-to-pacemaker-on-heartbeat/trackback/index.html

[4] http://code.naishe.in/2012/11/high-availability-ngnix-using-heartbeat.htm

[5] http://opentodo.net/2012/04/configuring-a-failover-cluster-with-heartbeat-pacemaker/

[6] http://doc.opensuse.org/products/draft/SLE-HA/SLE-ha-guide_sd_draft/man.crmresource.html