Minimum High Availability Deployment
When streaming events with WSO2 Stream Processor (SP), the node running the SP instance can fail due to unforeseen reasons, causing a loss of the events being streamed until the node is restarted. A solution for this is to run WSO2 SP in a High Availability (HA) environment where the processing of events is not halted by an unexpected failure. The recommended HA deployment for WSO2 SP is the two-node minimum HA deployment, where two instances of WSO2 SP run in parallel as depicted in the diagram below.
In this minimum HA setup, one node is assigned as the active node while the other is assigned as the passive node. Only the active node processes the incoming events and publishes the outgoing events. Internally, the active node publishes the events to the passive node, but the passive node does not process or send any events outside, as mentioned earlier. If the active node fails, the passive node is activated; it starts receiving events and then publishes them from where the active node left off. Once the terminated (previously active) node restarts, it operates in the passive state. In the passive node, the databridge port and other ports related to sources remain closed, and events are not fetched from pulling sources such as Kafka, JMS, etc., until the node becomes active.
The ports that are open only in the active node at a given time include the Siddhi Store Query API endpoint, to which requests are sent by invoking the Siddhi Store REST API. These ports are configured in the `<SP_HOME>/conf/worker/deployment.yaml` file. For more information about this port configuration, see Managing Stored Data via REST APIs.
When a failover occurs, the Siddhi Store Query API endpoint configured in node 2 (which becomes the currently active node) is opened, and all the store query traffic is directed to that endpoint.
For a two-node minimum HA cluster to work, only the active node should receive events. By design, you can only send events to the active node. To achieve this, you can use a load balancing mechanism that sends events in a failover manner.
Prerequisites
In order to configure a minimum HA cluster, the following prerequisites must be completed:
- To run this setup, you need a CPU with four cores and a total memory of 4 GB.
- Two binary packs of WSO2 SP must be available.
- A working RDBMS instance must be available to be used for clustering of the two nodes.
- Download the MySQL connector from here. Extract it and find the `mysql-connector-java-5.*.*-bin.jar` file. Place this JAR in the `<SP_HOME>/lib` directory of both nodes.
- In order to retrieve the state of the Siddhi applications deployed in the system in case both nodes fail, state persistence must be enabled for both nodes by specifying the same datasource/file location. For detailed instructions, see Configuring Database and File System State Persistence.
- A load balancer or some other client-side data publishing mechanism that works in a failover manner must be available to publish events to one of the available nodes (i.e., to the active node).
Configuring a minimum HA cluster
To configure a minimum HA cluster, follow the steps below:
Note that the following configurations need to be done in the `<SP_HOME>/conf/worker/deployment.yaml` file for both WSO2 SP nodes in the cluster.

- If you need to run both SP instances on the same host, make sure that you apply a port offset to change the default ports in one of the instances. For more information about the default ports, see Configuring Default Ports.
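For illustration, the port offset for the second instance can be set as shown in the following sketch. The `offset` value of `1` is an assumed example; every default port of that node is shifted by this amount.

```yaml
wso2.carbon:
  ports:
    offset: 1   # example value: shifts every default port of this node by +1
```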
- For each node, enter a unique ID for the `id` property under the `wso2.carbon` section (e.g., `id: wso2-sp`). This is used to identify each node within a cluster.
- To allow the two nodes to use the same persistence storage, configure state persistence under the `state.persistence` section. State persistence can be configured to use a DB-based or file-based store. However, the persistence storage needs to be shared between the nodes irrespective of the method. The following is a configuration for DB-based persistence.

  ```yaml
  state.persistence:
    enabled: true
    intervalInMin: 1
    revisionsToKeep: 2
    persistenceStore: org.wso2.carbon.stream.processor.core.persistence.DBPersistenceStore
    config:
      datasource: PERSISTENCE_DB   # A datasource with this name should be defined in the wso2.datasources namespace
      table: PERSISTENCE_TABLE
  ```
  The datasource named `PERSISTENCE_DB` in the above configuration can be defined in the `<SP_HOME>/conf/worker/deployment.yaml` file under the `wso2.datasources` section. The following is a sample datasource configuration.

  ```yaml
  - name: PERSISTENCE_DB
    description: The MySQL datasource used for persistence
    jndiConfig:
      name: jdbc/PERSISTENCE_DB
    definition:
      type: RDBMS
      configuration:
        jdbcUrl: 'jdbc:mysql://localhost:3306/PERSISTENCE_DB?useSSL=false'
        username: root
        password: root
        driverClassName: com.mysql.jdbc.Driver
        maxPoolSize: 50
        idleTimeout: 60000
        connectionTestQuery: SELECT 1
        validationTimeout: 30000
        isAutoCommit: false
  ```
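  If you opt for file-based persistence instead, the store can point at a shared directory. The following is a hedged sketch: the `FileSystemPersistenceStore` class name and the `location` key follow the standard WSO2 SP file-based persistence configuration, and the shared path is an assumed example; verify both against your distribution before use.

  ```yaml
  state.persistence:
    enabled: true
    intervalInMin: 1
    revisionsToKeep: 2
    persistenceStore: org.wso2.carbon.stream.processor.core.persistence.FileSystemPersistenceStore
    config:
      location: /mnt/shared/siddhi-app-persistence   # assumed example; must be a directory shared by both nodes
  ```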
- To allow the two nodes in the cluster to coordinate effectively, configure carbon coordination by updating the `cluster.config` section of the `<SP_HOME>/conf/worker/deployment.yaml` file as follows:
  - To enable the cluster mode, set the `enabled` property to `true`.
  - In order to cluster the two nodes together, enter the same ID as the group ID for both nodes (e.g., `groupId: group-1`).
  - Enter the ID of the class that defines the coordination strategy for the cluster (e.g., `coordinationStrategyClass: org.wso2.carbon.cluster.coordinator.rdbms.RDBMSCoordinationStrategy`).
  - In the `strategyConfig` section, enter information as follows:
    - Enter the name of the configured datasource shared by the nodes in the cluster (e.g., `datasource: WSO2_CLUSTER_DB`). Data handled by the cluster is persisted here, so a datasource with this name should be configured under the `dataSources` subsection of the `wso2.datasources` section in the `<SP_HOME>/conf/worker/deployment.yaml` file. For detailed instructions on how to configure a datasource, see Configuring Datasources. The following is a sample MySQL datasource.

      ```yaml
      - name: WSO2_CLUSTER_DB
        description: The MySQL datasource used for cluster coordination
        jndiConfig:
          name: jdbc/WSO2ClusterDB
        definition:
          type: RDBMS
          configuration:
            jdbcUrl: 'jdbc:mysql://localhost:3306/WSO2_CLUSTER_DB?useSSL=false'
            username: root
            password: root
            driverClassName: com.mysql.jdbc.Driver
            maxPoolSize: 50
            idleTimeout: 60000
            connectionTestQuery: SELECT 1
            validationTimeout: 30000
            isAutoCommit: false
      ```

    - Define the time interval (in milliseconds) at which a heartbeat pulse should occur for each node to indicate that it is active (e.g., `heartbeatInterval: 1000`).
    - Define the number of consecutive heartbeat pulses that must be missed for a node to be considered inactive (e.g., `heartbeatMaxRetry: 2`). A value of two means that if a node fails to send two consecutive heartbeat pulses, it is identified as inactive and removed from the cluster.
    - Define the time interval (in milliseconds) at which each node should listen for changes that occur in the cluster (e.g., `eventPollingInterval: 1000`).
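Putting the above steps together, the resulting `cluster.config` section would look similar to the following sketch, which simply combines the example values given above. Note that with these values, a failed node is detected after roughly 2 seconds (1000 ms heartbeat interval × 2 missed heartbeats).

```yaml
cluster.config:
  enabled: true
  groupId: group-1
  coordinationStrategyClass: org.wso2.carbon.cluster.coordinator.rdbms.RDBMSCoordinationStrategy
  strategyConfig:
    datasource: WSO2_CLUSTER_DB   # must match the datasource defined under wso2.datasources
    heartbeatInterval: 1000
    heartbeatMaxRetry: 2
    eventPollingInterval: 1000
```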
- Next, add the `deployment.config` section to the `<SP_HOME>/conf/worker/deployment.yaml` file with the following configurations:
  - To enable two-node minimum HA, set the `type` property to `ha` (i.e., `type: ha`).
  - To configure the TCP server via which event synchronization is carried out, add a subsection named `eventSyncServer` and enter information as follows:
    - Set the host and port to enable direct communication between the two nodes via TCP calls, as shown in the following example. This ensures that the communication between the nodes is instantaneous.

      ```yaml
      host: localhost
      port: 9893
      ```

    - To define the address to which the TCP requests of the active node must be directed, enter the relevant host and port in the `advertisedHost` and `advertisedPort` parameters respectively, as shown in the example below. These values are used to map the advertised host and port to the actual host and port defined above. They are optional parameters that you need to configure only in scenarios where the actual host and/or port differs from the advertised host and/or port (e.g., container-based scenarios).

      ```yaml
      advertisedHost: localhost
      advertisedPort: 9893
      ```

    - Define the number of boss threads for the TCP server to handle connections (e.g., `bossThreads: 10`). `10` is the default.
    - Define the number of worker threads for the TCP server to handle connections (e.g., `workerThreads: 10`). `10` is the default.
  - To configure the TCP client via which requests are sent to the SP cluster, add a subsection named `eventSyncClientPool` and enter information as follows:
    - Define the maximum number of active connections allowed in the TCP client pool (e.g., `maxActive: 10`). `10` is the default.
    - Define the maximum total number of connections allowed in the TCP client pool (e.g., `maxTotal: 10`). `10` is the default.
    - Define the maximum number of idle connections allowed in the TCP client pool (e.g., `maxIdle: 10`). `10` is the default.
    - Define the number of milliseconds the client pool must wait for an idle object when the connection pool is exhausted (e.g., `maxWait: 60000`). `60000` is the default.
    - Define the minimum number of milliseconds an object can sit idle in the pool before it becomes eligible for eviction (e.g., `minEvictableIdleTimeMillis: 120000`). `120000` is the default.
  - Define the capacity of the event byte buffer queue (e.g., `eventByteBufferQueueCapacity: 20000`). `20000` is the default. This queue buffers events in the passive node so that the state can be synchronized correctly if a failover takes place.
  - Define the number of threads in the thread pool that extracts bytes from the event byte buffer queue (e.g., `byteBufferExtractorThreadPoolSize: 5`). `5` is the default.
  The following extract is an example of the `deployment.config` section defined by following the above steps.

  ```yaml
  deployment.config:
    type: ha
    eventByteBufferQueueCapacity: 20000
    byteBufferExtractorThreadPoolSize: 5
    eventSyncServer:
      host: localhost
      port: 9893
      advertisedHost: localhost
      advertisedPort: 9893
      bossThreads: 10
      workerThreads: 10
    eventSyncClientPool:
      maxActive: 10
      maxTotal: 10
      maxIdle: 10
      maxWait: 60000
      minEvictableIdleTimeMillis: 120000
  ```
Publishing events to the cluster
The following diagram illustrates how events can be published to a two-node minimum HA cluster that uses sources such as HTTP, JMS queue, JMS topic etc.
The load balancer directs events only to the active SP node in the minimum HA cluster. The active node processes and publishes these events. The events are not published to the passive node. The events are buffered in the passive node so that the cluster can resume processing and publishing events via this node if the currently active node fails.
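The failover routing described above can be sketched from the client side as follows. This is an illustrative Python sketch, not part of WSO2 SP: the node URLs and the `send` transport callable are hypothetical stand-ins for your actual publishing mechanism. The publisher simply tries the nodes in order and fails over to the next node when a send fails.

```python
# Illustrative sketch of client-side failover publishing (not WSO2 SP code).
# Node URLs below are hypothetical examples.
NODES = ["http://node1:8280/events", "http://node2:8280/events"]

def publish(event, send, nodes=NODES):
    """Try each node in order and return the node that accepted the event.

    `send` is a transport callable (e.g. an HTTP POST wrapper) that raises
    ConnectionError when a node is down or passive.
    """
    last_err = None
    for node in nodes:
        try:
            send(node, event)
            return node
        except ConnectionError as err:
            last_err = err  # node unreachable -> fail over to the next one
    raise RuntimeError("no node accepted the event") from last_err

# Demo transport stub: node1 is "down", so publishing fails over to node2.
def _demo_send(node, event):
    if "node1" in node:
        raise ConnectionError("node1 is down")

chosen = publish({"symbol": "WSO2", "price": 23.5}, _demo_send)
```

In a production setup this role is usually played by a load balancer configured for failover (rather than round-robin) routing.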
Starting the cluster
Save the required Siddhi applications in the `<SP_HOME>/deployment/siddhi-files` directory of both nodes. To ensure that the Siddhi applications are completely synchronized between the active and the passive nodes, they must be added to the `siddhi-files` directory before server startup. However, the synchronization can take place effectively even if the Siddhi applications are added while the server is running.

When deploying Siddhi applications in a two-node minimum HA cluster, it is recommended to use a content synchronization mechanism, since the Siddhi applications must be deployed to both worker nodes. You can use a common shared file system such as the Network File System (NFS) or any other shared file system that is available. You need to mount the `<SP_HOME>/deployment/siddhi-files` directory of the two nodes to the shared file system.

Start both servers by navigating to `<SP_HOME>/bin` and issuing the following command:

- For Windows: `worker.bat`
- For Linux: `./worker.sh`
To start two WSO2 SP nodes on the same machine, the `listenerConfigurations` under the `wso2.transport.http` namespace in the `<SP_HOME>/conf/worker/deployment.yaml` file must be updated to listen on different ports. The `offset` property under the `ports` section of the `wso2.carbon` section of the same file should also be changed in one SP instance to avoid conflicts when starting both servers.

If the cluster is correctly configured, the following CLI logs are printed without any error logs:
In the active node:

```
[2018-09-09 23:56:54,272]  INFO {org.wso2.carbon.stream.processor.core.internal.ServiceComponent} - WSO2 Stream Processor Starting in Two Node Minimum HA Deployment
[2018-09-09 23:56:54,294]  INFO {org.wso2.carbon.stream.processor.core.ha.HAManager} - HA Deployment: Starting up as Active Node
```

In the passive node:

```
[2018-09-09 23:58:44,178]  INFO {org.wso2.carbon.stream.processor.core.internal.ServiceComponent} - WSO2 Stream Processor Starting in Two Node Minimum HA Deployment
[2018-09-09 23:58:44,199]  INFO {org.wso2.carbon.stream.processor.core.ha.HAManager} - HA Deployment: Starting up as Passive Node
```