Clustering the Analytics Profile
You can deploy the Analytics profile of WSO2 Enterprise Integrator (WSO2 EI) in a cluster to facilitate high availability (HA), so that analytics processing continues uninterrupted if a node fails. The recommended HA deployment uses two WSO2 EI instances as worker nodes and one WSO2 EI instance for the dashboard. The two worker nodes publish statistics to the EI_ANALYTICS database, and the dashboard node reads the data from this database and displays the statistics on the dashboard.
The recommended deployment for EI Analytics is the Active-Active deployment pattern. The active-active pattern is a highly scalable deployment pattern. For an overview of the Active-Active deployment pattern and instructions to configure it, see the following topics.
Overview
The above diagram represents a deployment where you are not limited to two nodes. You can scale the analytics setup by adding more EI Analytics nodes to the deployment. In this deployment, it is recommended to configure the EI node to publish events to the EI Analytics nodes in a round-robin manner to ensure better fault tolerance. Note that the same event should not be duplicated to multiple analytics nodes.
The Active-Active deployment pattern uses distributed aggregations to perform analytics in a scalable manner. Distributed aggregations allow multiple nodes to write data to the same aggregation in parallel. This allows you to deploy any number of nodes to process a single aggregation and thereby avoid performance bottlenecks. In this setup, all EI Analytics nodes must share the same EI_ANALYTICS database.
To understand how an active-active cluster processes aggregations when they are partitioned and assigned to different nodes, consider the following Siddhi query, which defines an aggregation named EIStatsAgg, and let's assume this aggregation is processed in a distributed manner.
define stream PreProcessedESBStatStream(componentType string, componentID string, requestTimestamp long);
@store(type = 'rdbms', datasource = 'EI_ANALYTICS')
define aggregation EIStatsAgg
from PreProcessedESBStatStream
select componentType, componentID, count() as totalRequestCount
group by componentType, componentID
aggregate by requestTimestamp every seconds...years;
The above query addresses a simple use case: calculating the total request count for different types of ESB components (i.e., different proxies, REST APIs, sequences, mediators, etc.). Each request received by EI publishes an event that contains various information pertaining to the request, including the component ID, the component type, and the timestamp to which the information applies. When an analytics node receives such an event, it passes the event into the aggregation, performs the required calculations, and stores the results in the EI_ANALYTICS data store defined in the <EI_HOME>/wso2/analytics/conf/worker/deployment.yaml file.
Now let's assume that during a specific hour, the EI node publishes 30,000 events to analytics-node-1 and 40,000 events to analytics-node-2 for a proxy named JMSProxy. When you retrieve the total request count for the JMSProxy proxy during that hour via a retrieval query, the result is 70,000, because both nodes have written their partial counts to the shared aggregation.
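For example, an on-demand store query along the following lines retrieves the hourly aggregate for that period (a sketch; the time range values are illustrative):

from EIStatsAgg
within "2019-10-10 10:00:00 +05:30", "2019-10-10 11:00:00 +05:30"
per "hours"
select componentType, componentID, totalRequestCount;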
The steps to enable aggregation partitioning are provided under Configuring the Analytics worker nodes below.
Setting up the databases
To create the databases, do the following:
These instructions assume MySQL as your relational database management system (RDBMS), but you can install any other supported RDBMS instead. An industry-standard RDBMS such as Oracle, PostgreSQL, MySQL, or MS SQL is recommended for most enterprise testing and production environments.
Download and install MySQL Server.
Download the MySQL JDBC driver.
Unzip the downloaded MySQL driver archive, and copy the MySQL JDBC driver JAR (mysql-connector-java-x.x.xx-bin.jar) to the <EI_HOME>/wso2/analytics/lib directory of all the WSO2 EI nodes.

As depicted in the above diagram, the following databases should be created for the Analytics cluster:
Database Name | Description |
---|---|
WSO2_DASHBOARD_DB | This database is required by the dashboard to store various data related to the dashboard. |
WSO2_PERMISSIONS_DB | This database is required by the dashboard to store the permissions that are granted to users of the Analytics dashboard. |
EI_ANALYTICS | This database persists the mediation statistics that are published to the worker nodes in the cluster. The dashboard node reads the statistics from this database. |
WSO2_CLUSTER_DB | This database stores the cluster coordination data from the two worker nodes in the cluster. |
PERSISTENCE_DB | This database periodically persists information about the status of the worker nodes. Create a table named PERSISTENCE_TABLE in this database. |
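For example, with MySQL you could create the databases as follows (a minimal sketch; character sets, users, and grants should follow your own standards):

CREATE DATABASE WSO2_DASHBOARD_DB;
CREATE DATABASE WSO2_PERMISSIONS_DB;
CREATE DATABASE EI_ANALYTICS;
CREATE DATABASE WSO2_CLUSTER_DB;
CREATE DATABASE PERSISTENCE_DB;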
Configuring the Analytics dashboard
The Analytics dashboard runs as a sub profile of the Analytics profile in WSO2 EI. You can configure a separate WSO2 EI node as the dashboard for both worker nodes. You simply need to connect the dashboard to the required databases. The dashboard reads the statistics published by the worker nodes from the EI_ANALYTICS database and displays them on the dashboard.
Open the deployment.yaml file (stored in the <EI_HOME>/wso2/analytics/conf/dashboard/ directory) and update the datasource configurations (under the wso2.datasources section) for the WSO2_DASHBOARD_DB, EI_ANALYTICS, and WSO2_PERMISSIONS_DB databases.
Given below are sample configurations for the MySQL databases. If a datasource configuration already exists with the same name, be sure to replace it with the following samples.
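As a minimal sketch, assuming MySQL runs on localhost:3306 and using placeholder credentials, a dashboard datasource entry could look as follows; repeat the pattern for the EI_ANALYTICS and WSO2_PERMISSIONS_DB databases:

wso2.datasources:
  dataSources:
    - name: WSO2_DASHBOARD_DB
      description: The datasource used by the dashboard
      definition:
        type: RDBMS
        configuration:
          # Placeholder host, port, and credentials; replace with your own values.
          jdbcUrl: 'jdbc:mysql://localhost:3306/WSO2_DASHBOARD_DB?useSSL=false'
          username: root
          password: root
          driverClassName: com.mysql.jdbc.Driver
          maxPoolSize: 10
          idleTimeout: 60000
          connectionTestQuery: SELECT 1
          validationTimeout: 30000
          isAutoCommit: false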
Configuring the Analytics worker nodes
The following configurations need to be done in the deployment.yaml file stored in the <EI_HOME>/wso2/analytics/conf/worker/ directory for both worker nodes.
Configuring the worker node IDs
For each node, enter a unique ID for the id property under the wso2.carbon section (e.g., id: wso2-sp). This ID is used to identify each node within the cluster. To set it, edit the <EI_HOME>/wso2/analytics/conf/worker/deployment.yaml file on each node.
For example, you can add IDs as shown below.
For node 1:
wso2.carbon:
  id: wso2-ei-analytics-1
For node 2:
wso2.carbon:
  id: wso2-ei-analytics-2
Enable the partitioning of aggregations for each node, and assign a unique shard ID to each node. To do this, set the partitionById and shardId parameters as Siddhi properties as shown below.
Assigning shard IDs to nodes allows the system to identify each unique node when assigning parts of the aggregation. If shard IDs are not assigned, the system uses the unique node IDs (defined in step 1) for this purpose.
For node 1:
siddhi:
  properties:
    partitionById: true
    shardId: wso2-sp-analytics-1
For node 2:
siddhi:
  properties:
    partitionById: true
    shardId: wso2-sp-analytics-2
- To maintain data consistency, do not change the shard IDs after the first configuration.
- When you enable the aggregation partitioning feature, a new column named SHARD_ID is introduced to the aggregation tables. Therefore, after enabling this feature, you need to do one of the following to avoid errors occurring due to the difference in the table schema:
  - Delete all the aggregation tables for SECONDS, MINUTES, HOURS, DAYS, MONTHS, and YEARS.
  - Edit the aggregation tables by adding a new column named SHARD_ID, and specify it as a primary key.
Connecting to databases
Open the deployment.yaml file (stored in the <EI_HOME>/wso2/analytics/conf/worker/ directory) and update the datasource configurations (under the wso2.datasources section) for the WSO2_CLUSTER_DB, EI_ANALYTICS, and PERSISTENCE_DB databases.
Given below are sample configurations for the MySQL databases. If a datasource configuration already exists with the same name, be sure to replace it with the following samples.
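As a minimal sketch (again assuming MySQL on localhost:3306 with placeholder credentials), a datasource entry for the cluster coordination database could look as follows; repeat the pattern for the EI_ANALYTICS and PERSISTENCE_DB databases:

wso2.datasources:
  dataSources:
    - name: WSO2_CLUSTER_DB
      description: The datasource used for cluster coordination
      definition:
        type: RDBMS
        configuration:
          # Placeholder host, port, and credentials; replace with your own values.
          jdbcUrl: 'jdbc:mysql://localhost:3306/WSO2_CLUSTER_DB?useSSL=false'
          username: root
          password: root
          driverClassName: com.mysql.jdbc.Driver
          maxPoolSize: 10
          idleTimeout: 60000
          connectionTestQuery: SELECT 1
          validationTimeout: 30000
          isAutoCommit: false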
Configuring storage persistence
To allow the two worker nodes to use the same database for state persistence, update the state.persistence section in the deployment.yaml file with the following properties.
Parameter | Description |
---|---|
enabled | To enable state persistence, set this property to true. |
persistenceStore | State persistence can be configured to use a DB-based or file-based store; a DB-based store is recommended, although a file-based store is configured by default. To configure a DB-based persistence store, set this property to org.wso2.carbon.stream.processor.core.persistence.DBPersistenceStore. Be sure that the same persistence store is shared between the two worker nodes. |
config: datasource, table | The name of the datasource configured for persistence (PERSISTENCE_DB) and the table used to store the persisted state (PERSISTENCE_TABLE). |
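As a sketch, a DB-based persistence configuration could look as follows (the intervalInMin and revisionsToKeep values are illustrative):

state.persistence:
  enabled: true
  intervalInMin: 1
  revisionsToKeep: 2
  persistenceStore: org.wso2.carbon.stream.processor.core.persistence.DBPersistenceStore
  config:
    datasource: PERSISTENCE_DB   # the datasource defined under wso2.datasources
    table: PERSISTENCE_TABLE     # the table created in the PERSISTENCE_DB database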
Starting the cluster
Before you begin:
If you are starting both worker nodes on a single machine, be sure to apply a port offset to one of the worker nodes:
- Open the deployment.yaml file (stored in the <EI_HOME>/wso2/analytics/conf/worker/ directory).
- In the wso2.carbon section, set a port offset value. By default, the offset is set to 0:

  # port offset
  offset: 0
- In the wso2.transport.http section, under listenerConfigurations, increment the default port values.
- In the siddhi.stores.query.api section, under listenerConfigurations, increment the default port values (see the sketch after this list).
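As a sketch for the offset node, assuming the standard defaults of the underlying WSO2 Stream Processor worker (verify the actual default ports in your own deployment.yaml), the incremented listener configurations could look as follows:

wso2.transport.http:
  listenerConfigurations:
    - id: "default"
      host: "0.0.0.0"
      port: 9091     # assumed default 9090, incremented by 1

siddhi.stores.query.api:
  listenerConfigurations:
    - id: "default"
      host: "0.0.0.0"
      port: 7071     # assumed default 7070, incremented by 1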
Start the two worker nodes of WSO2 EI Analytics by executing the following command:
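For example, on Linux (assuming the standard EI Analytics directory layout; use worker.bat on Windows):

sh <EI_HOME>/wso2/analytics/bin/worker.sh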
If the cluster is correctly configured, the following CLI logs can be viewed without any error logs:

INFO {org.wso2.carbon.stream.processor.core.internal.ServiceComponent} - WSO2 Stream Processor Starting in Two Node Minimum HA Deployment
INFO {org.wso2.carbon.stream.processor.core.ha.HAManager} - HA Deployment: Starting up as Active Node
Start the Analytics dashboard by executing the following command:
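For example, on Linux (assuming the standard layout; use dashboard.bat on Windows):

sh <EI_HOME>/wso2/analytics/bin/dashboard.sh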