Clustering the Analytics Profile
You can deploy the Analytics profile of WSO2 Enterprise Integrator (WSO2 EI) in a cluster to facilitate high availability (HA), so that analytics processing continues uninterrupted if a node fails. The recommended HA deployment uses two WSO2 EI instances as worker nodes and one WSO2 EI instance for the dashboard. The two worker nodes publish statistics to the EI_ANALYTICS database, and the dashboard node reads the data from this database and displays the statistics on the dashboard.
The recommended deployment for EI Analytics is the Active-Active deployment pattern. The active-active pattern is a highly scalable deployment pattern. For an overview of the Active-Active deployment pattern and instructions to configure it, see the following topics.
Overview
The above diagram represents a deployment where you are not limited to two nodes. You can scale the analytics setup by adding more EI Analytics nodes to the deployment. In this deployment, it is recommended to configure the EI node to publish events to the EI Analytics nodes in a round-robin manner to ensure better fault tolerance. Note that the same event should not be duplicated to multiple analytics nodes.
The Active-Active deployment pattern uses distributed aggregations to perform analytics in a scalable manner. Distributed aggregations allow multiple nodes to write data to the same aggregation in parallel. This allows you to deploy any number of nodes to process a single aggregation and thereby avoid performance bottlenecks. In this setup, all EI Analytics nodes must share the same EI_ANALYTICS database.
To understand how an active-active cluster processes aggregations when they are partitioned and assigned to different nodes, consider the following Siddhi query, which defines an aggregation named EIStatsAgg, and let's assume this aggregation is processed in a distributed manner.
define stream PreProcessedESBStatStream(componentType string, componentID string, requestTimestamp long);
@store(type = 'rdbms', datasource = 'EI_ANALYTICS')
define aggregation EIStatsAgg
from PreProcessedESBStatStream
select componentType, componentID, count() as totalRequestCount
group by componentType, componentID
aggregate by requestTimestamp every seconds...years;
The above query addresses a simple use case: calculating the total request count for different types of ESB components (i.e., different proxies, REST APIs, sequences, mediators, etc.). Each request received by EI publishes an event that contains various information pertaining to the request, including the component ID, the component type, and the timestamp to which the information applies. When an analytics node receives such an event, it passes the event into the aggregation, performs the required calculations, and stores the results in the EI_ANALYTICS data store defined in the <EI_HOME>/wso2/analytics/conf/worker/deployment.yaml file.
Now let's assume that during a specific hour, the EI node publishes 30,000 events to analytics-node-1 and 40,000 events to analytics-node-2 for a proxy named JMSProxy. When you retrieve the total request count for the JMSProxy proxy during that hour via a retrieval query, the result is 70,000, because both nodes have written their partial counts to the shared aggregation.
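For example, an on-demand store query along the following lines retrieves the hourly aggregate for that period (a sketch; the time range values are illustrative):

from EIStatsAgg
within "2019-10-10 10:00:00 +05:30", "2019-10-10 11:00:00 +05:30"
per "hours"
select componentType, componentID, totalRequestCount;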
The steps to enable aggregation partitioning are provided under Configuring the Analytics worker nodes below.
Setting up the databases
To create the databases, do the following:
These instructions assume MySQL as your relational database management system (RDBMS), but you can install any other supported RDBMS instead. An industry-standard RDBMS such as Oracle, PostgreSQL, MySQL, or MS SQL is recommended for most enterprise testing and production environments.
Download and install MySQL Server.
Download the MySQL JDBC driver.
Unzip the downloaded MySQL driver archive, and copy the MySQL JDBC driver JAR (mysql-connector-java-x.x.xx-bin.jar) to the <EI_HOME>/wso2/analytics/lib directory of all the WSO2 EI nodes.

As depicted in the above diagram, the following databases should be created for the Analytics cluster:
Database Name | Description |
---|---|
WSO2_DASHBOARD_DB | This database is required by the dashboard to store various data related to the dashboard. |
WSO2_PERMISSIONS_DB | This database is required by the dashboard to store the permissions that are granted to users of the Analytics dashboard. |
EI_ANALYTICS | This database persists the mediation statistics that are published to the worker nodes in the cluster. The dashboard node reads the statistics from this database. |
WSO2_CLUSTER_DB | This database stores the cluster coordination data from the two worker nodes in the cluster. |
PERSISTENCE_DB | This database periodically persists information about the status of the worker nodes. Create a table named PERSISTENCE_TABLE in this database. |
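For example, with MySQL you could create the databases as follows (a minimal sketch; character sets, users, and grants should follow your own standards):

CREATE DATABASE WSO2_DASHBOARD_DB;
CREATE DATABASE WSO2_PERMISSIONS_DB;
CREATE DATABASE EI_ANALYTICS;
CREATE DATABASE WSO2_CLUSTER_DB;
CREATE DATABASE PERSISTENCE_DB;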
Configuring the Analytics dashboard
The Analytics dashboard runs as a sub profile of the Analytics profile in WSO2 EI. You can configure a separate WSO2 EI node as the dashboard for both worker nodes. You simply need to connect the dashboard to the required databases. The dashboard reads the statistics published by the worker nodes from the EI_ANALYTICS database and displays them on the dashboard.
Open the deployment.yaml file (stored in the <EI_HOME>/wso2/analytics/conf/dashboard/ directory) and update the datasource configurations (under the wso2.datasources section) for the WSO2_DASHBOARD_DB, EI_ANALYTICS, and WSO2_PERMISSIONS_DB databases.
Given below are sample configurations for the MySQL databases. If a datasource configuration already exists with the same name, be sure to replace it with the following samples.
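As a minimal sketch, assuming MySQL runs on localhost:3306 and using placeholder credentials, a dashboard datasource entry could look as follows; repeat the pattern for the EI_ANALYTICS and WSO2_PERMISSIONS_DB databases:

wso2.datasources:
  dataSources:
    - name: WSO2_DASHBOARD_DB
      description: The datasource used by the dashboard
      definition:
        type: RDBMS
        configuration:
          # Placeholder host, port, and credentials; replace with your own values.
          jdbcUrl: 'jdbc:mysql://localhost:3306/WSO2_DASHBOARD_DB?useSSL=false'
          username: root
          password: root
          driverClassName: com.mysql.jdbc.Driver
          maxPoolSize: 10
          idleTimeout: 60000
          connectionTestQuery: SELECT 1
          validationTimeout: 30000
          isAutoCommit: false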
Configuring the Analytics worker nodes
The following configurations need to be done in the deployment.yaml file stored in the <EI_HOME>/wso2/analytics/conf/worker/ directory for both worker nodes.
Configuring the worker node IDs
For each node, enter a unique ID for the id property under the wso2.carbon section (e.g., id: wso2-sp). This ID is used to identify each node within the cluster. To set it, edit the <EI_HOME>/wso2/analytics/conf/worker/deployment.yaml file on each node.
For example, you can add IDs as shown below.
For node 1:
wso2.carbon:
  id: wso2-ei-analytics-1
For node 2:
wso2.carbon:
  id: wso2-ei-analytics-2
Enable the partitioning of aggregations for each node, and assign a unique shard ID to each node. To do this, set the partitionById and shardId parameters as Siddhi properties as shown below.
Assigning shard IDs to nodes allows the system to identify each unique node when assigning parts of the aggregation. If shard IDs are not assigned, the system uses the unique node IDs (defined in step 1) for this purpose.
For node 1:
siddhi:
  properties:
    partitionById: true
    shardId: wso2-sp-analytics-1
For node 2:
siddhi:
  properties:
    partitionById: true
    shardId: wso2-sp-analytics-2
- To maintain data consistency, do not change the shard IDs after the first configuration.
- When you enable the aggregation partitioning feature, a new column named SHARD_ID is introduced to the aggregation tables. Therefore, after enabling this feature, you need to do one of the following to avoid errors occurring due to the difference in the table schema:
  - Delete all the aggregation tables for SECONDS, MINUTES, HOURS, DAYS, MONTHS, and YEARS.
  - Edit the aggregation tables by adding a new column named SHARD_ID, and specify it as a primary key.
Connecting to databases
Open the deployment.yaml file (stored in the <EI_HOME>/wso2/analytics/conf/worker/ directory) and update the datasource configurations (under the wso2.datasources section) for the WSO2_CLUSTER_DB, EI_ANALYTICS, and PERSISTENCE_DB databases.
Given below are sample configurations for the MySQL databases. If a datasource configuration already exists with the same name, be sure to replace it with the following samples.
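As a minimal sketch (again assuming MySQL on localhost:3306 with placeholder credentials), a datasource entry for the cluster coordination database could look as follows; repeat the pattern for the EI_ANALYTICS and PERSISTENCE_DB databases:

wso2.datasources:
  dataSources:
    - name: WSO2_CLUSTER_DB
      description: The datasource used for cluster coordination
      definition:
        type: RDBMS
        configuration:
          # Placeholder host, port, and credentials; replace with your own values.
          jdbcUrl: 'jdbc:mysql://localhost:3306/WSO2_CLUSTER_DB?useSSL=false'
          username: root
          password: root
          driverClassName: com.mysql.jdbc.Driver
          maxPoolSize: 10
          idleTimeout: 60000
          connectionTestQuery: SELECT 1
          validationTimeout: 30000
          isAutoCommit: false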
Configuring storage persistence
To allow the two worker nodes to use the same database for state persistence, update the state.persistence section in the deployment.yaml file with the following properties.
Parameter | Description |
---|---|
enabled | To enable state persistence, set this property to true. |
persistenceStore | State persistence can be configured to use a DB-based or file-based store; a DB-based store is recommended, although a file-based store is configured by default. To configure a DB-based persistence store, set this property to org.wso2.carbon.stream.processor.core.persistence.DBPersistenceStore. Be sure that the same persistence store is shared between the two worker nodes. |
config: datasource, table | The name of the datasource configured for persistence (PERSISTENCE_DB) and the table used to store the persisted state (PERSISTENCE_TABLE). |
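As a sketch, a DB-based persistence configuration could look as follows (the intervalInMin and revisionsToKeep values are illustrative):

state.persistence:
  enabled: true
  intervalInMin: 1
  revisionsToKeep: 2
  persistenceStore: org.wso2.carbon.stream.processor.core.persistence.DBPersistenceStore
  config:
    datasource: PERSISTENCE_DB   # the datasource defined under wso2.datasources
    table: PERSISTENCE_TABLE     # the table created in the PERSISTENCE_DB database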
Starting the cluster
Before you begin:
If you are starting both worker nodes on a single machine, be sure to apply a port offset to one of the worker nodes:
- Open the deployment.yaml file (stored in the <EI_HOME>/wso2/analytics/conf/worker/ directory).
- In the wso2.carbon section, set a port offset value. By default, the offset is set to 0:

  # port offset
  offset: 0
- In the wso2.transport.http section, under listenerConfigurations, increment the default port values.
- In the siddhi.stores.query.api section, under listenerConfigurations, increment the default port values (see the sketch after this list).
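As a sketch for the offset node, assuming the standard defaults of the underlying WSO2 Stream Processor worker (verify the actual default ports in your own deployment.yaml), the incremented listener configurations could look as follows:

wso2.transport.http:
  listenerConfigurations:
    - id: "default"
      host: "0.0.0.0"
      port: 9091     # assumed default 9090, incremented by 1

siddhi.stores.query.api:
  listenerConfigurations:
    - id: "default"
      host: "0.0.0.0"
      port: 7071     # assumed default 7070, incremented by 1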
Start the two worker nodes of WSO2 EI Analytics by executing the following command:
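For example, on Linux (assuming the standard EI Analytics directory layout; use worker.bat on Windows):

sh <EI_HOME>/wso2/analytics/bin/worker.sh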
If the cluster is correctly configured, the following CLI logs can be viewed without any error logs:

INFO {org.wso2.carbon.stream.processor.core.internal.ServiceComponent} - WSO2 Stream Processor Starting in Two Node Minimum HA Deployment
INFO {org.wso2.carbon.stream.processor.core.ha.HAManager} - HA Deployment: Starting up as Active Node
Start the Analytics dashboard by executing the following command:
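For example, on Linux (assuming the standard layout; use dashboard.bat on Windows):

sh <EI_HOME>/wso2/analytics/bin/dashboard.sh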