Fully Distributed Deployment
Introduction
The most common deployment pattern for WSO2 SP is the Minimum High Availability Deployment that offers high availability with the minimum amount of resources. However, there are a few user scenarios where the HA (High Availability) deployment is not sufficient to handle the throughput.
The Distributed Deployment pattern is supported so that a high volume of data can be distributed among multiple SP instances instead of having them accumulated at a single point. It is suitable to be used in scenarios where the volume of data handled is too high to be managed in a single SP instance or a minimum high availability deployment.
Distributed Siddhi applications
A Siddhi application is a combination of multiple Siddhi execution elements. A Siddhi execution element can be a Siddhi query or a Siddhi partition. From a distributed processing perspective, a collection of these execution elements is called an execution group. An execution group is the smallest unit of execution.
Distributed processing of a Siddhi application allows users to execute multiple instances of each execution group in parallel across multiple SP instances.
Users can specify execution groups, and the parallelism with which to execute them, by annotating existing Siddhi applications. The following sample application is annotated in that manner.
@App:name('wso2-app')
@info(name = 'query-1')
@dist(execGroup='group-1')
from TempStream#window.time(2 min)
select avg(temp) as avgTemp, roomNo, deviceID
insert all events into AvgTempStream;
@info(name = 'query-2')
@dist(execGroup='group-1')
from every( e1=TempStream ) ->
e2=TempStream[e1.roomNo==roomNo and (e1.temp + 5) <= temp ] within 10 min
select e1.roomNo, e1.temp as initialTemp, e2.temp as finalTemp
insert into AlertStream;
@info(name = 'query-3')
@dist(execGroup='group-2' ,parallel ='2')
from TempStream [(roomNo >= 100 and roomNo < 110) and temp > 40 ]
select roomNo, temp
insert into HighTempStream;
This sample distributed Siddhi application contains two execution groups named group-1 and group-2 (defined via execGroup='<GROUP_ID>', e.g., execGroup='group-1'). group-1 contains two queries named query-1 and query-2. group-2 contains query-3. No specific number of parallel instances is specified for group-1. Therefore, only one instance is created for it at runtime by default. Two parallel instances are specified for group-2.
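To make the grouping rules concrete, the following Python sketch reads the @info and @dist annotations of the sample application and reports which queries fall into each execution group and how many parallel instances each group requests. It is only an illustration of the rules described above, not the parser that WSO2 SP uses; the function name execution_groups and the regular expressions are assumptions made for this example.

```python
import re

# The annotated queries from the sample application above.
SAMPLE_APP = """
@App:name('wso2-app')
@info(name = 'query-1')
@dist(execGroup='group-1')
from TempStream#window.time(2 min)
select avg(temp) as avgTemp, roomNo, deviceID
insert all events into AvgTempStream;
@info(name = 'query-2')
@dist(execGroup='group-1')
from every( e1=TempStream ) ->
e2=TempStream[e1.roomNo==roomNo and (e1.temp + 5) <= temp ] within 10 min
select e1.roomNo, e1.temp as initialTemp, e2.temp as finalTemp
insert into AlertStream;
@info(name = 'query-3')
@dist(execGroup='group-2' ,parallel ='2')
from TempStream [(roomNo >= 100 and roomNo < 110) and temp > 40 ]
select roomNo, temp
insert into HighTempStream;
"""

INFO_RE = re.compile(r"@info\(\s*name\s*=\s*'([^']+)'\s*\)")
DIST_RE = re.compile(r"@dist\(([^)]*)\)")
PARAM_RE = re.compile(r"(\w+)\s*=\s*'([^']*)'")


def execution_groups(app_text):
    """Hypothetical helper: map each execGroup to its queries and parallelism."""
    groups = {}
    current_query = None
    for line in app_text.splitlines():
        info = INFO_RE.search(line)
        if info:
            current_query = info.group(1)
            continue
        dist = DIST_RE.search(line)
        if dist and current_query is not None:
            params = dict(PARAM_RE.findall(dist.group(1)))
            # A group without an explicit 'parallel' value gets one instance.
            group = groups.setdefault(
                params["execGroup"], {"queries": [], "parallel": 1}
            )
            group["queries"].append(current_query)
            if "parallel" in params:
                group["parallel"] = int(params["parallel"])
            current_query = None
    return groups


if __name__ == "__main__":
    for name, group in execution_groups(SAMPLE_APP).items():
        print(name, group["queries"], group["parallel"])
```

For the sample, the sketch places query-1 and query-2 in group-1 with a single instance, and query-3 in group-2 with two instances, which matches the description above.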
The following is an illustration of how each parallel instance is created as a separate Siddhi application.
Each Siddhi application is deployed in the available resource nodes of the distributed cluster. All these Siddhi applications communicate with each other via the messaging layer. The system has the ability to interact with the messaging layer and create topics representing each stream, and it configures the Siddhi applications to use these topics as required.
For detailed information, see Converting to a Distributed Streaming Application.
Deployment architecture
WSO2 Stream Processor has a component named Dashboard in the User Interface and Dashboard layer. The Dashboard allows users to view the output of analytics in an interactive manner. It also conveys observability information about the cluster, the status of the applications (i.e., Siddhi applications) currently submitted, and the status of each Stream Processor node. The JVM metrics, as well as Siddhi application level metrics, can be viewed through this dashboard.
Job Manager nodes handle all the functionality of the Management layer. This layer contains two WSO2 SP Manager instances configured to run in high availability mode. Here, the Manager parses the distributed Siddhi application provided by the user, partitions it into multiple Siddhi applications, wires them using messaging layer topics, and deploys them in the available worker nodes. The Management layer also handles the effects of worker nodes joining or leaving the distributed cluster by redistributing the Siddhi applications accordingly.
The processing layer (also known as the resource cluster) consists of multiple WSO2 SP instances that are configured as workers. Each WSO2 SP worker instance in this layer registers itself with the Manager Cluster when it starts. These workers periodically send their heartbeats to the Manager Cluster. This allows the Managers to distinguish the active worker nodes from the inactive ones. The worker nodes (resource nodes) run the Siddhi applications assigned to them by their Manager nodes. In addition, they are also capable of handling network partitions in a graceful manner as depicted in the following diagram.
As depicted above, a worker node periodically synchronizes its configurations and Siddhi applications with the manager node. If the network gets partitioned or if the manager becomes unreachable, the worker undeploys the applications deployed in it. By doing so, it allows the Siddhi applications to be rescheduled in other worker nodes that are maintaining their connections with the manager nodes.
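The worker-side behaviour described here can be sketched as a small state model. The class below is a hypothetical illustration, not a WSO2 SP API: the names WorkerNode, on_sync and on_tick, and the use of a fixed timeout in seconds to decide that the manager is unreachable, are all assumptions made for this example.

```python
class WorkerNode:
    """Hypothetical model of a worker node; not part of WSO2 SP."""

    def __init__(self, manager_timeout_s):
        # Assumed: the manager counts as unreachable after this many seconds
        # without a successful synchronization.
        self.manager_timeout_s = manager_timeout_s
        self.deployed_apps = set()
        self.last_sync_s = 0.0

    def on_sync(self, now_s, assigned_apps):
        """A sync with the manager succeeded: adopt its current assignment."""
        self.last_sync_s = now_s
        self.deployed_apps = set(assigned_apps)

    def on_tick(self, now_s):
        """Periodic check: undeploy everything if the manager has gone silent."""
        if now_s - self.last_sync_s > self.manager_timeout_s:
            self.deployed_apps.clear()


worker = WorkerNode(manager_timeout_s=10.0)
worker.on_sync(now_s=0.0, assigned_apps={"app-a", "app-b"})
worker.on_tick(now_s=5.0)    # manager seen recently: apps keep running
worker.on_tick(now_s=11.0)   # manager silent for too long: apps are undeployed
```

Because the isolated worker stops its applications on its own, the manager can reschedule them on reachable workers without the same application running twice.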
Apache Kafka or NATS is required as the messaging layer to configure a fully distributed SP cluster. Persistence stores of the Persistence layer can be RDBMS databases that store both configuration and system state data. Identity and access management of all the WSO2 Stream Processor nodes can be handled by any SCIM-supported identity provider such as WSO2 Identity and Access Management (WSO2 IAM).
There are no restrictions on the environment in which WSO2 Stream Processor runs in the distributed mode. It can run in the distributed mode on bare metal, VMs, and containers. Here the manager nodes are grouped in a single cluster backed by a database for coordination. Similarly, dashboard nodes can also be deployed in a separate cluster. The worker nodes, on the other hand, are not aware of each other. They are synchronized with manager nodes from which they receive instructions.
Manager cluster
The manager cluster contains two or more WSO2 SP instances configured to run in the high availability mode. The manager cluster is responsible for parsing a user-defined distributed Siddhi application, dividing it into multiple Siddhi applications, creating the required topics, and then deploying the applications in the available resource nodes. The manager cluster also handles resource nodes that join or leave the distributed cluster, and reschedules the Siddhi applications accordingly. Since manager nodes are deployed in a high availability mode, if the active manager node goes down, another node in the manager cluster is elected as the leader to handle the resource cluster.
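As a rough illustration of deploying and rescheduling, the sketch below places generated Siddhi applications on the currently available resource nodes and recomputes the placement when a node leaves. The source does not state which allocation strategy the manager uses, so the round-robin placement here is purely an assumption, as are the function name assign and the application and worker names.

```python
def assign(apps, workers):
    """Place apps on workers round-robin (an assumed strategy, not WSO2 SP's)."""
    ordered = sorted(workers)
    placement = {worker: [] for worker in ordered}
    if not ordered:
        return placement  # nothing can be deployed until a worker joins
    for index, app in enumerate(apps):
        placement[ordered[index % len(ordered)]].append(app)
    return placement


# Hypothetical names for the generated applications of the earlier sample:
# one instance of group-1 and two parallel instances of group-2.
apps = ["wso2-app-group-1-1", "wso2-app-group-2-1", "wso2-app-group-2-2"]

before = assign(apps, {"worker-1", "worker-2"})
# worker-2 leaves the cluster, so the placement is recomputed.
after = assign(apps, {"worker-1"})
```

In this toy run, the applications that were on worker-2 end up on worker-1 once worker-2 leaves, which is the effect the rescheduling step is meant to achieve.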
Resource cluster
A resource cluster contains multiple WSO2 SP instances. Each instance sends a periodic heartbeat to the manager cluster so that the managers at any given time can identify the resource nodes that are active in the cluster. The resource nodes are responsible for running Siddhi applications assigned to them by the manager nodes. A resource node continues to run its Siddhi applications until a manager node undeploys them, or until it is no longer able to reach a manager node to send its heartbeat. If a manager node is unreachable for a specified amount of time, the resource node stops operating, removes its deployed Siddhi applications and waits until it can reach a manager node again.
The resource cluster can include both receiver workers and resource workers. You can specify the minimum number of receiver worker nodes to be included. However, you need to ensure that the minimum number specified is greater than one. This is because, if one or more distributed Siddhi applications contain a user-defined source such as HTTP or Thrift, then that Siddhi application cannot be deployed in a resource worker node. Therefore, at least one receiver worker node needs to be available in the resource cluster to ensure that distributed Siddhi applications are successfully deployed.
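The placement constraint in this paragraph can be expressed as a simple check. The sketch below is hypothetical: the role strings and function names are invented for the example, and it assumes that applications without a user-defined source may run on either kind of worker, which the text above does not state explicitly.

```python
def can_deploy(app_has_user_defined_source, worker_role):
    """Check whether an app may be placed on a worker with the given role."""
    if worker_role not in ("receiver", "resource"):
        raise ValueError("unknown worker role: " + worker_role)
    if app_has_user_defined_source:
        # Apps with a user-defined source (e.g. HTTP) need a receiver worker.
        return worker_role == "receiver"
    return True  # assumption: other apps can run on any worker


def all_deployable(apps_with_source_flags, worker_roles):
    """True if every app has at least one worker in the cluster that can host it."""
    return all(
        any(can_deploy(flag, role) for role in worker_roles)
        for flag in apps_with_source_flags
    )


# One app with an HTTP source and one without, on two different clusters.
only_resource = all_deployable([True, False], ["resource", "resource"])
with_receiver = all_deployable([True, False], ["receiver", "resource"])
```

Here only_resource is False and with_receiver is True, which mirrors why at least one receiver worker must be available.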
Deployed Siddhi applications communicate among themselves via the messaging layer.
Kafka cluster
Kafka and ZooKeeper must be installed to configure a fully distributed deployment.
A Kafka cluster holds all the topics used by distributed Siddhi applications. All communications between execution groups take place via Kafka.
Publishing and receiving data from distributed Siddhi applications can be done via Kafka or other Siddhi sources as follows:
Via Kafka
To use Kafka for publishing and receiving data, you can either define a Kafka source in the initial distributed Siddhi application or use the Kafka source created by the distributed implementation.
Via Other Siddhi Sources
This involves defining the source in the initial distributed Siddhi application.
Messaging Cluster
Either a Kafka or a NATS broker must be installed as the messaging layer to configure a fully distributed deployment.
The messaging cluster holds all the topics that distributed Siddhi applications use to communicate between nodes. Publishing and receiving data from distributed Siddhi applications can also be done via the same messaging cluster. To use the messaging layer as the event entry point, you need to define sources based on the type of the messaging layer. The generated partial Siddhi applications then consume from those predefined topics. Topics for sending data between the generated partial applications are created automatically.
Configuring a distributed cluster
This section explains how to configure a distributed WSO2 SP cluster.
Prerequisites
In order to configure a fully distributed HA cluster, the following prerequisites must be completed:
A WSO2 SP binary pack must be available for each node in the cluster.
Each SP node must have a distinct ID under wso2.carbon in the <SP_HOME>/conf/manager/deployment.yaml or <SP_HOME>/conf/worker/deployment.yaml file, depending on the cluster node being configured.
A working RDBMS instance must be available for clustering the manager nodes. Currently, H2, MySQL, Oracle, PostgreSQL, and MSSQL databases are supported.
Add the database driver corresponding to the DB system used to the <SP_HOME>/lib directory.
The messaging cluster based on Kafka or NATS must be started, and the host and ports of the cluster must be known. You also need a ZooKeeper cluster to facilitate the Kafka cluster. The following versions of each product are supported.
Zookeeper version: 3.4.6
Kafka version: 2.11-0.10.0.0
NATS streaming server version: 0.11.x
The following tasks need to be carried out depending on the messaging layer in order to make WSO2 SP compatible with the messaging layer.
Converting JARs to OSGi bundles
To convert JARs to OSGi bundles, follow the steps below:
Create the source directory (e.g., named jars) and copy the required JARs into the created directory.
Create another directory (e.g., named osgi). This is the destination directory to which the converted OSGi bundles are to be added.
To convert the JARs, navigate to the <SP_HOME>/bin directory and issue one of the following commands.
For Linux:
./jartobundle.sh <path_to_source_directory> <path_to_destination_directory>
For Windows:
./jartobundle.bat <path_to_source_directory> <path_to_destination_directory>
If the JARs are successfully converted, the following message appears in the terminal.
The converted OSGi bundles are now available in the destination directory. Copy them and place them in the <SP_HOME>/lib directory.