Converting to a Distributed Streaming Application
A Siddhi application is a combination of multiple Siddhi executional elements. A Siddhi executional element can be a Siddhi query or a Siddhi partition. When defining a Siddhi application, you can specify the number of parallel instances to be created for each executional element, and how each executional element must be isolated in a separate SP instance. Based on this, the initial Siddhi application is divided into multiple Siddhi applications that are deployed in different SP instances.
This deployment pattern is supported so that a high volume of data can be distributed among multiple SP instances instead of being accumulated at a single point. Therefore, it is suitable for scenarios where the volume of data handled is too high to be managed by a single SP instance.
Creating a distributed Siddhi application
This section explains how to write distributed Siddhi applications by assigning executional elements to different execution groups.
Executional elements
A distributed Siddhi application can contain one or more of the following elements:
Element | Description |
---|---|
Stateless queries | Queries that only consider currently incoming events when generating an output. e.g., Filters |
Stateful queries | Queries that consider both currently incoming events as well as past events when generating an output. e.g., windows, sequences, patterns, etc. |
Partitions | Collections of stream definitions and Siddhi queries separated from each other within a Siddhi application for the purpose of processing events in parallel and in isolation. |
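The three element types above can be sketched in Siddhi as follows. This is an illustrative snippet; the stream and attribute names (`TempStream`, `deviceID`, `temp`) are assumptions, not part of the sample application discussed later.

```siddhi
-- Stateless query: each event is evaluated on its own (a filter)
from TempStream[temp > 50]
select deviceID, temp
insert into HighTempStream;

-- Stateful query: the output depends on past events retained in the window
from TempStream#window.time(1 min)
select deviceID, avg(temp) as avgTemp
insert into AvgTempStream;

-- Partition: the inner query runs in isolation per deviceID value
partition with (deviceID of TempStream)
begin
    from TempStream#window.length(10)
    select deviceID, max(temp) as maxTemp
    insert into MaxTempStream;
end;
```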
Annotations
The following annotations are used when writing a distributed Siddhi application.
Annotation | Description |
---|---|
@dist(execGroup='name of the group') | All the executional elements with the same execution group are executed in the same Siddhi application. When different execution groups are specified within the same distributed Siddhi application, WSO2 SP initiates a separate Siddhi application per execution group. In each of these Siddhi applications, only the executional elements assigned to the relevant execution group are executed. Executional elements that have no execution group assigned to them are executed in a separate SP instance. |
@dist(parallel='number of parallel instances') | The number of instances in which the executional element must be executed in parallel. All the executional elements assigned to a specific execution group (i.e., via the @dist(execGroup) annotation) must have the same number of parallel instances specified. When the number of parallel instances is not specified for the executional elements assigned to an execution group, only one Siddhi application is initiated for that execution group. This annotation can also be applied to sources: if a parallelism count is specified within a source annotation, a number of passthrough Siddhi applications equal to that count are generated and deployed. |
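As a minimal sketch of how the two annotations combine (the query, stream, and group names here are illustrative), the filter below would run as two parallel Siddhi applications, both belonging to the execution group named `group-1`:

```siddhi
@info(name = 'temp-filter')
@dist(execGroup='group-1', parallel='2')
from TempStream[temp > 50]
select deviceID, temp
insert into HighTempStream;
```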
Example
The following is a sample distributed Siddhi application.
```siddhi
@App:name('Energy-Alert-App')
@App:description('Energy consumption and anomaly detection')

@source(type = 'http', topic = 'device-power', @map(type = 'json'), @dist(parallel='2'))
define stream DevicePowerStream (type string, deviceID string, power int, roomID string);

@sink(type = 'email', to = '{{autorityContactEmail}}', username = 'john', address = 'john@gmail.com',
      password = 'test', subject = 'High power consumption of {{deviceID}}',
      @map(type = 'text',
           @payload('Device ID: {{deviceID}} of room : {{roomID}} power is consuming {{finalPower}}kW/h. ')))
define stream AlertStream (deviceID string, roomID string, initialPower double, finalPower double, autorityContactEmail string);

@info(name = 'monitered-filter')
@dist(execGroup='001')
from DevicePowerStream[type == 'monitored']
select deviceID, power, roomID
insert current events into MonitoredDevicesPowerStream;

@info(name = 'power-increase-pattern')
@dist(parallel='2', execGroup='002')
partition with (deviceID of MonitoredDevicesPowerStream)
begin
    @info(name = 'avg-calculator')
    from MonitoredDevicesPowerStream#window.time(2 min)
    select deviceID, avg(power) as avgPower, roomID
    insert current events into #AvgPowerStream;

    @info(name = 'power-increase-detector')
    from every e1 = #AvgPowerStream -> e2 = #AvgPowerStream[(e1.avgPower + 5) <= avgPower] within 10 min
    select e1.deviceID as deviceID, e1.avgPower as initialPower, e2.avgPower as finalPower, e1.roomID
    insert current events into RisingPowerStream;
end;

@info(name = 'power-range-filter')
@dist(parallel='2', execGroup='003')
from RisingPowerStream[finalPower > 100]
select deviceID, roomID, initialPower, finalPower, 'no-reply@powermanagement.com' as autorityContactEmail
insert current events into AlertStream;

@info(name = 'internal-filter')
@dist(execGroup='004')
from DevicePowerStream[type == 'internal']
select deviceID, power
insert current events into InternaltDevicesPowerStream;
```
When the above Siddhi application is deployed, it creates a distributed processing chain as depicted in the image below.
As annotated in the Siddhi application, two passthrough query groups are created to accept HTTP traffic and to send those events into the messaging layer. The other execution groups are created as per the given parallelism count. The execution group creation is summarized in the table below.
Execution Group | Number of Siddhi Application Instances | Queries executed |
---|---|---|
001 | 1 | monitered-filter |
002 | 2 | power-increase-pattern |
003 | 2 | power-range-filter |
004 | 1 | internal-filter |