Data analytics refers to aggregating, analyzing, and presenting information about business activities. This definition is paramount when designing a solution to address a data analysis use case. Aggregation refers to the collection of data, analysis refers to the manipulation of data to extract information, and presentation refers to representing this data visually or in other ways such as alerts. Data that you need to monitor or process passes sequentially through these modules.
The WSO2 DAS architecture reflects this natural flow in its very design as illustrated below.
The WSO2 DAS architecture consists of the following components.
Data Agents
Data Agents are external sources that publish data to WSO2 DAS. Data Agent components reside in external systems and push data as events to the DAS. For more information on Data Agents, see /wiki/spaces/PS/pages/10519883.
Event Receivers
WSO2 DAS receives data published by Data Agents through Event Receivers. Each Event Receiver is associated with an event stream, which you then persist and/or process using Event Processors. Event Receivers in WSO2 DAS support many transport types as entry protocols. For more information on Event Receivers, see Configuring Event Receivers.
Analytics REST API
In addition to the Event Receivers, WSO2 DAS facilitates REST API-based data publishing. You can use the REST API with Web applications and Web services. For more information on the REST API, see Analytics REST API Guide.
Data Store
WSO2 DAS supports Data Stores (such as Cassandra, HBase, and RDBMS) for data persistence. There are three main types of Data Stores, namely Event Store, Processed Event Store, and File Store. The Event Store stores events that are published directly to the DAS. The Processed Event Store stores the results of data processed using Apache Spark analytics. The File Store stores Apache Lucene indexes. For more information on Data Stores, see DAS Data Access Layer.
Analytics Spark
The main analytics engine of WSO2 DAS is based on Apache Spark. It performs batch analytics operations on the data stored in Event Stores using analytics scripts written in Spark SQL. For more information on Spark analytics, see Data Analysis.
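To give a feel for the kind of batch summarization such a script might perform, the following minimal sketch uses Python's built-in sqlite3 module as a stand-in for Spark SQL. It is not DAS code. The table names `raw_events` and `service_summary`, their columns, and the sample values are all hypothetical and chosen only for illustration.

```python
import sqlite3

def summarize(conn):
    """Read raw events, average them per service, and store the result."""
    conn.execute("DELETE FROM service_summary")
    conn.execute(
        "INSERT INTO service_summary (service, avg_ms) "
        "SELECT service, AVG(response_ms) FROM raw_events GROUP BY service"
    )
    conn.commit()
    return conn.execute(
        "SELECT service, avg_ms FROM service_summary ORDER BY service"
    ).fetchall()

# Hypothetical in-memory tables standing in for the Event Store
# (raw_events) and the Processed Event Store (service_summary).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_events (service TEXT, response_ms INTEGER)")
conn.execute("CREATE TABLE service_summary (service TEXT, avg_ms REAL)")
conn.executemany(
    "INSERT INTO raw_events VALUES (?, ?)",
    [("login", 120), ("search", 300), ("login", 80)],
)

summary = summarize(conn)
print(summary)
```

The point of the sketch is the shape of the job: a query reads the raw table in bulk, aggregates it, and writes a separate summary table that other components can read later.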
Data Indexing
Data Indexing is a periodically running process that updates the Lucene indexes for the fields marked as indexed in the Event Store configuration of an event stream.
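The idea of a periodic indexing pass can be sketched with a toy inverted index in Python. This is only an illustration under simplifying assumptions: it does not use Lucene, the class `FieldIndex` and its methods are hypothetical names, and the scheduler that would call `run_once` at intervals is left out.

```python
from collections import defaultdict

class FieldIndex:
    """Toy inverted index: (field, value) -> set of record positions."""

    def __init__(self, indexed_fields):
        self.indexed_fields = list(indexed_fields)
        self.postings = defaultdict(set)
        self.last_indexed = 0  # position of the next record to index

    def run_once(self, records):
        """Index only the records stored since the previous run."""
        new_records = records[self.last_indexed:]
        for offset, record in enumerate(new_records):
            record_id = self.last_indexed + offset
            for field in self.indexed_fields:
                if field in record:
                    self.postings[(field, record[field])].add(record_id)
        self.last_indexed = len(records)
        return len(new_records)

    def search(self, field, value):
        """Return the positions of records whose field equals value."""
        return sorted(self.postings.get((field, value), set()))

# Hypothetical stored events; only "status" is configured as indexed.
store = [{"host": "a", "status": 200}, {"host": "b", "status": 500}]
index = FieldIndex(indexed_fields=["status"])
index.run_once(store)                       # first pass indexes two records
store.append({"host": "c", "status": 500})
index.run_once(store)                       # next pass picks up only the new one
print(index.search("status", 500))
```

Fields that are not configured as indexed, such as `host` here, are stored but cannot be searched through the index.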
Siddhi Event Processors
WSO2 DAS uses a realtime event processing engine which is based on Siddhi. For more information on realtime analytics using Siddhi, see Working with Execution Plans.
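Realtime processing differs from batch analytics in that each event is evaluated as it arrives. Siddhi queries are written in Siddhi's own query language, so the following Python generator is only a rough analogy of one common pattern, a sliding window of fixed length. The function name `length_window_average` and the sample readings are assumptions made for this sketch.

```python
from collections import deque

def length_window_average(events, window_size):
    """Yield the average of the last `window_size` values after each event."""
    window = deque(maxlen=window_size)  # oldest value drops out automatically
    for value in events:
        window.append(value)
        yield sum(window) / len(window)

# Hypothetical stream of sensor readings.
readings = [10, 20, 30, 40]
averages = list(length_window_average(readings, window_size=2))
print(averages)
```

Each incoming value produces an output immediately, without waiting for the full data set to be stored first.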
Event Publishers
Output data from either Spark scripts or the Siddhi CEP engine is published from the DAS using Event Publishers. Event Publishers support various transport protocols. For more information on Event Publishers, see Publishing Events.
Analytics Dashboard
The Analytics Dashboard is used for data visualization in WSO2 DAS. It consists of several dashboards, each with a set of gadgets. You can use data from either the Data Store or a realtime event stream as the data source for each gadget.
Event Sinks
Event Sinks are components that reside outside the DAS. Event Publishers send various event notifications to Event Sinks.
WSO2 DAS event flow
The event flow of WSO2 DAS is as follows.
- Data is sent to the DAS server through Event Receivers and the Analytics REST API.
- Received data is stored through the data layer in the underlying Data Store (such as Cassandra, RDBMS, or HBase).
- A background data indexing process fetches the data from the Data Store and performs the indexing operations.
- The analyzer engine, which is powered by Apache Spark, or the realtime Siddhi-based Event Processors analyze this data according to the defined analytic queries. This usually follows a pattern of retrieving data from the Data Store, performing a data operation such as an addition, and storing the result back in the Data Store.
- The Analytics Dashboard queries the Data Store for the analyzed data and displays it graphically.
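The flow above can be sketched end to end with a small in-memory model. This is a simplified illustration, not how DAS is implemented: the `DataStore` class, the function names, and the `events` and `totals` table names are hypothetical, and the background indexing step is omitted to keep the example short.

```python
from collections import defaultdict

class DataStore:
    """In-memory stand-in for the Data Store: table name -> list of records."""

    def __init__(self):
        self.tables = defaultdict(list)

    def put(self, table, record):
        self.tables[table].append(record)

    def replace(self, table, records):
        self.tables[table] = list(records)

    def get(self, table):
        return list(self.tables[table])

def receive(store, event):
    # An incoming event is persisted through the data layer.
    store.put("events", event)

def analyze(store):
    # Retrieve stored events, add up amounts per source, store the result back.
    totals = defaultdict(int)
    for event in store.get("events"):
        totals[event["source"]] += event["amount"]
    store.replace(
        "totals",
        [{"source": s, "total": t} for s, t in sorted(totals.items())],
    )

def dashboard_rows(store):
    # The dashboard queries the analyzed data for display.
    return [(row["source"], row["total"]) for row in store.get("totals")]

store = DataStore()
for event in [
    {"source": "web", "amount": 5},
    {"source": "mobile", "amount": 3},
    {"source": "web", "amount": 7},
]:
    receive(store, event)

analyze(store)
rows = dashboard_rows(store)
print(rows)
```

Note that the analysis step never hands its result to the dashboard directly. Both sides communicate only through the store, which mirrors the retrieve, operate, and store-back pattern described in the list.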