Deployment and Clustering
The topics in this section provide instructions on how to deploy and configure clustering for WSO2 Data Analytics Server (DAS). The required configurations depend on the deployment pattern you choose.
Deployment patterns
The following are the main deployment patterns available. Note that not all of these patterns are recommended for production environments; the reasoning behind each recommendation is provided with the pattern. See Spark Deployment Patterns for deployment patterns related to the embedded Apache Spark that comes with the WSO2 Data Analytics Server.
Single node deployment
WSO2 Data Analytics Server can be deployed as a single instance on a server. This is not recommended in a typical production environment because it does not offer high availability.
High availability deployment
WSO2 Data Analytics Server can be deployed with high availability by running a minimum of two instances in the production environment. In this pattern, both nodes are active and hot: if one node fails, the other takes over immediately, and both nodes are able to serve requests. The embedded Apache Spark in each WSO2 DAS instance works as though it is clustered with the embedded Apache Spark in the other WSO2 DAS instances.
Note: You can have more than two nodes if you want greater availability in case of failure. However, too many nodes in the cluster can degrade its performance, because locating cluster members takes longer as the membership grows.
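Carbon-based products such as WSO2 DAS typically enable clustering through the clustering element in `<DAS_HOME>/repository/conf/axis2/axis2.xml`. The sketch below shows a minimal well-known-address (WKA) membership configuration for a two-node high availability setup; the domain name, host IPs, and ports are placeholder values chosen for illustration, and the exact set of supported parameters may vary between DAS versions.

```xml
<!-- <DAS_HOME>/repository/conf/axis2/axis2.xml (sketch) -->
<clustering class="org.wso2.carbon.core.clustering.hazelcast.HazelcastClusteringAgent"
            enable="true">
    <!-- Use the well-known-address membership scheme -->
    <parameter name="membershipScheme">wka</parameter>
    <!-- All nodes of the same cluster must share this domain name (placeholder) -->
    <parameter name="domain">wso2.das.domain</parameter>
    <!-- This node's own host and port (placeholders; repeat on each node with its values) -->
    <parameter name="localMemberHost">10.0.0.1</parameter>
    <parameter name="localMemberPort">4000</parameter>
    <!-- Well-known members that new nodes contact to join the cluster -->
    <members>
        <member>
            <hostName>10.0.0.1</hostName>
            <port>4000</port>
        </member>
        <member>
            <hostName>10.0.0.2</hostName>
            <port>4000</port>
        </member>
    </members>
</clustering>
```

The same configuration, with `localMemberHost` adjusted per node, is applied to every node in the cluster.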
Fully distributed deployment
WSO2 Data Analytics Server can be deployed by distributing the tasks to various instances and clustering these instances for high availability.
The following table describes the distributed components.
| Distributed component | Minimum number of nodes | Description |
|---|---|---|
| Receiver nodes | 2 | For data analytics to happen, the relevant data must first be collected. DAS provides data agents that capture information on the messages flowing through WSO2 ESB, WSO2 Application Server, and other products that use the DAS data publisher. The data receivers obtain this information and store it in a datastore, where it is optimized for analysis. The receiver nodes run these data receivers. |
| Indexer nodes | 2 | A background indexing process fetches data from the datastore and performs the indexing operations. In a fully distributed, highly available system, these operations are handled by the indexer nodes. |
| Analyzer (Spark) nodes | 2 | The analyzer engine, powered by Apache Spark, analyzes the collected data according to defined analytic queries. This usually follows a pattern of retrieving data from the datastore, performing a data operation such as an addition, and storing the results back in the datastore. The analyzer operations are performed by the analyzer nodes. |
| Dashboard nodes | 1 | The dashboard queries the datastore for the analyzed data and displays it graphically. This function can be distributed to the dashboard nodes. |
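When the embedded Spark analyzers run as a cluster within DAS itself, the Spark master configuration lives in `<DAS_HOME>/repository/conf/analytics/spark/spark-defaults.conf`. The fragment below is a sketch based on the DAS 3.x configuration keys; treat the exact key names and values as assumptions to verify against the documentation for your DAS version.

```
# spark-defaults.conf (sketch, DAS 3.x-style keys)
# Run the Spark master inside the Carbon (DAS) cluster rather than standalone
carbon.spark.master carbon
# Number of Spark masters to keep in the cluster for high availability
carbon.spark.master.count 2
```

With two masters configured, one acts as the active Spark master and the other stands by to take over if the active master fails.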
Analyzer node with external Spark or YARN cluster
WSO2 Data Analytics Server can be deployed as an Analyzer node with an external Spark or YARN cluster. This is suitable for scenarios where you want to submit WSO2 DAS analytics jobs to an external Spark cluster.
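To point the DAS analyzer at an external cluster instead of the embedded one, the Spark master URL is set in `<DAS_HOME>/repository/conf/analytics/spark/spark-defaults.conf`. The fragment below is a sketch: the host name and port are placeholders, and the key name follows the DAS 3.x convention, which you should confirm for your version.

```
# spark-defaults.conf (sketch) -- submit analytics jobs to an external Spark master
carbon.spark.master spark://<spark-master-host>:7077
```

For a YARN-managed cluster, the master URL takes a YARN-specific form instead (for example, `yarn-client` in the Spark 1.x versions that DAS 3.x embeds).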
High availability analyzer cluster with external Spark or YARN cluster
WSO2 Data Analytics Server can be deployed as an Analyzer cluster for high availability with an external Spark or YARN cluster. This is suitable for scenarios where you want to submit WSO2 DAS analytics jobs to an external Spark cluster while keeping the analyzer layer itself highly available.