The embedded Spark server in WSO2 Data Analytics Server can be used in several deployment modes, depending on your requirements.
Mode | Description | When to use
---|---|---
Local (default) | All Spark-related work is done within a single node/JVM. | Ideally suited for evaluation purposes and for testing Spark queries in DAS.
Cluster (recommended) | DAS creates its own Spark cluster in the Carbon environment (using Hazelcast). This mode can be used with several high availability (HA) clustering patterns to handle failover scenarios. Additionally, in Cluster mode, DAS can be set up without a Spark application, which allows other components to use the DAS cluster as an external Spark cluster. For example, WSO2 Machine Learner can use the Spark cluster embedded in WSO2 DAS. | Clustered production setups.
Client | DAS acts only as a Spark client pointing to a separate Spark master. | Scenarios where you want to submit DAS analytics jobs to an external Spark cluster.
The following topics provide configuration instructions for each deployment mode, as well as instructions for disabling the Spark application.
**Note:** It is recommended to secure your Spark environment by restricting access to the Spark UIs. You can do this by adding ports 4040 and 8081 to the block list when configuring your firewall. This prevents external users from accessing the Spark UIs and viewing information about your production environment.
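For example, on a Linux host these ports could be blocked with iptables. This is a minimal sketch; the exact firewall tooling and rule scoping (interfaces, source ranges) depend on your environment.

```
# Block external access to the Spark application UI (4040) and worker UI (8081).
# Run as root; scope these rules to external interfaces in a real deployment.
iptables -A INPUT -p tcp --dport 4040 -j DROP
iptables -A INPUT -p tcp --dport 8081 -j DROP
```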
Local mode
This is the default mode for a typical DAS instance, and it enables users to evaluate Spark analytics in the Data Analytics Server. In this mode, no separate master or worker is spawned; everything runs in a single JVM. As a result, features such as the Spark Master UI and the Spark Worker UI are not active.
...
- Ensure that Carbon clustering is disabled. To do this, open the `<DAS_HOME>/repository/conf/axis2/axis2.xml` file and set `enable="false"` as shown below.
<clustering class="org.wso2.carbon.core.clustering.hazelcast.HazelcastClusteringAgent" enable="false">
- Set the Spark master to local. To do this, open the `<DAS_HOME>/repository/conf/analytics/spark/spark-defaults.conf` file and add the following entry (unless it already exists).
carbon.spark.master local[<number of cores>]
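For instance, a concrete entry that runs Spark locally with four worker threads would look like the following. The core count is an illustrative choice; tune it to your hardware, or use `local[*]` to use all available cores on the node.

```
# spark-defaults.conf: run the embedded Spark in local mode with 4 threads
carbon.spark.master local[4]
```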
Cluster mode
Cluster mode is the recommended deployment pattern for DAS in a production environment. Here, DAS creates its own Spark cluster using the Carbon environment and Hazelcast. This clustering approach uses the Spark Standalone mode along with a custom implementation of Spark's Standalone Recovery Mode API.
...
- Ensure that the `carbon.spark.master local` configuration remains unchanged. This acts as a flag to use Carbon clustering.
- Each node can start as both a master and a worker. Therefore, in a two-node cluster there are two masters and two workers; one master node is active and the other is passive. You must specify the total number of masters in the cluster based on your requirements, as in the sketch after this list.
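As a minimal sketch, the relevant `spark-defaults.conf` entries for a two-node cluster would look like the following. The `carbon.spark.master.count` property name follows the DAS clustering documentation; verify it against your DAS version.

```
# spark-defaults.conf on each node in the cluster.
# 'local' here acts as a flag telling DAS to build its Spark cluster
# from the Carbon/Hazelcast cluster, not to run in local mode.
carbon.spark.master local
# Total number of Spark masters in the cluster (one active, the rest passive)
carbon.spark.master.count 2
```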
Client mode
Client mode is where DAS submits all Spark-related jobs to an external Spark cluster. Since this mode uses an external Spark cluster, you must ensure that all the .jar files required by the Carbon Spark application are included in the SPARK_CLASSPATH of the Spark master and workers, for example as sketched below.
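One way to do this is via `conf/spark-env.sh` on each master and worker node. This is a sketch only: the jar location `/opt/das-libs` is an illustrative placeholder, and note that `SPARK_CLASSPATH` is deprecated in later Spark releases.

```
# conf/spark-env.sh on each Spark master and worker node:
# append the jars required by the Carbon Spark application.
export SPARK_CLASSPATH=$SPARK_CLASSPATH:/opt/das-libs/*
```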
...
You can include a single master, or a list of masters to ensure high availability, as in the example below.
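For instance, `carbon.spark.master` can point at an external Standalone cluster using Spark's standard master URL syntax. The host names and the default Standalone port 7077 below are illustrative placeholders.

```
# spark-defaults.conf: point DAS at an external Spark standalone cluster
carbon.spark.master spark://master1.example.com:7077
# For a highly available (e.g., ZooKeeper-backed) cluster, list all masters:
# carbon.spark.master spark://master1.example.com:7077,master2.example.com:7077
```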
Disabling a Spark application
In addition to the above modes, you can configure DAS to start up without a Spark application. As of the current Spark version (1.4.0), there can be only one active SparkContext inside a single JVM, so it is not possible to create multiple Spark applications within one JVM. Furthermore, by default, applications submitted to a Standalone-mode cluster run in FIFO (first-in-first-out) order, and each application attempts to use all available nodes. The Carbon Spark application used for DAS analytics runs throughout the lifetime of the DAS cluster. Therefore, even if you create a separate Spark application in a different JVM, it can only use the cluster's resources after the Carbon Spark application is terminated.
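As background on the FIFO behaviour described above: on a standard Spark Standalone cluster, an application claims all available cores unless it is capped. Spark's standard `spark.cores.max` property (plain Spark configuration, not DAS-specific) limits how many cores a single application may take.

```
# spark-defaults.conf: cap the cores one application may claim on a
# Standalone cluster, so other applications are not starved under FIFO
spark.cores.max 4
```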
...