Warning |
---|
This section is still a work in progress. |
The embedded Spark server in WSO2 ESB EI Analytics Server can be used in several deployment modes, depending on your requirement.
Mode | Description | When to use |
---|---|---|
Local (default) | In this mode, all of the Spark-related work is done within a single node/JVM. | Ideally suited for evaluation purposes and for testing Spark queries in ESB EI Analytics. |
Cluster (recommended) | ESB EI Analytics creates its own Spark cluster in the Carbon environment (using Hazelcast). This mode can be used with several high availability (HA) clustering patterns to handle failover scenarios. Additionally, in Cluster mode, ESB EI Analytics can be set up without a Spark application, which allows other components to use the ESB EI Analytics cluster as an external Spark cluster. | For clustered production setups. |
Client | In this mode, ESB EI Analytics acts only as a Spark client pointing to a separate Spark master. | This is suited to scenarios where you want to submit ESB EI analytics jobs to an external Spark cluster. |
...
This is the default mode for a typical ESB EI Analytics instance, and it enables users to evaluate Spark analytics in the ESB EI Analytics Server. In this mode, no separate master or worker is spawned; everything runs in a single JVM. Therefore, certain features, such as the Spark master UI and the Spark worker UI, are not available.
...
- Ensure that Carbon clustering is disabled. To do this, open the
<EI-ANALYTICS_HOME>/repository/conf/axis2/axis2.xml
file and set enable="false"
as shown below.
<clustering class="org.wso2.carbon.core.clustering.hazelcast.HazelcastClusteringAgent" enable="false">
- Set the Spark master to local. To do this, open the
<EI-ANALYTICS_HOME>/repository/conf/analytics/spark/spark-defaults.conf
file and add the following entry (unless it already exists).
carbon.spark.master local[<number of cores>]
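As a sketch only (the file path and core count below are illustrative stand-ins, not values from the product documentation), the local-mode entry can be appended and verified from the shell:

```shell
# Illustrative: CONF would normally point at
# <EI-ANALYTICS_HOME>/repository/conf/analytics/spark/spark-defaults.conf;
# a local file is used here as a stand-in.
CONF=./spark-defaults.conf

# Run all Spark work inside this single JVM, using 4 cores (adjust as needed).
echo 'carbon.spark.master local[4]' >> "$CONF"

# Confirm the entry is present.
grep '^carbon.spark.master' "$CONF"
```

Remember that the entry should appear only once in the file; if it already exists, edit it in place instead of appending a duplicate.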
...
Cluster mode is the recommended deployment pattern for ESB EI Analytics in the production environment. Here, ESB EI Analytics creates its own Spark cluster using the Carbon environment and Hazelcast. In this clustering approach, the Spark Standalone mode is used along with a custom implementation of the Standalone Recovery Mode API in Spark.
...
- Enable Carbon clustering. To do this, open the
<EI-ANALYTICS_HOME>/repository/conf/axis2/axis2.xml
file and set enable="true" for clustering as shown below.
<clustering class="org.wso2.carbon.core.clustering.hazelcast.HazelcastClusteringAgent" enable="true">
- In the
<EI-ANALYTICS_HOME>/repository/conf/analytics/spark/spark-defaults.conf
file, set the number of masters in the cluster by adding the following entry.
carbon.spark.master.count <number of masters in the cluster>
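As a sketch (the file path is an illustrative stand-in for the real spark-defaults.conf location), a two-master high-availability setup would add the entry like this:

```shell
# Illustrative stand-in for
# <EI-ANALYTICS_HOME>/repository/conf/analytics/spark/spark-defaults.conf.
CONF=./spark-defaults-cluster.conf

# Two of the Carbon cluster members act as Spark masters, so that one can
# take over if the active master fails.
echo 'carbon.spark.master.count 2' >> "$CONF"

grep '^carbon.spark.master.count' "$CONF"
```

The same entry must be set on every node in the cluster so that all members agree on the number of masters.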
...
Client mode
Client mode is where ESB EI Analytics submits all Spark-related jobs to an external Spark cluster. Since an external Spark cluster is used, you must ensure that all the .jar
files required by the Carbon Spark application are included in the SPARK_CLASSPATH
of the Spark master and workers.
Do the following to configure client mode.
- In the
<EI-ANALYTICS_HOME>/repository/conf/analytics/spark/spark-defaults.conf
file, add the following entry.
carbon.spark.master spark://<host1:port1, host2:port2, ...>
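The two client-mode steps can be sketched together as follows. Everything here is illustrative: the host names, port numbers, and jar directory are placeholders, not values from the product documentation, and local files stand in for the real configuration files.

```shell
# Stand-ins for the real configuration files.
CONF=./spark-defaults-client.conf   # on the EI Analytics node
ENV=./spark-env.sh                  # on each external Spark master/worker

# On the EI Analytics node: point at the external Spark master(s).
# Host names and ports are placeholders.
echo 'carbon.spark.master spark://spark-master1:7077,spark-master2:7077' >> "$CONF"

# On every external Spark master and worker: expose the jars needed by the
# Carbon Spark application. The directory is an assumed example path.
echo 'export SPARK_CLASSPATH=$SPARK_CLASSPATH:/opt/carbon-spark-libs/*' >> "$ENV"

grep '^carbon.spark.master ' "$CONF"
```

The classpath change must be made on every node of the external cluster, since any worker may be asked to run a Carbon Spark task.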
...
In addition to the above modes, you can also configure ESB EI Analytics to start up without a Spark application. As of the current Spark version (1.4.0), there can be only one active SparkContext inside a single JVM, so multiple Spark applications cannot be created in the same JVM. Furthermore, by default, applications submitted to a standalone-mode cluster run in FIFO (first-in-first-out) order, and each application attempts to use all available nodes. The Carbon Spark application used for ESB EI Analytics runs throughout the lifetime of the ESB EI Analytics cluster. Therefore, even if you create a separate Spark application in a different JVM, it can use the cluster's resources only after the Carbon Spark application is terminated.
To allow other clients to use the ESB EI Analytics Spark cluster, you can disable this Carbon Spark application by setting a system variable at server startup. See Disabling DAS components in the DAS documentation for more information.
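As a sketch of what that startup looks like: the property name below is the one documented for DAS 3.x in the Disabling DAS components page, and treating it as applicable here is an assumption; the command is built and echoed rather than executed.

```shell
# Assumed property name (-DdisableAnalyticsSparkCtx) from the DAS 3.x docs;
# verify it against the Disabling DAS components page for your version.
START_CMD='sh bin/wso2server.sh -DdisableAnalyticsSparkCtx=true'

# Echo the command instead of actually starting a server here.
echo "$START_CMD"
```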