Page Comparison

The following deployment patterns are supported for WSO2 ML.

...

Value	Description
local	Runs Spark locally with one worker thread, so nothing runs in parallel.
local[k]	Runs Spark locally with k worker threads (ideally, k is the number of cores in your machine).
local[*]	Runs Spark locally with as many worker threads as there are logical cores in your machine.
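
As a rough illustration of choosing k for the local[k] value, the sketch below prints a value based on the number of online logical cores. It assumes that getconf _NPROCESSORS_ONLN is available on your system (it is common on Linux and macOS but not guaranteed everywhere), and the function name local_master_value is hypothetical, not part of WSO2 ML or Spark.

```shell
#!/bin/sh
# Hypothetical helper: print a Spark master value of the form local[k],
# where k is the number of online logical cores reported by getconf.
# Falls back to 1 if getconf is unavailable or returns something non-numeric.
local_master_value() {
  cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || cores=1
  case $cores in
    ''|*[!0-9]*) cores=1 ;;
  esac
  printf 'local[%s]\n' "$cores"
}

local_master_value
```

Using local[*] has the same intent without computing the number yourself. An explicit k is mainly useful when you want to leave some cores free for other processes.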

With external Spark cluster

By default, WSO2 ML runs with an inbuilt Apache Spark instance. When working with big data, however, you can connect WSO2 ML to an Apache Spark cluster and process large data sets in a distributed environment. Data pre-processing and model building then run on the cluster, which shares the workload between its nodes. This improves performance and reduces the time taken to build and train a machine learning model for a large data set.

Follow the steps below to run the ML jobs by connecting WSO2 ML to an external Apache Spark cluster.

Info
To follow the instructions below, the Apache Spark cluster must run Apache Spark version 1.4.1 with Apache Hadoop version 2.6 or later. The Spark deployment pattern can be Standalone, YARN, or Mesos. WSO2 ML is unaware of the underlying configuration of the Spark cluster. It interacts only with the Spark master to which the jobs are submitted.

Press Ctrl+C to shut down the WSO2 ML server. For more information on shutting down the WSO2 ML server, see Running the Product.
Create a directory named <SPARK_HOME>/ml/ and copy the following jar files into it. These jar files can be found in the <ML_HOME>/repository/components/plugins directory.
- org.wso2.carbon.ml.core_1.0.12.jar
- org.wso2.carbon.ml.commons_1.0.12.jar
- org.wso2.carbon.ml.database_1.0.12.jar
- kryo_2.24.0.wso2v1.jar

Create a file named spark-env.sh in the <SPARK_HOME>/conf/ directory and add the following entries.

SPARK_MASTER_IP=127.0.0.1
SPARK_CLASSPATH={SPARK_HOME}/ml/org.wso2.carbon.ml.core_1.0.12.jar:{SPARK_HOME}/ml/org.wso2.carbon.ml.commons_1.0.12.jar:{SPARK_HOME}/ml/org.wso2.carbon.ml.database_1.0.12.jar:{SPARK_HOME}/ml/kryo_2.24.0.wso2v1.jar

Restart the external Spark cluster using the following commands:
{SPARK_HOME}$ ./sbin/stop-all.sh
{SPARK_HOME}$ ./sbin/start-all.sh
In the <ML_HOME>/repository/conf/etc/spark-config.xml file, enter the Spark master URL as the value of the spark.master property, as shown in the example below. For a standalone cluster, this URL typically has the form spark://{host}:7077.
Tip
You can find the Spark master URL in the Apache Spark Web UI.
<property name="spark.master">{SPARK_MASTER_URL}</property>
Restart the WSO2 ML server. For more information on restarting the WSO2 ML server, see Running the Product.
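
The step above that copies the jar files into <SPARK_HOME>/ml/ can be scripted. The following is a minimal sketch, not an official WSO2 tool: the function name stage_jars is hypothetical, and the commented example call assumes that ML_HOME and SPARK_HOME are set as shell variables pointing at your installations.

```shell
#!/bin/sh
# Hypothetical helper: copy the named jar files from a source directory into
# a target directory, creating the target first.
# Returns non-zero if the target cannot be created or any jar is missing.
stage_jars() {
  src_dir=$1
  dst_dir=$2
  shift 2
  mkdir -p "$dst_dir" || return 1
  for jar in "$@"; do
    if [ ! -f "$src_dir/$jar" ]; then
      echo "missing jar: $src_dir/$jar" >&2
      return 1
    fi
    cp "$src_dir/$jar" "$dst_dir/" || return 1
  done
}

# Example call (assumes ML_HOME and SPARK_HOME are set in your shell):
# stage_jars "$ML_HOME/repository/components/plugins" "$SPARK_HOME/ml" \
#   org.wso2.carbon.ml.core_1.0.12.jar \
#   org.wso2.carbon.ml.commons_1.0.12.jar \
#   org.wso2.carbon.ml.database_1.0.12.jar \
#   kryo_2.24.0.wso2v1.jar
```

Failing on the first missing jar, rather than skipping it, makes a version mismatch between the listed file names and your plugins directory visible before the Spark cluster is restarted.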

...

Set up a DAS cluster using Carbon clustering. Configure it to have at least one worker node. For more information on setting up a DAS cluster, see Clustering Data Analytics Server.
Install the following ML features in each DAS node from the P2 repository of your ML version. For more information on installing features, see Installing and Managing Features.
- Machine Learner Commons
- Machine Learner Core
- Machine Learner Database Service
Stop all DAS nodes. For more information on stopping DAS nodes, see Running the Product in the DAS documentation.
Start the DAS cluster again without initializing the Spark contexts of the CarbonAnalytics and ML features. Use the following options when starting the cluster.
Option	Purpose
-DdisableAnalyticsSparkCtx=true	Disables the CarbonAnalytics Spark context.
-DdisableMLSparkCtx=true	Disables the ML Spark context.

To configure ML to use DAS as the Spark cluster, set the following property in the <ML_HOME>/repository/conf/etc/spark-config.xml file.

<property name="spark.master">{SPARK_MASTER}</property>

Add the following jar files to the Spark executor extra class path.

- org.wso2.carbon.ml.commons_1.0.12.jar
- org.wso2.carbon.ml.core_1.0.12.jar
- org.wso2.carbon.ml.database_1.0.12.jar
- spark-mllib_2.10_1.4.1.wso2v1.jar
- arpack_combined_0.1.0.wso2v1.jar
- breeze_2.10_0.11.1.wso2v1.jar
- core_1.1.2.wso2v1.jar
- jblas_1.2.3.wso2v1.jar
- spire_2.10_0.7.4.wso2v1.jar

Add the same jar files to the Spark driver extra class path. Set both class paths as Spark configuration properties in the <ML_HOME>/repository/conf/etc/spark-config.xml file as shown below.

<property name="spark.driver.extraClassPath">{ML_HOME}/repository/components/plugins/org.wso2.carbon.ml.commons_1.0.12.jar:{ML_HOME}/repository/components/plugins/org.wso2.carbon.ml.core_1.0.12.jar:{ML_HOME}/repository/components/plugins/org.wso2.carbon.ml.database_1.0.12.jar:{ML_HOME}/repository/components/plugins/spark-mllib_2.10_1.4.1.wso2v1.jar:{ML_HOME}/repository/components/plugins/arpack_combined_0.1.0.wso2v1.jar:{ML_HOME}/repository/components/plugins/breeze_2.10_0.11.1.wso2v1.jar:{ML_HOME}/repository/components/plugins/core_1.1.2.wso2v1.jar:{ML_HOME}/repository/components/plugins/jblas_1.2.3.wso2v1.jar:{ML_HOME}/repository/components/plugins/spire_2.10_0.7.4.wso2v1.jar
</property>

<property name="spark.executor.extraClassPath">{ML_HOME}/repository/components/plugins/org.wso2.carbon.ml.commons_1.0.12.jar:{ML_HOME}/repository/components/plugins/org.wso2.carbon.ml.core_1.0.12.jar:{ML_HOME}/repository/components/plugins/org.wso2.carbon.ml.database_1.0.12.jar:{ML_HOME}/repository/components/plugins/spark-mllib_2.10_1.4.1.wso2v1.jar:{ML_HOME}/repository/components/plugins/arpack_combined_0.1.0.wso2v1.jar:{ML_HOME}/repository/components/plugins/breeze_2.10_0.11.1.wso2v1.jar:{ML_HOME}/repository/components/plugins/core_1.1.2.wso2v1.jar:{ML_HOME}/repository/components/plugins/jblas_1.2.3.wso2v1.jar:{ML_HOME}/repository/components/plugins/spire_2.10_0.7.4.wso2v1.jar
</property>

For the following two properties in the <ML_HOME>/repository/conf/etc/spark-config.xml file, enter values that are less than or equal to the resources allocated to the Spark workers in the DAS cluster. This ensures that ML does not request resources that the DAS Spark cluster cannot provide.
- spark.executor.memory:
  <property name="spark.executor.memory">{memory_in_m/g}</property>
- spark.executor.cores:
  <property name="spark.executor.cores">{number_of_cores}</property>
Start the ML server. For more information on starting the WSO2 ML server, see Running the Product.
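
The class path values used for spark.driver.extraClassPath and spark.executor.extraClassPath are long and easy to mistype. As a minimal sketch, the colon-separated value can be generated from the list of jar names. The function name join_classpath is hypothetical, and the commented example call assumes that ML_HOME is set as a shell variable. Paste the printed value into spark-config.xml by hand.

```shell
#!/bin/sh
# Hypothetical helper: print a colon-separated class path in which each jar
# name is prefixed with the directory given as the first argument.
join_classpath() {
  plugins_dir=$1
  shift
  classpath=""
  for jar in "$@"; do
    # Add a ':' separator only when the class path already has an entry.
    classpath="${classpath:+$classpath:}$plugins_dir/$jar"
  done
  printf '%s\n' "$classpath"
}

# Example call (assumes ML_HOME is set in your shell):
# join_classpath "$ML_HOME/repository/components/plugins" \
#   org.wso2.carbon.ml.commons_1.0.12.jar \
#   org.wso2.carbon.ml.core_1.0.12.jar \
#   org.wso2.carbon.ml.database_1.0.12.jar
```

Note that the output contains the expanded path of ML_HOME, not the {ML_HOME} placeholder used in the examples above.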

...

Versions Compared

Old Version 10

New Version Current
