The following deployment patterns are supported for WSO2 ML.

...

By default, WSO2 ML runs with an in-built Apache Spark instance. However, when working with big data, you can handle large data sets in a distributed environment through WSO2 ML: data pre-processing and model building can run on an Apache Spark cluster so that the workload is shared among the nodes of the cluster. Using a Spark cluster improves performance and reduces the time taken to build and train a machine learning model on a large data set.

Follow the steps below to run the ML jobs by connecting WSO2 ML to an external Apache Spark cluster.

...

  1. Set up a DAS cluster using Carbon clustering, and configure it to have at least one worker node. For more information on setting up a DAS cluster, see Clustering Data Analytics Server.

  2. Install the following ML features in each DAS node. For more information on installing features, see Installing and Managing Features.

    • Machine Learner Commons

    • Machine Learner Core

    • Machine Learner Database Service

  3. Stop all DAS nodes. For more information on stopping DAS nodes, see Running the Product in DAS documentation.
  4. Start the DAS cluster again without initializing the CarbonAnalytics and ML Spark contexts. Use the following options when starting the cluster.

    Option                           Purpose
    -DdisableAnalyticsSparkCtx=true  Disables the CarbonAnalytics Spark context.
    -DdisableMLSparkCtx=true         Disables the ML Spark context.
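    Both options can be passed in a single startup command. As a sketch, assuming the standard WSO2 Carbon startup script (wso2server.sh in the <DAS_HOME>/bin directory):

    ```shell
    # Start a DAS node with both the CarbonAnalytics and ML Spark contexts disabled
    sh wso2server.sh -DdisableAnalyticsSparkCtx=true -DdisableMLSparkCtx=true
    ```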
  5. To configure ML to use DAS as the Spark cluster, set the following property in the <ML_HOME>/repository/conf/etc/spark-config.xml file.

     <property name="spark.master">{SPARK_MASTER}</property>
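    Here {SPARK_MASTER} is a standard Spark master URL. For example, assuming the DAS Spark master runs on a host named das-master with Spark's default master port 7077 (both example values, not defaults of this product), the property would look like:

    ```xml
    <property name="spark.master">spark://das-master:7077</property>
    ```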


    Add the following JARs to the Spark executor extra classpath.

    • org.wso2.carbon.ml.commons_1.0.1.SNAPSHOT.jar
    • org.wso2.carbon.ml.core_1.0.1.SNAPSHOT.jar
    • org.wso2.carbon.ml.database_1.0.1.SNAPSHOT.jar
    • spark-mllib_2.10_1.4.1.wso2v1.jar
    • arpack_combined_0.1.0.wso2v1.jar
    • breeze_2.10_0.11.1.wso2v1.jar
    • core_1.1.2.wso2v1.jar
    • jblas_1.2.3.wso2v1.jar
    • spire_2.10_0.7.4.wso2v1.jar 


    Add these JARs to both the Spark driver and executor extra classpaths as Spark configuration properties in the <ML_HOME>/repository/conf/etc/spark-config.xml file, as shown below.

    <property name="spark.driver.extraClassPath">{ML_HOME}/repository/components/plugins/org.wso2.carbon.ml.commons_1.0.1.SNAPSHOT.jar:{ML_HOME}/repository/components/plugins/org.wso2.carbon.ml.core_1.0.1.SNAPSHOT.jar:{ML_HOME}/repository/components/plugins/org.wso2.carbon.ml.database_1.0.1.SNAPSHOT.jar:{ML_HOME}/repository/components/plugins/spark-mllib_2.10_1.4.1.wso2v1.jar:{ML_HOME}/repository/components/plugins/arpack_combined_0.1.0.wso2v1.jar:{ML_HOME}/repository/components/plugins/breeze_2.10_0.11.1.wso2v1.jar:{ML_HOME}/repository/components/plugins/core_1.1.2.wso2v1.jar:{ML_HOME}/repository/components/plugins/jblas_1.2.3.wso2v1.jar:{ML_HOME}/repository/components/plugins/spire_2.10_0.7.4.wso2v1.jar
    </property>
    <property name="spark.executor.extraClassPath">{ML_HOME}/repository/components/plugins/org.wso2.carbon.ml.commons_1.0.1.SNAPSHOT.jar:{ML_HOME}/repository/components/plugins/org.wso2.carbon.ml.core_1.0.1.SNAPSHOT.jar:{ML_HOME}/repository/components/plugins/org.wso2.carbon.ml.database_1.0.1.SNAPSHOT.jar:{ML_HOME}/repository/components/plugins/spark-mllib_2.10_1.4.1.wso2v1.jar:{ML_HOME}/repository/components/plugins/arpack_combined_0.1.0.wso2v1.jar:{ML_HOME}/repository/components/plugins/breeze_2.10_0.11.1.wso2v1.jar:{ML_HOME}/repository/components/plugins/core_1.1.2.wso2v1.jar:{ML_HOME}/repository/components/plugins/jblas_1.2.3.wso2v1.jar:{ML_HOME}/repository/components/plugins/spire_2.10_0.7.4.wso2v1.jar
    </property>
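    The long colon-separated value above can be error-prone to type by hand. As a minimal sketch, it can be generated from the JAR list instead; the ML_HOME default below is an example path, not a product default:

    ```shell
    # Sketch: build the colon-separated extraClassPath value from the JAR list above.
    # ML_HOME here is an example; substitute your actual installation directory.
    ML_HOME="${ML_HOME:-/opt/wso2ml-1.0.0}"
    PLUGINS="$ML_HOME/repository/components/plugins"

    CLASSPATH=""
    for jar in \
        org.wso2.carbon.ml.commons_1.0.1.SNAPSHOT.jar \
        org.wso2.carbon.ml.core_1.0.1.SNAPSHOT.jar \
        org.wso2.carbon.ml.database_1.0.1.SNAPSHOT.jar \
        spark-mllib_2.10_1.4.1.wso2v1.jar \
        arpack_combined_0.1.0.wso2v1.jar \
        breeze_2.10_0.11.1.wso2v1.jar \
        core_1.1.2.wso2v1.jar \
        jblas_1.2.3.wso2v1.jar \
        spire_2.10_0.7.4.wso2v1.jar
    do
        # Append each JAR path, separating entries with ':'
        CLASSPATH="${CLASSPATH:+$CLASSPATH:}$PLUGINS/$jar"
    done

    # Paste this value into both extraClassPath properties
    echo "$CLASSPATH"
    ```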
  6. For the following two properties in the <ML_HOME>/repository/conf/etc/spark-config.xml file, enter values that are less than or equal to the resources allocated to the Spark workers in the DAS cluster. This ensures that WSO2 ML does not request resources that the DAS Spark cluster cannot satisfy.
    • spark.executor.memory

      <property name="spark.executor.memory">{memory_in_m/g}</property>

       

    • spark.executor.cores

      <property name="spark.executor.cores">{number_of_cores}</property>
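       For instance, assuming each DAS Spark worker is allocated 4 GB of memory and 2 cores (example figures, not defaults), the following values would stay within those limits:

       ```xml
       <property name="spark.executor.memory">2g</property>
       <property name="spark.executor.cores">2</property>
       ```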

       

  7. Start the WSO2 ML server. For more information on starting the WSO2 ML server, see Running the Product.

...