WSO2 ML-Specific Configurations

WSO2 Machine Learner (WSO2 ML) maintains a set of product-specific configurations in the <ML_HOME>/repository/conf/machine-learner.xml file. The sections below describe each of these configurations in detail.



Database configurations

The following configuration specifies the datasource that connects WSO2 ML to the database in which product-specific data relating to ML is stored.

<DataSourceName>jdbc/WSO2ML_DB</DataSourceName>

The following table describes the parameters of the database related configuration.

Parameter Name

Description

Type

Default Value

DataSourceName

The datasource that connects WSO2 ML to the database in which product-specific data relating to ML is stored. The default value points to the inbuilt H2 database. The configuration of this datasource can be found in the <ML_HOME>/repository/conf/datasources/ml-datasources.xml file.

Currently, WSO2 ML supports H2 and MySQL as its database. To change the default database, do the following:

  • Use the scripts in the <ML_HOME>/dbscripts/ directory to create the tables of the database.

  • Change the properties of the above datasource configuration accordingly.

For similar instructions on changing the default database, see Setting up H2 and Setting up MySQL.



String

jdbc/WSO2ML_DB
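
As an illustration of the second step above, the following is a minimal sketch of how the datasource in the ml-datasources.xml file might look when it points to MySQL. It assumes the standard WSO2 Carbon RDBMS datasource layout. The database name (ml_db), host, port, user name, and password are placeholders, and the driver class shown is the one used by MySQL Connector/J 5.x. Adjust all of these to match your environment.

```xml
<datasource>
    <name>WSO2ML_DB</name>
    <description>The datasource used for the Machine Learner database</description>
    <jndiConfig>
        <!-- Must match the value of DataSourceName in machine-learner.xml -->
        <name>jdbc/WSO2ML_DB</name>
    </jndiConfig>
    <definition type="RDBMS">
        <configuration>
            <!-- Placeholder connection details: replace with your own -->
            <url>jdbc:mysql://localhost:3306/ml_db</url>
            <username>ml_user</username>
            <password>ml_password</password>
            <driverClassName>com.mysql.jdbc.Driver</driverClassName>
            <maxActive>50</maxActive>
            <maxWait>60000</maxWait>
            <testOnBorrow>true</testOnBorrow>
            <validationQuery>SELECT 1</validationQuery>
            <validationInterval>30000</validationInterval>
        </configuration>
    </definition>
</datasource>
```

The JNDI name is the link between the two files: as long as it stays jdbc/WSO2ML_DB, the DataSourceName element in machine-learner.xml does not need to change.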

Summary statistics settings 

When a dataset is created, WSO2 ML calculates summary statistics for the dataset. WSO2 ML uses the following configurations to calculate these summary statistics.

<SummaryStatisticsSettings>
    <HistogramBins>20</HistogramBins>
    <CategoricalThreshold>20</CategoricalThreshold>
    <SampleSize>10000</SampleSize>
</SummaryStatisticsSettings>

The following table describes the parameters of the summary statistics configuration.

Parameter Name

Description

Type

Default Value

HistogramBins

The number of intervals generated for continuous variables when plotting histograms.

Integer

20

CategoricalThreshold

The cut-off value for the number of unique values in a numerical variable, which is used to decide whether that variable is categorical or continuous. Any numerical variable in the dataset that has a number of unique values less than or equal to this value is treated as a categorical variable. Otherwise, it is treated as a continuous variable.

Integer

20

SampleSize

Size of the sample that is used for the summary statistics calculation.

Integer

10000
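
To make the effect of these parameters concrete, the following sketch shows one possible way to override the defaults. The values are purely illustrative assumptions, not recommendations: they suit a hypothetical deployment that wants finer-grained histograms, a stricter rule for treating numerical variables as categorical, and a larger sample.

```xml
<SummaryStatisticsSettings>
    <!-- Continuous variables are plotted with 40 intervals instead of 20 -->
    <HistogramBins>40</HistogramBins>
    <!-- A numerical variable with 10 or fewer unique values is treated as categorical -->
    <CategoricalThreshold>10</CategoricalThreshold>
    <!-- Summary statistics are calculated on a sample of 50000 -->
    <SampleSize>50000</SampleSize>
</SummaryStatisticsSettings>
```

With these settings, a numerical column that holds only the values 1 to 5 would be treated as categorical, whereas a column with 11 or more distinct values would be treated as continuous.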

Input/output handling configurations 

The following set of properties defines the input/output handling configurations of WSO2 ML.

<Properties>
    <Property name="ml.thread.pool.size" value="100" />
    <Property name="file.in" value="org.wso2.carbon.ml.core.impl.FileInputAdapter" />
    <Property name="file.out" value="org.wso2.carbon.ml.core.impl.FileOutputAdapter" />
    <Property name="hdfs.in" value="org.wso2.carbon.ml.core.impl.HdfsInputAdapter" />
    <Property name="hdfs.out" value="org.wso2.carbon.ml.core.impl.HdfsOutputAdapter" />
    <Property name="das.in" value="org.wso2.carbon.ml.core.impl.BAMInputAdapter" />
    <Property name="registry.in" value="org.wso2.carbon.ml.core.impl.RegistryInputAdapter" />
    <Property name="registry.out" value="org.wso2.carbon.ml.core.impl.RegistryOutputAdapter" />
</Properties>

The following table describes the properties of the input/output handling configuration.

Property Name

Description

Type

Default Value

ml.thread.pool.size

The size of the thread pool used by WSO2 ML.

Integer

100

file.in

The adapter that reads files from the local file system.

String

org.wso2.carbon.ml.core.impl.FileInputAdapter

file.out

The adapter that writes files to the local file system.

String

org.wso2.carbon.ml.core.impl.FileOutputAdapter

hdfs.in

The adapter that reads files from a Hadoop File System (HDFS).

String

org.wso2.carbon.ml.core.impl.HdfsInputAdapter

hdfs.out

The adapter that writes files to a Hadoop File System (HDFS).

String

org.wso2.carbon.ml.core.impl.HdfsOutputAdapter

das.in

The adapter that reads data from WSO2 Data Analytics Server (DAS).

String

org.wso2.carbon.ml.core.impl.BAMInputAdapter

registry.in

The adapter that reads data from the WSO2 registry.

String

org.wso2.carbon.ml.core.impl.RegistryInputAdapter

registry.out

The adapter that writes data to the WSO2 registry.

String

org.wso2.carbon.ml.core.impl.RegistryOutputAdapter 

If you want to add a custom input/output adapter, add the following properties to the above input/output handling configurations:

<Property name="custom.in" value="org.wso2.carbon.ml.custom.adapter.input.CustomMLInputAdapter"/>

<Property name="custom.out" value="org.wso2.carbon.ml.custom.adapter.output.CustomMLOutputAdapter"/>

Storage configurations

This section contains configurations relating to the storage of datasets and models using the storage type file or hdfs. Configurations relating to storage are defined as shown in the example below. This configuration is optional and commented out by default. You can uncomment it and edit the default configurations as required.

<HdfsURL>hdfs://localhost:9000</HdfsURL>
<!-- DatasetStorage>
    <StorageType>file</StorageType>
    <StorageDirectory>/tmp</StorageDirectory>
</DatasetStorage -->
<!-- ModelStorage>
    <StorageType>file</StorageType>
    <StorageDirectory>/tmp</StorageDirectory>
</ModelStorage -->

The following table explains the parameters of the storage configuration.

Parameter Name

Description

Type

Default Value

HdfsURL

The HDFS location in which WSO2 ML is allowed to store files.

String

hdfs://localhost:9000

DatasetStorage

Location where datasets are stored. By default, the value of this server configuration is the file system. For information on using HDFS as the dataset storage, see HDFS Support, and for information on using custom input/output adapters as the dataset storage, see ML Custom Adapter Extension.

N/A

N/A

ModelStorage

Location where models are persisted. By default, the value of this server configuration is the file system. For information on using HDFS as the model storage, see HDFS Support, and for information on using custom input/output adapters as the model storage, see ML Custom Adapter Extension.

N/A

N/A

StorageType

This parameter specifies whether the relevant artifact should be stored in the file system, HDFS or a storage defined by a custom input/output adapter.

String

  • If you want to use the file system as the storage type, enter file as the value of this parameter.

  • If you want to use HDFS as the storage type, enter hdfs as the value of this parameter.

  • If you want to use a storage defined by a custom input/output adapter as the storage type, enter the prefix (e.g. custom) of the custom input/output adapter property name (e.g. custom.in) as the value of this parameter.

StorageDirectory

The storage directory in which the relevant artifact should be saved.

String

  • If the storage type is file, the artifact is saved in the <CARBON_HOME>/datasets or <CARBON_HOME>/models/ directory by default (i.e. depending on whether you are configuring storage parameters for datasets or models).

  • If the storage type is hdfs, the artifact is saved in a directory within the location to which the HDFS URL points. Specify that directory as the value of this parameter.

  • If the storage type is a storage defined by a custom input/output adapter, the artifact is saved in the directory which you define as the value of this parameter.
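
The following sketch shows how the storage parameters described above might be combined once the configuration is uncommented. It is an illustration under stated assumptions rather than a recommended setup: the directories /ml/datasets and /ml/models are hypothetical, and the custom storage type assumes that the custom.in and custom.out properties have already been added to the input/output handling configurations.

```xml
<HdfsURL>hdfs://localhost:9000</HdfsURL>

<!-- Datasets are stored in HDFS, in a directory under the HdfsURL location -->
<DatasetStorage>
    <StorageType>hdfs</StorageType>
    <StorageDirectory>/ml/datasets</StorageDirectory>
</DatasetStorage>

<!-- Models are stored through the adapters registered as custom.in and custom.out -->
<ModelStorage>
    <StorageType>custom</StorageType>
    <StorageDirectory>/ml/models</StorageDirectory>
</ModelStorage>
```

Note that the value of StorageType is only the prefix of the adapter property names (custom), not the full property name (custom.in or custom.out).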

Algorithm configurations

WSO2 ML supports various machine learning algorithms. Configurations of these algorithms are defined as shown in the example below.

<Algorithms>
    <Algorithm>
        <Name>LINEAR_REGRESSION</Name>
        <Type>Numerical_Prediction</Type>
        <Parameters>
            <Name>Iterations</Name>
            <Value>100</Value>
        </Parameters>
        <Parameters>
            <Name>Learning_Rate</Name>
            <Value>0.001</Value>
        </Parameters>
        <Parameters>
            <Name>SGD_Data_Fraction</Name>
            <Value>1</Value>
        </Parameters>
    </Algorithm>
</Algorithms>

The following table describes the parameters of an algorithm configuration.

Parameter Name

Description

Type

Name

The name of the algorithm.

String

Type

The type of the algorithm.

String

Iterations

The number of iterations of gradient descent to run.

Integer

In the algorithm configurations, interpretability, scalability, multicollinearity, and dimensionality define a set of weights (on a scale of zero to five) given to each algorithm. These weights are used to calculate ratings for algorithms when recommended algorithms are requested. It is highly recommended that you leave these values unchanged. Each Parameters element under an algorithm represents a hyper-parameter of that algorithm and its default value.
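
As a hedged illustration of how default hyper-parameter values could be changed, the sketch below reuses only the element and parameter names from the LINEAR_REGRESSION example above. The new values are assumptions chosen for illustration, and SGD_Data_Fraction is assumed from its name to be the fraction of the data used in each stochastic gradient descent iteration.

```xml
<Algorithms>
    <Algorithm>
        <Name>LINEAR_REGRESSION</Name>
        <Type>Numerical_Prediction</Type>
        <!-- Run 200 iterations of gradient descent instead of 100 -->
        <Parameters>
            <Name>Iterations</Name>
            <Value>200</Value>
        </Parameters>
        <!-- Illustrative learning rate, larger than the 0.001 default -->
        <Parameters>
            <Name>Learning_Rate</Name>
            <Value>0.01</Value>
        </Parameters>
        <!-- Assumed meaning: use half of the data in each iteration -->
        <Parameters>
            <Name>SGD_Data_Fraction</Name>
            <Value>0.5</Value>
        </Parameters>
    </Algorithm>
</Algorithms>
```

Each Parameters element holds exactly one Name and Value pair, so a new hyper-parameter default is expressed as a separate Parameters element rather than as an extra child of an existing one.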

Other configurations

Parameter Name

Description

Type

Default Value

EmailNotificationEndpoint

A comma-separated list of email addresses to which model building status emails are sent. This parameter is optional.

String

N/A

ModelRegistryLocation

The location in the Governance Registry where ML-related models are published.

e.g.,

<ModelRegistryLocation>ml</ModelRegistryLocation>

String

ml