WSO2 ML-Specific Configurations

WSO2 Machine Learner maintains a set of product-specific configurations in the <ML_HOME>/repository/conf/machine-learner.xml file. The following sections describe these configurations in detail.

Database configurations

The following configuration specifies the datasource that connects WSO2 ML to the database in which its product-specific data is stored.

<DataSourceName>jdbc/WSO2ML_DB</DataSourceName>

The following table describes the parameters of the database-related configuration.

Parameter Name	Description	Type	Default Value
`DataSourceName`	The datasource that connects WSO2 ML to the database in which its product-specific data is stored. By default, this is the embedded H2 database, which is configured in the `<ML_HOME>/repository/conf/datasources/ml-datasources.xml` file.	String	`jdbc/WSO2ML_DB`

WSO2 ML currently supports H2 and MySQL as its database. To change the default database, do the following:

1. Use the scripts in the <ML_HOME>/dbscripts/ directory to create the database tables.
2. Change the properties of the jdbc/WSO2ML_DB datasource in the ml-datasources.xml file accordingly.

For similar instructions on changing the default database, see Setting up H2 and Setting up MySQL.
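The following sketch shows what the datasource definition in the ml-datasources.xml file might look like after switching the default database to MySQL. It assumes the standard WSO2 Carbon datasource format and that the tables have already been created with the relevant script from the <ML_HOME>/dbscripts/ directory; the database name `MLDB`, the host, the credentials and the connection pool values are hypothetical placeholders. The JNDI name is kept as `jdbc/WSO2ML_DB` so that it still matches the DataSourceName value in machine-learner.xml.

```xml
<!-- Hypothetical MySQL entry for ml-datasources.xml (standard WSO2 Carbon datasource format).
	 The URL, database name, credentials and pool settings are placeholders. -->
<datasource>
	<name>WSO2ML_DB</name>
	<description>The datasource used by WSO2 ML</description>
	<jndiConfig>
		<!-- Must match the DataSourceName value in machine-learner.xml -->
		<name>jdbc/WSO2ML_DB</name>
	</jndiConfig>
	<definition type="RDBMS">
		<configuration>
			<url>jdbc:mysql://localhost:3306/MLDB</url>
			<username>mluser</username>
			<password>mlpassword</password>
			<driverClassName>com.mysql.jdbc.Driver</driverClassName>
			<maxActive>50</maxActive>
			<maxWait>60000</maxWait>
			<testOnBorrow>true</testOnBorrow>
			<validationQuery>SELECT 1</validationQuery>
			<validationInterval>30000</validationInterval>
		</configuration>
	</definition>
</datasource>
```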

Summary statistics settings

When a dataset is created, WSO2 ML calculates summary statistics for it using the following configurations.

<SummaryStatisticsSettings>
	<HistogramBins>20</HistogramBins>
	<CategoricalThreshold>20</CategoricalThreshold>
	<SampleSize>10000</SampleSize>
</SummaryStatisticsSettings>

The following table describes the parameters of the summary statistics configuration.

Parameter Name	Description	Type	Default Value
`HistogramBins`	The number of intervals generated for continuous variables when plotting histograms.	Integer	20
`CategoricalThreshold`	The cut-off on the number of unique values in a numerical variable, used to decide whether that variable is categorical or continuous. A numerical variable with this many unique values or fewer is treated as categorical; otherwise, it is treated as continuous.	Integer	20
`SampleSize`	Size of the sample that is used for the summary statistics calculation.	Integer	10000
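As an illustration of how these settings interact, the following sketch changes all three values. The numbers are arbitrary examples rather than recommendations: with `CategoricalThreshold` set to 10, a numerical variable with 8 unique values is treated as categorical, while one with 15 unique values is treated as continuous and its histogram is split into `HistogramBins` intervals. A larger `SampleSize` generally makes the statistics more representative of the full dataset, at the cost of a longer calculation.

```xml
<!-- Illustrative values only; the default configuration above uses 20, 20 and 10000 -->
<SummaryStatisticsSettings>
	<!-- Split the histogram of each continuous variable into 30 intervals -->
	<HistogramBins>30</HistogramBins>
	<!-- Numerical variables with 10 or fewer unique values are treated as categorical -->
	<CategoricalThreshold>10</CategoricalThreshold>
	<!-- Calculate summary statistics on a sample of size 50000 -->
	<SampleSize>50000</SampleSize>
</SummaryStatisticsSettings>
```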

Input/output handling configurations

The following properties define the input/output handling configurations of WSO2 ML.

<Properties>
	<Property name="ml.thread.pool.size" value="100" />
	<Property name="file.in" value="org.wso2.carbon.ml.core.impl.FileInputAdapter" />
	<Property name="file.out" value="org.wso2.carbon.ml.core.impl.FileOutputAdapter" />
	<Property name="hdfs.in" value="org.wso2.carbon.ml.core.impl.HdfsInputAdapter" />
	<Property name="hdfs.out" value="org.wso2.carbon.ml.core.impl.HdfsOutputAdapter" />
	<Property name="das.in" value="org.wso2.carbon.ml.core.impl.BAMInputAdapter" />
	<Property name="registry.in" value="org.wso2.carbon.ml.core.impl.RegistryInputAdapter" />
	<Property name="registry.out" value="org.wso2.carbon.ml.core.impl.RegistryOutputAdapter" />
</Properties>

The following table describes the properties of the input/output handling configuration.

Property Name	Description	Type	Default Value
`ml.thread.pool.size`	The size of the thread pool used by WSO2 ML.	Integer	`100`
`file.in`	The adapter that reads files from the local file system.	String	`org.wso2.carbon.ml.core.impl.FileInputAdapter`
`file.out`	The adapter that writes files to the local file system.	String	`org.wso2.carbon.ml.core.impl.FileOutputAdapter`
`hdfs.in`	The adapter that reads files from the Hadoop Distributed File System (HDFS).	String	`org.wso2.carbon.ml.core.impl.HdfsInputAdapter`
`hdfs.out`	The adapter that writes files to the Hadoop Distributed File System (HDFS).	String	`org.wso2.carbon.ml.core.impl.HdfsOutputAdapter`
`das.in`	The adapter that reads data from WSO2 Data Analytics Server (DAS).	String	`org.wso2.carbon.ml.core.impl.BAMInputAdapter`
`registry.in`	The adapter that reads data from the WSO2 registry.	String	`org.wso2.carbon.ml.core.impl.RegistryInputAdapter`
`registry.out`	The adapter that writes data into the WSO2 registry.	String	`org.wso2.carbon.ml.core.impl.RegistryOutputAdapter`

To add a custom input/output adapter, add properties such as the following to the above input/output handling configuration:

<Property name="custom.in" value="org.wso2.carbon.ml.custom.adapter.input.CustomMLInputAdapter"/>

<Property name="custom.out" value="org.wso2.carbon.ml.custom.adapter.output.CustomMLOutputAdapter"/>

Storage configurations

This section contains configurations relating to the storage of datasets and models in the file system, in HDFS, or in a storage defined by a custom input/output adapter. Storage is configured as shown in the example below. The DatasetStorage and ModelStorage configurations are optional and commented out by default; you can uncomment them and edit the default values as required.

<HdfsURL>hdfs://localhost:9000</HdfsURL>
<!-- DatasetStorage> 
	<StorageType>file</StorageType> 
	<StorageDirectory>/tmp</StorageDirectory> 
</DatasetStorage -->

<!-- ModelStorage> 
	<StorageType>file</StorageType> 
	<StorageDirectory>/tmp</StorageDirectory> 
</ModelStorage -->

The following table explains the parameters of the storage configuration.

Parameter Name	Description	Type	Default Value
`HdfsURL`	The URL of the HDFS instance in which WSO2 ML is allowed to store files.	String	`hdfs://localhost:9000`
`DatasetStorage`	The location where datasets are stored. By default, datasets are stored in the file system. For information on using HDFS as the dataset storage, see HDFS Support, and for information on using custom input/output adapters as the dataset storage, see ML Custom Adapter Extension.	N/A	N/A
`ModelStorage`	The location where models are persisted. By default, models are persisted in the file system. For information on using HDFS as the model storage, see HDFS Support, and for information on using custom input/output adapters as the model storage, see ML Custom Adapter Extension.	N/A	N/A
`StorageType`	Specifies whether the relevant artifact is stored in the file system (`file`), in HDFS (`hdfs`), or in a storage defined by a custom input/output adapter. To use a custom adapter, enter the prefix of its property name as the value (e.g. `custom` for an adapter registered as `custom.in`).	String	`file`
`StorageDirectory`	The directory in which the relevant artifact is saved. If the storage type is `hdfs`, this directory is within the location to which the `HdfsURL` points. If the storage type is defined by a custom input/output adapter, the artifact is saved in the directory you specify here.	String	If the storage type is `file`, the artifact is saved in the `<CARBON_HOME>/datasets/` or `<CARBON_HOME>/models/` directory by default, depending on whether you are configuring storage for datasets or models.
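To make the `StorageType` values concrete, the following sketch uncomments both storage configurations, storing datasets in HDFS and persisting models through a custom input/output adapter. It assumes that an adapter pair has been registered under the `custom` prefix (as in the `custom.in` and `custom.out` properties in the input/output handling section) and that an HDFS instance is reachable at the default `HdfsURL`; the directory paths are hypothetical.

```xml
<HdfsURL>hdfs://localhost:9000</HdfsURL>

<!-- Datasets are stored in HDFS, under /ml/datasets at the location HdfsURL points to -->
<DatasetStorage>
	<StorageType>hdfs</StorageType>
	<StorageDirectory>/ml/datasets</StorageDirectory>
</DatasetStorage>

<!-- Models are persisted through the adapter registered as custom.in / custom.out -->
<ModelStorage>
	<StorageType>custom</StorageType>
	<StorageDirectory>/ml/models</StorageDirectory>
</ModelStorage>
```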

Algorithm configurations

WSO2 ML supports various machine learning algorithms. Configurations of these algorithms are defined as shown in the example below.

<Algorithms>
	<Algorithm>
		<Name>LINEAR_REGRESSION</Name>
		<Type>Numerical_Prediction</Type>
		<Parameters>
			<Name>Iterations</Name>
			<Value>100</Value>
		</Parameters>
		<Parameters>
			<Name>Learning_Rate</Name>
			<Value>0.001</Value>
		</Parameters>
		<Parameters>
			<Name>SGD_Data_Fraction</Name>
			<Value>1</Value>
		</Parameters>
	</Algorithm>
</Algorithms>

The following table describes the parameters of an algorithm configuration.

Parameter Name	Description	Type
`Name`	The name of the algorithm.	String
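`Type`	The type of the algorithm (e.g. `Numerical_Prediction`).	String
`Iterations`	The number of iterations of gradient descent to run.	Integer
`Learning_Rate`	The step size used in each iteration of gradient descent.	Double
`SGD_Data_Fraction`	The fraction of the dataset used in each iteration of stochastic gradient descent (SGD).	Double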

In addition to hyper-parameters, each algorithm configuration defines interpretability, scalability, multicollinearity, and dimensionality values, which are weights (on a scale of zero to five) given to the algorithm. These weights are used to calculate algorithm ratings when recommended algorithms are requested, and it is highly recommended that you leave them unchanged. Each Parameters element under an algorithm represents one of that algorithm's hyper-parameters together with its default value.
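For example, to have linear regression run more gradient descent iterations with a larger learning rate by default, you could change the hyper-parameter values of the LINEAR_REGRESSION entry as in the following sketch. The values 500 and 0.01 are arbitrary illustrations, not tuned recommendations. Only the Value elements change; any weight settings in the entry stay as they are, as recommended above.

```xml
<Algorithm>
	<Name>LINEAR_REGRESSION</Name>
	<Type>Numerical_Prediction</Type>
	<!-- Run 500 gradient descent iterations instead of 100 -->
	<Parameters>
		<Name>Iterations</Name>
		<Value>500</Value>
	</Parameters>
	<!-- Use a learning rate of 0.01 instead of 0.001 -->
	<Parameters>
		<Name>Learning_Rate</Name>
		<Value>0.01</Value>
	</Parameters>
	<!-- Unchanged from the default configuration -->
	<Parameters>
		<Name>SGD_Data_Fraction</Name>
		<Value>1</Value>
	</Parameters>
</Algorithm>
```

Keep in mind that a larger learning rate can speed up convergence but may also cause gradient descent to overshoot and diverge.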

Other configurations

Parameter Name	Description	Type	Default Value
`EmailNotificationEndpoint`	A comma-separated list of email addresses to which model-building status emails are sent. This parameter is optional.	String	N/A
`ModelRegistryLocation`	The location in the Governance Registry where ML-related models are published (e.g. `<ModelRegistryLocation>ml</ModelRegistryLocation>`).	String	`ml`
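The following sketch shows both settings as they might appear in machine-learner.xml. It assumes that `EmailNotificationEndpoint` takes the comma-separated addresses as its element text, in the same way that `ModelRegistryLocation` takes its value; the email addresses are hypothetical placeholders.

```xml
<!-- Optional: send model-building status emails to two placeholder addresses -->
<EmailNotificationEndpoint>alice@example.com,bob@example.com</EmailNotificationEndpoint>
<!-- Publish models under the ml location in the Governance Registry (the default) -->
<ModelRegistryLocation>ml</ModelRegistryLocation>
```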