WSO2 Machine Learner (ML) maintains a set of product-specific configurations in the <ML_HOME>/repository/conf/machine-learner.xml file. The following sections describe each configuration in detail.
Database configurations
The following configuration specifies the datasource that connects WSO2 ML to the database in which product-specific data relating to ML is stored.
<DataSourceName>jdbc/WSO2ML_DB</DataSourceName>
The following table describes the parameters of the database related configuration.
Parameter Name | Description | Type | Default Value |
---|---|---|---|
DataSourceName | The datasource that connects WSO2 ML to the database in which product-specific data relating to ML is stored. The default value is the inbuilt H2 database. The configuration of this database can be found in the
Currently WSO2 ML supports H2 and MySQL for its inbuilt database. If you want to change this default database you need to do the following.
For similar instructions on changing the default database, see Setting up H2 and Setting up MySQL. | String | jdbc/WSO2ML_DB |
Summary statistics settings
When a dataset is created, WSO2 ML calculates summary statistics for the dataset. The following configurations control how these statistics are calculated.
<SummaryStatisticsSettings>
  <HistogramBins>20</HistogramBins>
  <CategoricalThreshold>20</CategoricalThreshold>
  <SampleSize>10000</SampleSize>
</SummaryStatisticsSettings>
The following table describes the parameters of the summary statistics configuration.
Parameter Name | Description | Type | Default Value |
---|---|---|---|
HistogramBins | The number of intervals generated for continuous variables when plotting histograms. | Integer | 20 |
CategoricalThreshold | The cut-off for the number of unique values in a numerical variable, used to decide whether that variable is categorical or continuous. Any numerical variable in the dataset whose number of unique values is less than or equal to this threshold is treated as a categorical variable. Otherwise, it is treated as a continuous variable. | Integer | 20 |
SampleSize | The size of the sample that is used for the summary statistics calculation. | Integer | 10000 |
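The two rules described in the table can be sketched in a few lines of Python. This is only an illustration of the documented behaviour, not WSO2 ML's own code: the function names `classify_numeric_feature` and `histogram_counts` are hypothetical, and equal-width binning is an assumption about how the intervals are laid out.

```python
from typing import List, Sequence


def classify_numeric_feature(values: Sequence[float],
                             categorical_threshold: int = 20) -> str:
    """Apply the CategoricalThreshold rule to one numerical column.

    A column with at most `categorical_threshold` unique values is
    treated as categorical; otherwise it is treated as continuous.
    """
    unique_count = len(set(values))
    if unique_count <= categorical_threshold:
        return "categorical"
    return "continuous"


def histogram_counts(values: Sequence[float], bins: int = 20) -> List[int]:
    """Count values per interval, using `bins` equal-width intervals.

    Equal-width intervals are an assumption made for this sketch.
    """
    if not values:
        return [0] * bins
    lowest, highest = min(values), max(values)
    if lowest == highest:
        # All values are identical, so they all fall in the first interval.
        return [len(values)] + [0] * (bins - 1)
    width = (highest - lowest) / bins
    counts = [0] * bins
    for value in values:
        # Clamp so that the maximum value lands in the last interval.
        index = min(int((value - lowest) / width), bins - 1)
        counts[index] += 1
    return counts
```

For example, a column holding only the values 1, 2, and 3 would be classified as categorical under the default threshold, while a column with 21 distinct values would be classified as continuous.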
Input/output handling configurations
The following set of properties defines the input/output handling configurations of WSO2 ML.
<Properties>
  <Property name="ml.thread.pool.size" value="100" />
  <Property name="file.in" value="org.wso2.carbon.ml.core.impl.FileInputAdapter" />
  <Property name="file.out" value="org.wso2.carbon.ml.core.impl.FileOutputAdapter" />
  <Property name="hdfs.in" value="org.wso2.carbon.ml.core.impl.HdfsInputAdapter" />
  <Property name="hdfs.out" value="org.wso2.carbon.ml.core.impl.HdfsOutputAdapter" />
  <Property name="das.in" value="org.wso2.carbon.ml.core.impl.BAMInputAdapter" />
  <Property name="registry.in" value="org.wso2.carbon.ml.core.impl.RegistryInputAdapter" />
  <Property name="registry.out" value="org.wso2.carbon.ml.core.impl.RegistryOutputAdapter" />
</Properties>
The following table describes the properties of the input/output handling configuration.
Property Name | Description | Type | Default Value |
---|---|---|---|
ml.thread.pool.size | The size of the thread pool used by WSO2 ML. | Integer | 100 |
file.in | The adapter that reads files from the local file system. | String | org.wso2.carbon.ml.core.impl.FileInputAdapter |
file.out | The adapter that writes files to the local file system. | String | org.wso2.carbon.ml.core.impl.FileOutputAdapter |
hdfs.in | The adapter that reads files from a Hadoop File System (HDFS). | String | org.wso2.carbon.ml.core.impl.HdfsInputAdapter |
hdfs.out | The adapter that writes files to a Hadoop File System (HDFS). | String | org.wso2.carbon.ml.core.impl.HdfsOutputAdapter |
registry.in | The adapter that reads data from WSO2 registry. | String | org.wso2.carbon.ml.core.impl.RegistryInputAdapter |
registry.out | The adapter that writes data into WSO2 registry. | String | org.wso2.carbon.ml.core.impl.RegistryOutputAdapter |
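Custom input/output adapters are registered in the same way, by adding properties such as the following.
<Property name="custom.in" value="org.wso2.carbon.ml.custom.adapter.input.CustomMLInputAdapter"/>
<Property name="custom.out" value="org.wso2.carbon.ml.custom.adapter.output.CustomMLOutputAdapter"/>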
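The property names above follow a `<type>.in` / `<type>.out` pattern. The following Python sketch reads a `<Properties>` block and looks up an adapter class by that pattern. It is an illustration only: the helper names `load_properties` and `adapter_class` are hypothetical, and resolving adapters by joining a storage type with `in` or `out` is an assumption drawn from the naming pattern, not a description of how WSO2 ML resolves adapters internally.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# A shortened copy of the <Properties> block shown in this section.
PROPERTIES_XML = """
<Properties>
  <Property name="ml.thread.pool.size" value="100" />
  <Property name="file.in" value="org.wso2.carbon.ml.core.impl.FileInputAdapter" />
  <Property name="file.out" value="org.wso2.carbon.ml.core.impl.FileOutputAdapter" />
  <Property name="hdfs.in" value="org.wso2.carbon.ml.core.impl.HdfsInputAdapter" />
  <Property name="hdfs.out" value="org.wso2.carbon.ml.core.impl.HdfsOutputAdapter" />
</Properties>
"""


def load_properties(xml_text: str) -> Dict[str, str]:
    """Return a name-to-value mapping for every <Property> element."""
    root = ET.fromstring(xml_text)
    return {prop.get("name"): prop.get("value") for prop in root.iter("Property")}


def adapter_class(properties: Dict[str, str], storage_type: str, direction: str) -> str:
    """Look up the adapter class for a storage type and a direction ('in' or 'out')."""
    key = f"{storage_type}.{direction}"
    if key not in properties:
        raise KeyError(f"No adapter is configured for '{key}'")
    return properties[key]


properties = load_properties(PROPERTIES_XML)
thread_pool_size = int(properties["ml.thread.pool.size"])
hdfs_reader = adapter_class(properties, "hdfs", "in")
```

A lookup for a key that is absent from the block, such as `custom.in` in the shortened copy above, raises a `KeyError`, which makes a missing adapter registration easy to spot.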
Storage configurations
This section contains configurations relating to the storage of datasets and models using the storage type file or hdfs. Configurations relating to storage are defined as shown in the example below. This configuration is optional and commented out by default. You can uncomment it and edit the default configurations as required.
<HdfsURL>hdfs://localhost:9000</HdfsURL>
<!-- DatasetStorage>
  <StorageType>file</StorageType>
  <StorageDirectory>/tmp</StorageDirectory>
</DatasetStorage -->
<!-- ModelStorage>
  <StorageType>file</StorageType>
  <StorageDirectory>/tmp</StorageDirectory>
</ModelStorage -->
The following table explains the parameters of the storage configuration.
Parameter Name | Description | Type | Default Value |
---|---|---|---|
HdfsURL | The HDFS location in which the ML is allowed to store files. | String | hdfs://localhost:9000 |
DatasetStorage | Location where datasets are stored. By default, the value of this server configuration is the file system. For information on using HDFS as the dataset storage, see HDFS Support, and for information on using custom input/output adapters as the dataset storage, see ML Custom Adapter Extension. | N/A | N/A |
ModelStorage | Location where models are persisted. By default, the value of this server configuration is the file system. For information on using HDFS as the model storage, see HDFS Support, and for information on using custom input/output adapters as the model storage, see ML Custom Adapter Extension. | N/A | N/A |
StorageType | This parameter specifies whether the relevant artifact should be stored in the file system, HDFS or a storage defined by a custom input/output adapter. | String | |
StorageDirectory | The storage directory in which the relevant artifact should be saved. | String | |
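Because the storage blocks are commented out by default, an XML parser does not see them until they are uncommented. The Python sketch below shows this with two fragments: the default one, and one with DatasetStorage uncommented and switched to HDFS. Several things here are assumptions made for illustration: the wrapping `<MachineLearner>` root element, the `/ml/datasets` directory, the helper name `read_storage`, and the choice to report `file` with no directory when a block is absent (based on the statement above that the file system is the default).

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# Default fragment: the DatasetStorage block is still commented out.
DEFAULT_XML = """
<MachineLearner>
  <HdfsURL>hdfs://localhost:9000</HdfsURL>
  <!-- DatasetStorage>
    <StorageType>file</StorageType>
    <StorageDirectory>/tmp</StorageDirectory>
  </DatasetStorage -->
</MachineLearner>
"""

# Edited fragment: the block is uncommented and points at HDFS.
# The directory /ml/datasets is a hypothetical example path.
HDFS_XML = """
<MachineLearner>
  <HdfsURL>hdfs://localhost:9000</HdfsURL>
  <DatasetStorage>
    <StorageType>hdfs</StorageType>
    <StorageDirectory>/ml/datasets</StorageDirectory>
  </DatasetStorage>
</MachineLearner>
"""


def read_storage(xml_text: str, section: str) -> Tuple[str, Optional[str]]:
    """Return (storage type, storage directory) for DatasetStorage or ModelStorage.

    A commented-out block is invisible to the parser, so the function
    falls back to the file system with no explicit directory.
    """
    root = ET.fromstring(xml_text)
    node = root.find(section)
    if node is None:
        return ("file", None)
    return (node.findtext("StorageType"), node.findtext("StorageDirectory"))


default_dataset_storage = read_storage(DEFAULT_XML, "DatasetStorage")
hdfs_dataset_storage = read_storage(HDFS_XML, "DatasetStorage")
```

Running the same check for `ModelStorage` against either fragment falls back to the file system, since neither fragment contains an uncommented ModelStorage block.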
Algorithm configurations
WSO2 ML supports various machine learning algorithms. Configurations of these algorithms are defined as shown in the example below.
<Algorithms>
  <Algorithm>
    <Name>LINEAR_REGRESSION</Name>
    <Type>Numerical_Prediction</Type>
    <Parameters>
      <Name>Iterations</Name>
      <Value>100</Value>
    </Parameters>
    <Parameters>
      <Name>Learning_Rate</Name>
      <Value>0.001</Value>
    </Parameters>
    <Parameters>
      <Name>SGD_Data_Fraction</Name>
      <Value>1</Value>
    </Parameters>
  </Algorithm>
</Algorithms>
The following table describes the parameters of an algorithm configuration.
Parameter Name | Description | Type |
---|---|---|
Name | The name of the algorithm. | String |
Type | The type of the algorithm. | String |
Iterations | The number of iterations of gradient descent to run. | Integer |
In the algorithm configurations, the interpretability, scalability, multicollinearity, and dimensionality parameters define a set of weights (on a scale of zero to five) given to each algorithm. These weights are used to calculate ratings for algorithms when recommended algorithms are requested. It is highly recommended that these values remain unchanged. The parameters listed under each algorithm are the hyper-parameters associated with that algorithm, together with their default values.
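To give a feel for what the three LINEAR_REGRESSION hyper-parameters control, here is a simplified Python sketch of mini-batch stochastic gradient descent for a one-variable linear model. It is not WSO2 ML's implementation. The function name `sgd_linear_regression`, the constant step size, and the reading of SGD_Data_Fraction as the share of rows sampled per iteration are all assumptions made for illustration.

```python
import random
from typing import Sequence, Tuple


def sgd_linear_regression(xs: Sequence[float],
                          ys: Sequence[float],
                          iterations: int = 100,       # Iterations
                          learning_rate: float = 0.001,  # Learning_Rate
                          data_fraction: float = 1.0,    # SGD_Data_Fraction
                          seed: int = 0) -> Tuple[float, float]:
    """Fit y = weight * x + bias by mini-batch gradient descent on squared error."""
    rng = random.Random(seed)
    row_count = len(xs)
    # Number of rows sampled in each iteration (at least one).
    batch_size = max(1, int(round(data_fraction * row_count)))
    weight, bias = 0.0, 0.0
    for _ in range(iterations):
        batch = rng.sample(range(row_count), batch_size)
        grad_weight = 0.0
        grad_bias = 0.0
        for i in batch:
            error = (weight * xs[i] + bias) - ys[i]
            grad_weight += error * xs[i]
            grad_bias += error
        # Step against the average gradient of the sampled rows.
        weight -= learning_rate * grad_weight / batch_size
        bias -= learning_rate * grad_bias / batch_size
    return weight, bias


# Points on the line y = 2x + 1; a larger step and more iterations than
# the defaults are used so that this tiny example converges.
weight, bias = sgd_linear_regression([0, 1, 2, 3, 4], [1, 3, 5, 7, 9],
                                     iterations=5000, learning_rate=0.05)
```

In this sketch, more iterations and a larger learning rate move the estimate further per run, while a data fraction below 1 makes each iteration cheaper at the cost of noisier steps.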
Other configurations
Parameter Name | Description | Type | Default Value |
---|---|---|---|
EmailNotificationEndpoint | This parameter is used to enter a list of comma-separated email addresses to which model building status mails should be sent. This is an optional parameter. | String | N/A |
ModelRegistryLocation | The location in the Governance Registry where ML related models are published. e.g., <ModelRegistryLocation>ml</ModelRegistryLocation> | String | ml |