FAQ
- 1 General ML configuration related questions
- 1.1 Where can I find the ML configuration file?
- 1.2 How can I change the ML datasource name?
- 1.3 How can I change the number of sample points that ML uses to generate summary statistics?
- 1.4 How can I change the directory which holds datasets?
- 1.5 How can I change the directory with models?
- 1.6 How can I increase ML thread pool size?
- 1.7 Where should I configure the email addresses of recipients who will be notified upon a model generation?
- 1.8 How can I change datasets storage to HDFS?
- 1.9 How can I change model storage to HDFS?
- 1.10 How can I give an HDFS URL?
- 2 Data related questions
- 3 Algorithm related questions
- 4 Analysis related questions
- 5 Model related questions
- 5.1 How can I find the details of a built model?
- 5.2 How can I calculate the accuracy for a given model?
- 5.3 Can I download a built model?
- 5.4 Do you support exporting models in PMML format?
- 5.5 Can I use a built model in a Java program?
- 5.6 How can I make predictions for a test dataset using ML UI wizard?
- 5.7 Can I use a built model in other WSO2 products?
- 6 REST API related questions
General ML configuration related questions
Where can I find the ML configuration file?
The <ML_HOME>/repository/conf/machine-learner.xml file includes all the ML-specific configurations.
How can I change the ML datasource name?
The default name is jdbc/WSO2ML_DB. You can change it by changing the value of the <DataSourceName> element in the <ML_HOME>/repository/conf/machine-learner.xml file. For more information, see ML-specific configurations.
How can I change the number of sample points that ML uses to generate summary statistics?
The default size is 10000. You can change it by changing the value of the <SampleSize> property within the <SummaryStatisticsSettings> element in the <ML_HOME>/repository/conf/machine-learner.xml file. For more information, see ML-specific configurations.
How can I change the directory which holds datasets?
By default, the <ML_HOME>/datasets/ directory holds datasets, and the default storage type is ‘file’. You can change the directory by changing the value of the <StorageDirectory> property within the <DatasetStorage> element in the <ML_HOME>/repository/conf/machine-learner.xml file. For more information, see ML-specific configurations.
How can I change the directory with models?
By default, the <ML_HOME>/models/ directory holds models, and the default storage type is ‘file’. You can change the directory by changing the value of the <StorageDirectory> property within the <ModelStorage> element in the <ML_HOME>/repository/conf/machine-learner.xml file. For more information, see ML-specific configurations.
How can I increase ML thread pool size?
WSO2 ML uses threads in a thread pool to run tasks such as dataset summary generation and model generation. You can control the size of this thread pool by changing the value of the following property in the <ML_HOME>/repository/conf/machine-learner.xml file:
<Property name="ml.thread.pool.size" value="100"/>
For more information, see ML-specific configurations.
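The thread pool property can also be changed with a script instead of by hand. The following is a minimal Python sketch under stated assumptions: only the ml.thread.pool.size property line comes from the text above, while the <Properties> wrapper and the function name set_thread_pool_size are hypothetical and exist only so that the fragment parses. The real machine-learner.xml layout may differ, and if the real file declares an XML namespace, the tag lookup would need to include it.

```python
import xml.etree.ElementTree as ET

# Stand-in fragment. Only the <Property> line is taken from the FAQ text;
# the <Properties> wrapper is a hypothetical container, not the real file layout.
SAMPLE_CONFIG = """
<Properties>
    <Property name="ml.thread.pool.size" value="100"/>
</Properties>
"""


def set_thread_pool_size(xml_text: str, new_size: int) -> str:
    """Return xml_text with the ml.thread.pool.size property set to new_size."""
    if new_size < 1:
        raise ValueError("thread pool size must be a positive integer")
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, so the property is found at any depth.
    for prop in root.iter("Property"):
        if prop.get("name") == "ml.thread.pool.size":
            prop.set("value", str(new_size))
            return ET.tostring(root, encoding="unicode")
    raise KeyError("ml.thread.pool.size property not found")


if __name__ == "__main__":
    print(set_thread_pool_size(SAMPLE_CONFIG, 200))
```

To apply the same idea to the real file, you would read its contents, pass them to the function, and write the result back, ideally after taking a backup copy.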
Where should I configure the email addresses of recipients who will be notified upon a model generation?
You can configure WSO2 ML to send emails on the completion of a model generation. You can have a comma-separated set of email addresses as the value of the <EmailNotificationEndpoint> property in the <ML_HOME>/repository/conf/machine-learner.xml file. For more information on configuring email support, see Enabling Email Notifications.
How can I change datasets storage to HDFS?
If you want to change the dataset storage type to HDFS, change the value of the <StorageType> property within the <DatasetStorage> element to ‘hdfs’. For more information, see ML-specific configurations.
How can I change model storage to HDFS?
If you want to change the model storage type to HDFS, change the value of the <StorageType> property within the <ModelStorage> element to ‘hdfs’. For more information, see ML-specific configurations.
How can I give an HDFS URL?
If you want to store your datasets and models in HDFS, enter the HDFS URL as the value of the <HdfsURL> property in the <ML_HOME>/repository/conf/machine-learner.xml file. For more information, see ML-specific configurations.
Data related questions
WSO2 ML currently supports the following formats.
- CSV (comma-separated values)
- TSV (tab-separated values)
Data can be retrieved from the following sources.
- File system
- Hadoop Distributed File System (HDFS)
- WSO2 Data Analytics Server table
It is not mandatory to have a header row. If the dataset does not have a header row, this is indicated when you upload the dataset, and WSO2 ML then generates a header of the form V1, V2, ..., Vn. For more information, see Exploring Data.
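To illustrate the V1, V2, ..., Vn naming, here is a minimal Python sketch that builds such a header for a headerless CSV or TSV dataset. It is an illustration only and not WSO2 ML's own code: the function names generate_header and with_generated_header are hypothetical, and the column count is assumed to be taken from the first row.

```python
import csv
import io


def generate_header(column_count: int) -> list[str]:
    """Build placeholder column names V1..Vn for a dataset without a header row."""
    return [f"V{i}" for i in range(1, column_count + 1)]


def with_generated_header(text: str, delimiter: str = ",") -> list[list[str]]:
    """Parse delimited text and prepend a generated header row.

    Assumption: the number of columns is taken from the first data row.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return []
    return [generate_header(len(rows[0]))] + rows


if __name__ == "__main__":
    for row in with_generated_header("5.1,3.5,setosa\n6.2,2.9,versicolor\n"):
        print(row)
```

For a TSV dataset, pass delimiter="\t" instead of the default comma.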
Yes, currently it is 100MB. You can change it via the following property in the <ML_HOME>/bin/wso2server.bat file (for Windows) or the <ML_HOME>/bin/wso2server.sh file (for Linux).
100MB = 100 x 1024 x 1024 = 104857600 Bytes
-Dorg.apache.cxf.io.CachedOutputStream.Threshold=104857600 \
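The byte value for a different limit follows the same arithmetic as the 100MB example above. The following Python sketch computes it and formats the property; the function name threshold_flag is hypothetical, and the sketch assumes that 1MB means 1024 x 1024 bytes, as in the calculation above.

```python
def threshold_flag(size_mb: int) -> str:
    """Format the CachedOutputStream threshold property for a limit given in MB.

    Assumption: 1MB = 1024 x 1024 bytes, matching the 100MB example in the FAQ.
    """
    if size_mb <= 0:
        raise ValueError("size_mb must be a positive integer")
    size_bytes = size_mb * 1024 * 1024
    return f"-Dorg.apache.cxf.io.CachedOutputStream.Threshold={size_bytes}"


if __name__ == "__main__":
    # Reproduces the default value shown above for 100MB.
    print(threshold_flag(100))
    # A 250MB limit, as an example of a changed value.
    print(threshold_flag(250))
```

The resulting string replaces the existing property value in the startup script for your platform.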
Algorithm related questions
Yes, it does. The following algorithms are available in this version.
- Linear Regression
- Ridge Regression
- Lasso Regression
See How to Select an Algorithm in WSO2 ML for more information.
Yes, it does. The classification algorithms currently supported are as follows.
- Logistic Regression with Stochastic Gradient Descent
- Logistic Regression with Limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS)
- Decision Tree
- Random Forest
- Naive Bayes
See How to Select an Algorithm in WSO2 ML for more information.
At present, only basic support is available for clustering: K-means is the only clustering algorithm supported in this ML version. We plan to improve this area. Please contact us if you would like to learn more about these plans.
Analysis related questions
WSO2 ML supports feature selection and missing value filling or filtering.
Yes, WSO2 ML supports dataset exploration functionality with multiple visualization techniques. See Exploring Data for more information.
Model related questions
How can I find the details of a built model?
Once you have built a model, you can view its model summary, which includes a summary of the model evaluation. For more information, see Evaluating Models.
How can I calculate the accuracy for a given model?
For classification algorithms, WSO2 ML generates an accuracy measurement based on the predictions that the model makes for the test dataset. The test dataset is extracted from the uploaded dataset, and its proportion is configurable for each analysis.
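As an illustration of what an accuracy measurement means here, the following Python sketch computes the fraction of test rows for which the predicted label equals the actual label. This is the standard definition of classification accuracy and is assumed rather than taken from WSO2 ML's code; the function name accuracy and the sample labels are hypothetical.

```python
def accuracy(actual: list, predicted: list) -> float:
    """Return the fraction of predictions that match the actual labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one test row is required")
    # Each matching pair counts as one correct prediction.
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


if __name__ == "__main__":
    # Hypothetical labels for a four-row test dataset.
    actual_labels = ["yes", "no", "yes", "yes"]
    predicted_labels = ["yes", "no", "no", "yes"]
    print(accuracy(actual_labels, predicted_labels))  # 3 of 4 correct
```

A larger test proportion gives a more stable accuracy estimate but leaves fewer rows for training the model.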
Can I download a built model?
You can download a built model or publish it to the WSO2 registry. For more information, see Generating Models.
Do you support exporting models in PMML format?
WSO2 ML 1.0.0 does not support the PMML format. Support is on the roadmap for future versions.