
WSO2 Machine Learner has two main entities: datasets and projects. A dataset may contain several versions. When you upload a new dataset, WSO2 ML creates the dataset together with its first version. If the dataset already exists, WSO2 ML only generates a new version of it from the uploaded data. This process of uploading a dataset to WSO2 ML is shown in the flow diagram below.

data exploration flow diagram

Follow the steps below to generate a dataset using the ML UI.

Start the WSO2 ML server. For instructions on starting, see Running the Product.
Access the ML UI from your Web browser using the following URL: https://<ML_HOST>:<ML_PORT>/ml
Info
You can find the URL of the WSO2 ML UI in the server startup logs in the CLI as follows: INFO{org.wso2.carbon.ml.core.internal.MLCoreDS} - WSO2 Machine Learner UI : https://127.0.0.1:9446/ml
Log in to the ML UI as a user who is registered in WSO2 ML. For information on registering users, see User Management. The home page displays two blocks, one for datasets and one for projects, as shown below.
Click the Datasets block to navigate to the DATASETS page as shown below.
Click CREATE DATASET to create a new dataset.

Enter the following details of the dataset.
The descriptions of the above fields are as follows.

Field	Description
Dataset Name	A unique name for the dataset.
Version	Version of the dataset.
Description	A description for the dataset.
Source Type	Type of source from which the data is retrieved. The following options are supported. File: retrieves data from the local file system. HDFS: retrieves data from a Hadoop Distributed File System (HDFS); to enable HDFS support so that ML can retrieve data for the dataset, see HDFS Support. DAS: retrieves data from a WSO2 DAS table; to integrate WSO2 DAS so that ML can retrieve data for the dataset, see Integration with WSO2 Data Analytics Server.
Data Source	Source from which the dataset is retrieved, depending on the selected source type. File: the file to upload. HDFS: the source path in HDFS. DAS: the data table in the Data Access Layer of WSO2 DAS.
Data Format	Format of the dataset file: CSV or TSV.
Column Header Available	Whether the CSV or TSV data file contains a header row with column names.
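
Before filling in the Data Format and Column Header Available fields, it can help to check the file locally. The following is a minimal sketch, not part of WSO2 ML: the function name `describe_dataset`, the sample data, and both heuristics (more tabs than commas means TSV; a header row has no numeric cells) are assumptions made for illustration, and the input is assumed to be non-empty.

```python
import csv
import io

def describe_dataset(text):
    """Guess Data Format and Column Header Available for a non-empty text sample."""
    first_line = text.splitlines()[0]
    # Assumption: treat the file as TSV when the first line has more tabs than commas.
    delimiter = "\t" if first_line.count("\t") > first_line.count(",") else ","
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))

    def is_number(value):
        try:
            float(value)
            return True
        except ValueError:
            return False

    # Heuristic: a header row contains no numeric cells.
    has_header = not any(is_number(cell) for cell in rows[0])
    return {
        "data_format": "TSV" if delimiter == "\t" else "CSV",
        "column_header_available": has_header,
        "columns": len(rows[0]),
    }

# Hypothetical sample data, used only to demonstrate the check.
sample = "age,income,label\n34,52000,yes\n29,41000,no\n"
print(describe_dataset(sample))
```

The heuristics are deliberately simple; a file whose header names look like numbers would be misclassified, so treat the result as a hint rather than a verdict.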

Once the dataset is successfully created, it is listed as follows.
Note that the status of the dataset is displayed as Processing.

Click REFRESH in the CREATE DATASET page to refresh the page. The dataset will be displayed with the Processed status as follows.

Use the provided options to explore or delete the created dataset, or to create a project from the created dataset.
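
If you need the ML UI URL in a script, the startup log line quoted in the Info note above can be parsed. The following is a minimal sketch that assumes the log line has exactly the form shown in that note; the function name `find_ml_ui_url` is hypothetical and not part of WSO2 ML.

```python
import re

# Matches the "WSO2 Machine Learner UI : <url>" part of the startup log line.
UI_LINE = re.compile(r"WSO2 Machine Learner UI\s*:\s*(https?://\S+)")

def find_ml_ui_url(log_text):
    """Return the first ML UI URL found in the log text, or None if absent."""
    match = UI_LINE.search(log_text)
    return match.group(1) if match else None

# The sample line is copied from the Info note in the steps above.
sample = ("INFO{org.wso2.carbon.ml.core.internal.MLCoreDS} - "
          "WSO2 Machine Learner UI : https://127.0.0.1:9446/ml")
print(find_ml_ui_url(sample))
```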

...

Log in to the WSO2 ML UI, if you are not already logged in.
Click DATASETS in the top menu as shown below.
Click the text that displays the number of versions available for the dataset, as shown below.
Enter a new version number for the dataset, and click CREATE VERSION as shown below.
Enter the following details of the dataset.
Click Create Version. Once the new version of the dataset is successfully created, it is listed as follows. Use the provided options to explore or delete the created version, or to create a project from the created version.

Exploring the dataset

...

Log in to the WSO2 ML UI, if you are not already logged in.
Click the Datasets button as shown below.
Click the EXPLORE button of the dataset that you want to explore.

You can view the dataset from different perspectives through the following four chart types.
Scatter plot & histogram

A scatter plot visualizes the relationship between two selected features of the dataset. Histograms show the data distribution of each of the same two features. The scatter plot user interface allows you to select two numerical features from the dataset to be visualized through a scatter plot and histograms.
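
To make the two views concrete, the sketch below prepares the data behind them for two numerical features: the point pairs for the scatter plot and equal-width bin counts for a histogram. It is an illustration only and does not reflect how WSO2 ML computes its charts; the feature names, the sample values, and the equal-width binning are assumptions.

```python
def histogram(values, bins):
    """Count how many values fall into each of `bins` equal-width intervals."""
    low, high = min(values), max(values)
    width = (high - low) / bins or 1.0  # fall back to 1.0 when all values are equal
    counts = [0] * bins
    for v in values:
        # Clamp the index so that the maximum value lands in the last bin.
        index = min(int((v - low) / width), bins - 1)
        counts[index] += 1
    return counts

# Hypothetical feature columns, used only for illustration.
age = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
income = [10, 12, 15, 19, 24, 30, 37, 45, 54, 64]

scatter_points = list(zip(age, income))  # one (x, y) pair per data row
age_counts = histogram(age, bins=3)      # distribution of the first feature
```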
Parallel set

A parallel set is a visualization method used for categorical data. It adopts the layout of parallel coordinates, but replaces the individual data points with a frequency-based representation. This abstract view is combined with a set of interactions, and supports visual analysis of large and complex datasets. Using the parallel sets user interface, you can specify which categorical features to include in the diagram.
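
The frequency-based representation mentioned above amounts to counting how often each combination of category values occurs. A minimal sketch follows, assuming rows are held as dictionaries; the function name, feature names, and sample rows are hypothetical.

```python
from collections import Counter

def category_frequencies(rows, features):
    """Count how often each combination of category values occurs."""
    return Counter(tuple(row[f] for f in features) for row in rows)

# Hypothetical categorical data, used only for illustration.
rows = [
    {"gender": "F", "smoker": "no"},
    {"gender": "F", "smoker": "no"},
    {"gender": "M", "smoker": "yes"},
    {"gender": "M", "smoker": "no"},
]
freq = category_frequencies(rows, ["gender", "smoker"])
```

Each key in `freq` corresponds to one ribbon in a parallel sets diagram, and its count determines the ribbon's width.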
Trellis chart

A trellis chart is a series of graphs or charts based on the same scale and axes, which allows them to be compared easily. It uses multiple views to show different partitions of a dataset, and is useful for finding structure and patterns in complex data. The trellis chart user interface allows you to select one categorical feature and multiple numerical features (up to a maximum number) to draw the diagram.
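
The partitioning behind a trellis chart can be sketched as grouping rows by the chosen categorical feature and collecting the selected numerical features per group, so that each group becomes one panel. The function name, feature names, and sample rows below are assumptions for illustration.

```python
from collections import defaultdict

def partition_by(rows, categorical, numerical):
    """Group numerical feature values by the value of one categorical feature."""
    panels = defaultdict(lambda: {name: [] for name in numerical})
    for row in rows:
        panel = panels[row[categorical]]
        for name in numerical:
            panel[name].append(row[name])
    return dict(panels)

# Hypothetical data, used only for illustration.
rows = [
    {"species": "a", "length": 5.1, "width": 3.5},
    {"species": "b", "length": 6.2, "width": 2.9},
    {"species": "a", "length": 4.9, "width": 3.0},
]
panels = partition_by(rows, "species", ["length", "width"])
```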

Cluster diagram
A cluster diagram depicts one or more clusters in a dataset. A cluster is a group of discrete points that are close to each other. In the explore view, a cluster diagram provides a perspective on data clusters for two selected numerical features. A popular clustering algorithm is applied to a sample of the data to derive the clusters. You can select the two numerical features and the required number of clusters through the cluster diagram user interface.
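
The text above does not name the clustering algorithm. As an illustration only, the sketch below uses plain k-means (Lloyd's algorithm), which is an assumption and not necessarily what WSO2 ML applies. It takes points formed from two numerical features and the requested number of clusters; the deterministic start from the first k points and the sample data are also assumptions.

```python
def kmeans(points, k, iterations=10):
    """Group 2-D points into k clusters with plain k-means (Lloyd's algorithm)."""
    centroids = list(points[:k])  # simple deterministic start: the first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for x, y in points:
            # Assign each point to its nearest centroid (squared Euclidean distance).
            nearest = min(
                range(k),
                key=lambda i: (x - centroids[i][0]) ** 2 + (y - centroids[i][1]) ** 2,
            )
            clusters[nearest].append((x, y))
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, clusters

# Hypothetical (feature_1, feature_2) pairs, used only for illustration.
points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
centroids, clusters = kmeans(points, k=2)
```

A production implementation would typically choose the starting centroids more carefully and stop once the assignments no longer change.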

Version	Old Version 26	New Version 27
Changes made by	Former user	Former user
Saved on	Sept 04, 2015	Sept 04, 2015
