WSO2 Machine Learner has two main entities: datasets and projects. A dataset may contain several versions. When you upload a new dataset, WSO2 ML creates the dataset together with a dataset version of it. If the dataset already exists, WSO2 ML generates only a new dataset version from the uploaded data. This process of uploading a dataset to WSO2 ML is shown in the flow diagram below.

data exploration flow diagram
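The upload flow described above can be pictured with a small in-memory model. This is only an illustrative sketch, not the WSO2 ML implementation: the `Dataset` and `DatasetRegistry` classes, the `upload` method, and the sample dataset name and version labels are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical in-memory stand-in for a WSO2 ML dataset."""
    name: str
    versions: list = field(default_factory=list)


class DatasetRegistry:
    """Illustrative model of the upload flow; not the WSO2 ML code."""

    def __init__(self):
        self._datasets = {}

    def upload(self, name, version):
        """Record an upload and return (dataset, created).

        created is True when the upload introduced a new dataset.
        """
        created = name not in self._datasets
        if created:
            # New dataset: create the dataset entry first.
            self._datasets[name] = Dataset(name)
        dataset = self._datasets[name]
        # In both cases the upload is recorded as a dataset version.
        dataset.versions.append(version)
        return dataset, created


registry = DatasetRegistry()
# First upload: a new dataset and its first version.
first, first_created = registry.upload("diabetes", "1.0.0")
# Second upload under the same name: only a new version is added.
second, second_created = registry.upload("diabetes", "1.0.1")
```

After the two calls, the single `diabetes` dataset holds both versions, which mirrors the two branches of the flow.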

Follow the steps below to generate a dataset using the ML UI.

  1. Start the WSO2 ML server. For instructions on starting the server, see Running the Product.
  2. Access the ML UI from your Web browser using the following URL: https://<ML_HOST>:<ML_PORT>/ml

    Info

    You can find the URL of the WSO2 ML UI in the server startup logs in the CLI as follows: INFO{org.wso2.carbon.ml.core.internal.MLCoreDS} -  WSO2 Machine Learner UI : https://127.0.0.1:9443/ml

  3. Log in to the ML UI as a user who is registered in WSO2 ML. For instructions on registering users, see User Management. The home page appears with two blocks, one for datasets and one for projects, as shown below.
  4. Click the Datasets block to navigate to the DATASETS page as shown below.
  5. Click CREATE DATASET to create a new dataset.
  6. Enter the following details of the dataset.
     The descriptions of the above fields are as follows.

    Field    Description
    Dataset Name

    A unique name for the dataset.

    Version

    Version of the dataset.

    Description

    A description for the dataset.

    Source Type

    Type of the source from which the data is retrieved. The following options are supported.

    • File - Retrieve data from the local file system.
    • HDFS - Retrieve data from a Hadoop Distributed File System (HDFS). For instructions on enabling WSO2 ML to retrieve data from HDFS when creating a dataset, see HDFS Support.
    • DAS - Retrieve data from a WSO2 DAS table. For instructions on integrating WSO2 DAS so that WSO2 ML can retrieve data from it when creating a dataset, see Integration with WSO2 Data Analytics Server.
    Data Source

    Source from which the dataset is retrieved. The value to provide depends on the selected source type, as follows.

    • File - The file to upload.
    • HDFS - The source path in HDFS.
    • DAS - The data table in the Data Access Layer of WSO2 DAS.

      Info

      The default limit for the dataset file size is 100 MB. You can increase or decrease this limit by changing the value of the Java option -Dorg.apache.cxf.io.CachedOutputStream.Threshold in the <ML_HOME>/bin/wso2server.sh file.

      If you get an error like the one below when you try to upload a dataset, the dataset you are trying to upload is larger than the current upload limit.

    Data Format

    File type of the dataset, either CSV or TSV.

    Column Header Available

    Whether the CSV or TSV data file contains column headers.

    Once the dataset is successfully created, it is listed as follows.

    Note that the status of the dataset is displayed as Processing.

  7. Click REFRESH in the CREATE DATASET page to refresh the page. The dataset will be displayed with the Processed status as follows.
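Before uploading a local file in step 6, it can help to check the file against the size limit and to confirm the values for the Data Format and Column Header Available fields. The following is a minimal sketch of such a check using only the Python standard library. It is not part of WSO2 ML: the function name `check_dataset_file`, its parameters, and the returned keys are hypothetical, and it assumes that the default 100 MB limit is counted as 100 * 1024 * 1024 bytes and that the file is UTF-8 encoded.

```python
import csv
import os

# Assumption: the default 100 MB limit is counted as 100 * 1024 * 1024 bytes.
DEFAULT_LIMIT_BYTES = 100 * 1024 * 1024


def check_dataset_file(path, limit_bytes=DEFAULT_LIMIT_BYTES, sample_size=64 * 1024):
    """Hypothetical pre-upload check for a local CSV or TSV file.

    Returns a dict with the file size, whether it fits the limit,
    a guessed data format, and a guess at header availability.
    """
    size = os.path.getsize(path)
    # Read only the beginning of the file; large files are not loaded fully.
    with open(path, newline="", encoding="utf-8") as handle:
        sample = handle.read(sample_size)
    sniffer = csv.Sniffer()
    # Restrict the guess to the two formats offered by the Data Format field.
    dialect = sniffer.sniff(sample, delimiters=",\t")
    return {
        "size_bytes": size,
        "within_limit": size <= limit_bytes,
        "data_format": "TSV" if dialect.delimiter == "\t" else "CSV",
        # csv.Sniffer.has_header is a heuristic and can be wrong.
        "column_header_available": sniffer.has_header(sample),
    }
```

Calling `check_dataset_file("data.csv")` on a hypothetical local file returns the guesses as a dictionary. Both `csv.Sniffer` methods are heuristics and raise `csv.Error` when no delimiter can be determined, so treat the result as a hint and verify it against the file. If you have changed the threshold in wso2server.sh, pass the new value as `limit_bytes`.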

...