
...

  1. Start the Apache Hadoop server by executing the following command: {HADOOP_HOME}/sbin/start-all.sh

  2. Access the Hadoop UI using the following URL:  http://localhost:50070

  3. Upload your dataset file to HDFS. 

    Tip

    You can use the HDFS Writer utility tool for this.

  4. Start the WSO2 ML server. For instructions on starting, see Running the Product.

  5. Log in to the WSO2 ML UI using the following URL: https://<ML_HOST>:<ML_PORT>/ml

  6. Click the Datasets button as shown below.

    click Datasets button

  7. Click the ADD DATASET button in the top menu.
  8. Enter the following details as shown below.

    • Enter a dataset name for Dataset Name.
    • Enter the version number for Version.
    • Select HDFS for Source Type.
    • Enter the HDFS URL of the dataset file for Data Source.
    • Select the Data Format.
    • Select Yes for Column header available if the dataset defines column headers.

  9. Click Create Dataset. On successful creation, the new dataset appears in the list of all available datasets.
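
The upload in step 3 and the HDFS URL entered for Data Source in step 8 can also be prepared from the command line. The following is a minimal sketch, not part of the product: it only builds the standard `hdfs dfs -mkdir -p` and `hdfs dfs -put` commands and the matching `hdfs://` URL, and it does not run anything. The helper names, the `/ml/datasets` directory, the `iris.csv` file, and the NameNode address `localhost:9000` are assumptions for illustration; the real host and port come from the `fs.defaultFS` setting of your Hadoop installation.

```python
import shlex
from posixpath import basename, join


def hdfs_upload_commands(local_path, hdfs_dir):
    """Return the `hdfs dfs` commands that create hdfs_dir and copy local_path into it.

    Hypothetical helper: it only builds the argument lists, it does not execute them.
    """
    return [
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir],
        ["hdfs", "dfs", "-put", local_path, hdfs_dir],
    ]


def hdfs_url(host, port, hdfs_dir, local_path):
    """Build the hdfs:// URL of the uploaded file (hypothetical helper).

    host and port are assumed to match fs.defaultFS in the Hadoop configuration.
    """
    return f"hdfs://{host}:{port}{join(hdfs_dir, basename(local_path))}"


if __name__ == "__main__":
    # Assumed example values; replace them with your own paths and NameNode address.
    local_file = "/tmp/iris.csv"
    target_dir = "/ml/datasets"

    # Print the commands so they can be reviewed and run in a shell.
    for command in hdfs_upload_commands(local_file, target_dir):
        print(shlex.join(command))

    # Print the URL to enter for Data Source.
    print(hdfs_url("localhost", 9000, target_dir, local_file))
```

Run the printed commands in a shell where the `hdfs` client is on the path, then paste the printed URL into the Data Source field.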

...

Dataset storage is a WSO2 Machine Learner server configuration. By default, it uses the file system. For instructions on changing the dataset storage to HDFS, see Storage Configurations. To upload your dataset files to HDFS, select HDFS for Destination in the WSO2 ML UI when you create the dataset.

Spark reads datasets from HDFS

...