Glossary

Algorithm

An algorithm is the procedure or formula that a Machine Learner model uses to solve a problem (e.g., to make a prediction or recommendation, or to identify an anomaly).

Analysis

An analysis is a logical grouping of machine learning tasks. It holds a pre-processed feature set, a selected machine learning algorithm, and the calibrated set of hyper-parameters for that algorithm. An ML project contains one or more ML analyses, which are immutable.

Anomaly Detection

Identifying data items that do not conform to the expected pattern when compared to the other items in the dataset.
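To make the idea concrete, the sketch below flags values that lie far from the rest of a small list. The z-score rule, the function name `find_anomalies`, and the threshold of 2.0 are illustrative assumptions only; this glossary does not state which anomaly detection method the Machine Learner uses.

```python
from statistics import mean, stdev


def find_anomalies(values, threshold=2.0):
    """Return the values whose z-score exceeds the threshold.

    Illustrative stand-in only: a z-score rule, not the Machine Learner's method.
    """
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        # All values are identical, so nothing deviates from the pattern.
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical sensor readings with one obvious outlier.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 25.0]
print(find_anomalies(readings))
```

A fixed threshold works for a toy list like this one; real datasets usually need a method that accounts for several features at once.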

Classification

The type of analysis that involves grouping specific data items under two or more pre-defined categories.

Dataset

A dataset is a collection of data organized according to a defined schema in CSV or TSV format. Datasets can be uploaded to WSO2 Machine Learner from a file system, a Hadoop distributed file system (HDFS), or WSO2 Data Analytics Server (DAS). The Machine Learner analyzes and tests the data in datasets in order to train models to make predictions and recommendations. If you update a dataset used by the Machine Learner by adding, removing, or modifying data, you can continue to use it by uploading the updated dataset as a new dataset version. Multiple projects can be created for each dataset to group the sets of analyses created for different purposes.

Deep Learning

This involves classifying data with a multi-layer feed-forward artificial neural network that is trained with stochastic gradient descent using back-propagation. The network can contain a large number of hidden layers consisting of neurons with tanh, rectifier, and maxout activation functions. Each subsequent layer learns from the activations of the previous layers. Each compute node trains a copy of the global model parameters on its local data with multi-threading (asynchronously), and periodically contributes to the global model via model averaging across the network.

The diagram below shows a deep network with four inputs (four features), two hidden layers and two outputs (two classes to be predicted).
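The shape of such a network can also be sketched in code. The example below runs one forward pass through a network with four inputs, two hidden layers, and two outputs. The hidden layer width of five, the random weights, the softmax output, and all function names are assumptions made for illustration. Training with stochastic gradient descent and model averaging is not shown.

```python
import math
import random


def dense(inputs, weights, biases, activation):
    """Apply one fully connected layer: activation(W . x + b) per neuron."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def softmax(logits):
    """Turn raw output scores into class probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in, n_out, rng):
    """Create small random weights and zero biases (untrained, for illustration)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(features, layers):
    """Feed the features through tanh hidden layers, then a softmax output layer."""
    activations = features
    for weights, biases in layers[:-1]:
        activations = dense(activations, weights, biases, math.tanh)
    weights, biases = layers[-1]
    return softmax(dense(activations, weights, biases, lambda z: z))


rng = random.Random(0)
# 4 inputs -> 5 hidden -> 5 hidden -> 2 outputs (hidden width is an assumption).
layers = [init_layer(4, 5, rng), init_layer(5, 5, rng), init_layer(5, 2, rng)]
probs = forward([5.1, 3.5, 1.4, 0.2], layers)
print(probs)
```

Because the weights are random, the two probabilities carry no meaning here; the point is only how activations flow from one layer to the next.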

Model

A model is an entity that is generated by running an ML analysis on a selected version of a dataset. A model contains the mapping between the selected input variables and the output variable, and can be used to make predictions on future data.

Numerical Prediction

The type of analysis that produces a numerical value as its result.

PMML

The Predictive Model Markup Language (PMML) is an XML-based file format developed by the Data Mining Group to provide a way for applications to describe and exchange models produced by data mining and machine learning algorithms. It supports common models such as logistic regression and feedforward neural networks.

Since PMML is an XML-based standard, the specification comes in the form of an XML schema.
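As a rough illustration of what this XML looks like, the sketch below reads the data dictionary from a trimmed, hand-written PMML fragment. The version 4.2 namespace, the field names `age` and `churned`, and the helper `list_data_fields` are assumptions chosen for the example. The fragment contains no model element and has not been validated against the schema.

```python
import xml.etree.ElementTree as ET

# Namespace for PMML 4.2; the version used here is an assumption.
PMML_NS = {"pmml": "http://www.dmg.org/PMML-4_2"}

# Hand-written fragment with hypothetical fields; a real export also holds a model.
SAMPLE_PMML = """<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2">
  <Header description="hypothetical example"/>
  <DataDictionary numberOfFields="2">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="churned" optype="categorical" dataType="string"/>
  </DataDictionary>
</PMML>"""


def list_data_fields(pmml_text):
    """Return (name, optype, dataType) for each DataField in the document."""
    root = ET.fromstring(pmml_text)
    fields = root.findall("pmml:DataDictionary/pmml:DataField", PMML_NS)
    return [(f.get("name"), f.get("optype"), f.get("dataType")) for f in fields]


for name, optype, data_type in list_data_fields(SAMPLE_PMML):
    print(name, optype, data_type)
```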

Product Variable

The feature that represents the list of products that need to be recommended. This is used in recommendation algorithms.

Project

A project is a logical grouping of machine learning analyses that are performed on a dataset. To analyze multiple datasets, you need to create multiple projects. A project is bound to a dataset, not to a dataset version.

Publish

This refers to sharing an ML model with another WSO2 server by publishing it in the registry.

Rating Variable

The list of ratings that were given to products by the users. This is used in recommendation algorithms.

Response Variable

The variable for which the prediction is made by a Machine Learner model based on the other variables taken into account.

Train Data Fraction

The fraction of the dataset used for the purpose of training a specific model. The remaining fraction of data is used to evaluate the performance of the trained model.
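A minimal sketch of such a split is shown below. The function name `split_dataset`, the random shuffle, and the fixed seed are assumptions for illustration; this glossary does not describe how the Machine Learner selects the rows for each part.

```python
import random


def split_dataset(rows, train_fraction, seed=42):
    """Split rows into (train, test) parts according to train_fraction."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1 (exclusive)")
    shuffled = list(rows)
    # Shuffle with a fixed seed so the split is reproducible (an assumption).
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# With a train data fraction of 0.7, 7 of 10 rows train the model
# and the remaining 3 rows evaluate it.
train, test = split_dataset(range(10), 0.7)
print(len(train), len(test))
```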

User Variable

The feature that represents the list of users who are being considered for recommending products. This is used in recommendation algorithms.
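The user variable, product variable, and rating variable together describe one row of a recommendation dataset. The sketch below reads a small, made-up CSV and maps three of its columns to those roles. The column names `user_id`, `product_id`, and `rating`, the sample rows, and the helper `load_ratings` are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical dataset: one rating per (user, product) pair.
RATINGS_CSV = """user_id,product_id,rating
alice,book,5
alice,lamp,2
bob,book,4
bob,desk,5
carol,lamp,1
"""


def load_ratings(csv_text, user_variable, product_variable, rating_variable):
    """Group ratings as {user: {product: rating}} using the chosen columns."""
    ratings = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(csv_text)):
        user = row[user_variable]
        product = row[product_variable]
        ratings[user][product] = float(row[rating_variable])
    return dict(ratings)


ratings = load_ratings(RATINGS_CSV, "user_id", "product_id", "rating")
print(ratings["alice"])
```

A recommendation algorithm would take a structure like this as its input; the algorithm itself is outside the scope of this sketch.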