This section explains how to evaluate the models generated by WSO2 ML with regard to their accuracy. The following topics are covered.
Available measure types
The model evaluation methods in WSO2 ML can be categorized into four types as follows.
Numerical predictions
These methods involve making a numerical prediction based on the dataset analysed. The available measures of this type are as follows.
Binary classification
Binary classification involves classifying the data items in a dataset into two categories.
Terminology of Binary Classification Metrics
Binary Classification Metrics refer to the following two formulas used to calculate the reliability of a binary classification model.
| Name | Formula |
|---|---|
| True Positive Rate (Sensitivity) | TPR = TP / P = TP / (TP + FN) |
| True Negative Rate (Specificity) | SPC = TN / N = TN / (TN + FP) |
The following table explains the abbreviations used in the above formulas.
| Abbreviation | Expansion | Meaning |
|---|---|---|
| P | Positives | The total number of positive outcomes (i.e. the total number of items that actually belong to the positive class). |
| N | Negatives | The total number of negative items (i.e. the total number of items that actually belong to the negative class). |
| TP | True Positive | Data items that actually belong to the positive class and are predicted as positive. |
| FP | False Positive | Data items that actually belong to the negative class but are predicted as positive. |
| TN | True Negative | Data items that actually belong to the negative class and are predicted as negative. |
| FN | False Negative | Data items that actually belong to the positive class but are predicted as negative. |
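As a minimal sketch, these counts and rates can be computed from actual and predicted labels as follows (the labels here are hypothetical, for illustration only):

```python
# Hypothetical actual and predicted labels (1 = positive, 0 = negative).
actual    = [1, 1, 0, 0, 1, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)

tpr = tp / (tp + fn)   # True Positive Rate (Sensitivity) = TP / P
spc = tn / (tn + fp)   # True Negative Rate (Specificity) = TN / N
print(f"TPR = {tpr:.2f}, SPC = {spc:.2f}")
```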
The available measures of this type are as follows.
Multi-class classification
Multi-class classification involves classifying the items in a dataset into multiple categories. The available measures of this type are as follows.
Clustering
This involves clustering the items in a dataset.
Model evaluation measures
The following methods are used to evaluate the performance of models in terms of accuracy.
Confusion Matrix
This method is available for binary classification and multi-class classification models.
The confusion matrix is a table layout that visualises the performance of a classification model by displaying each data point in the cell that corresponds to its actual and predicted classes.
The confusion matrix for a binary classification is as follows:
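A typical layout places the actual classes in rows and the predicted classes in columns:

| | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual Positive | TP | FN |
| Actual Negative | FP | TN |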
The confusion matrix for a multi-class classification (with n classes) is as follows:
This matrix allows you to identify which points are correctly classified and which are not. The cells with matching actual and predicted classes hold the correct predictions, and the counts in these cells should be maximised for greater accuracy. The green cells in the above images mark the correctly classified points. In an ideal scenario, all other cells would hold zero points.
The following is an example of a confusion matrix with both correctly classified points as well as incorrectly classified points.
Accuracy
This method is available for binary classification and multi-class classification models.
The accuracy of a model can be calculated using the following formula.
Accuracy = Correctly Classified Points / Total Number of Points
For a binary classification model, this can be calculated as follows:
Accuracy = (TP + TN) / (TP + TN + FP + FN) = (TP + TN) / (P + N)
For example, the accuracy can be calculated as follows based on the confusion matrix example above.
Correctly classified points = 12 + 16 + 16 = 44
Total number of points = 12 + 16 + 16 + 1 + 1 + 1 = 47
Accuracy = 44 / 47 = 93.62%
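The same calculation, as a minimal Python sketch using the counts from the example above:

```python
# Accuracy computed from the example confusion matrix above.
correct = 12 + 16 + 16          # points on the diagonal (actual == predicted)
incorrect = 1 + 1 + 1           # all off-diagonal points
total = correct + incorrect

accuracy = correct / total
print(f"Accuracy = {accuracy:.2%}")  # Accuracy = 93.62%
```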
You can find this metric for classification models in the model summary as shown in the above image.
ROC curve
This method is available for binary classification models.
This illustrates the performance of a binary classification model by plotting the TPR (True Positive Rate) against the FPR (False Positive Rate, equal to 1 - SPC) for different threshold values. A completely accurate model would pass through the (0, 1) coordinate (i.e. an FPR of 0 and a TPR of 1) in the upper left corner of the plot. However, this is not achievable in practical scenarios. Therefore, when comparing models, the model whose ROC curve passes closest to the (0, 1) coordinate can be considered the best performing model in terms of accuracy. The best threshold for a model is the one associated with the point on its ROC curve closest to the (0, 1) coordinate. You can find the ROC curve for a particular binary classification model under the model summary in the WSO2 ML UI.
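The following is a minimal sketch of how the points on a ROC curve can be derived: a decision threshold is swept across the predicted scores, and the (FPR, TPR) pair is computed at each threshold. The scores and labels below are hypothetical, for illustration only.

```python
# Sweep a decision threshold over predicted scores and compute the
# (FPR, TPR) point for each threshold.
scores = [0.1, 0.35, 0.4, 0.8, 0.9]   # predicted probability of the positive class
labels = [0, 0, 1, 0, 1]              # actual classes (1 = positive, 0 = negative)

P = sum(labels)            # total actual positives
N = len(labels) - P        # total actual negatives

roc_points = []
for threshold in sorted(set(scores), reverse=True):
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, a in zip(predicted, labels) if p == 1 and a == 1)
    fp = sum(1 for p, a in zip(predicted, labels) if p == 1 and a == 0)
    roc_points.append((fp / N, tp / P))   # (FPR, TPR) for this threshold

print(roc_points)
```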
AUC
This method is available for binary classification models.
AUC (Area Under the Curve) is another accuracy metric for a binary classification model, derived from the ROC curve. A model with greater accuracy has an AUC (the area under its ROC curve) closer to 1. Therefore, when comparing the accuracy of multiple models using the AUC, the one with the highest AUC can be considered the best performing model.
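As a sketch, the AUC can be approximated from a set of (FPR, TPR) points with the trapezoidal rule (the points below are hypothetical):

```python
# Hypothetical (FPR, TPR) pairs, sorted by FPR, with the (0, 0)
# and (1, 1) endpoints included.
points = [(0.0, 0.0), (0.0, 0.5), (0.2, 0.7), (0.5, 0.9), (1.0, 1.0)]

auc = 0.0
for (x0, y0), (x1, y1) in zip(points, points[1:]):
    auc += (x1 - x0) * (y0 + y1) / 2   # area of the trapezoid between points

print(f"AUC = {auc:.3f}")  # AUC = 0.835 for these points
```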
You can find the AUC value for a particular model on its ROC curve in the model summary (see the ROC curve image in the previous section, titled ROC Curve (AUC = 0.619)).
Feature Importance
This method is available for binary classification and numerical prediction models.
This chart visualizes the importance (weight) of each feature according to its significance in the final model. In regression models (numerical prediction), each weight represents the amount by which the response variable changes when the respective predictor variable is increased by one unit. You can use this chart to make feature selection decisions.
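As a hypothetical illustration of this interpretation, the following sketch fits a linear model with NumPy and reads the fitted weights; increasing a predictor by one unit changes the prediction by that predictor's weight:

```python
import numpy as np

# Hypothetical data: fit y = w1*x1 + w2*x2 + b and read the fitted weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                 # two predictor variables
y = 3.0 * X[:, 0] - 1.5 * X[:, 1] + 0.5       # true weights: 3.0 and -1.5

A = np.column_stack([X, np.ones(len(X))])     # add an intercept column
weights, *_ = np.linalg.lstsq(A, y, rcond=None)

# weights[0] ~ 3.0: increasing x1 by one unit changes y by ~3.0 units.
print(weights)
```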
Predicted vs Actual
This method is available for binary classification and multi-class classification models.
This chart plots the data points according to the correctness of the classification. You can select two dataset features to be visualized, and the plot displays the data distribution, marking each point as correctly or incorrectly classified.
MSE
This method is available for numerical prediction models.
MSE (Mean Squared Error) is the average of the squared errors of the prediction. An error is the difference between the actual value and the predicted value. Therefore, a better performing model should have a comparatively lower MSE. This metric is widely used to evaluate the accuracy of numerical prediction models. You can find this metric for numerical prediction models in the model summary as shown in the above image.
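As a minimal sketch with hypothetical values:

```python
# MSE = average of squared differences between actual and predicted values.
actual    = [3.0, 5.0, 2.5, 7.0]
predicted = [2.8, 5.4, 2.9, 6.5]

mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
print(f"MSE = {mse:.4f}")  # MSE = 0.1525
```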
Residual Plot
This method is available for numerical prediction models.
The residual plot shows the residuals on the y-axis and a predictor variable (feature) on the x-axis. A residual is the difference between the observed (actual) value and the predicted value of the response variable. A model can be considered accurate when the residual points are:
Randomly distributed (do not form a pattern)
Centered around zero on the vertical axis (indicating that there are equal numbers of positive and negative values)
Closely clustered around zero on the vertical axis (indicating that there are no very large positive or negative residual values)
If the above conditions are not satisfied, it is possible that some missing or hidden factors/predictor variables have not been taken into account. You can select a dataset feature to be plotted against its residuals.
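The following is a minimal sketch of how such a plot can be produced with Matplotlib from hypothetical actual and predicted values:

```python
import matplotlib.pyplot as plt

# Hypothetical data: residual = actual - predicted, plotted against
# one predictor variable (feature).
feature   = [1.0, 2.0, 3.0, 4.0, 5.0]
actual    = [2.1, 3.9, 6.2, 7.8, 10.1]
predicted = [2.0, 4.0, 6.0, 8.0, 10.0]

residuals = [a - p for a, p in zip(actual, predicted)]

plt.scatter(feature, residuals)
plt.axhline(0, linestyle="--")   # residuals should scatter randomly around this line
plt.xlabel("feature")
plt.ylabel("residual")
plt.show()
```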