Figure 2: Data After SMOTE-ENN Application
Fraud Model Performance Measures
The results were analyzed using the standard confusion matrix. A
confusion matrix is a table that summarizes the predictions of a
classification model and is used to calculate accuracy, precision,
recall, and F1-score. The table is made up of true positive (TP), false
positive (FP), false negative (FN), and true negative (TN) counts 15.
For 2x2 binary classification, the confusion matrix as it pertains to
this project is interpreted as follows (a short code sketch follows the
list):
True Positive (TP): The algorithm predicts fraud, and the outcome is
fraud.
True Negative (TN): The algorithm predicts no fraud, and there was no
fraud.
False Positive (FP): The algorithm predicted fraud, but there was no
fraud (Type I error).
False Negative (FN): The algorithm predicted no fraud, but there was
fraud (Type II error).
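As a minimal sketch of how these four counts can be obtained in practice, the snippet below uses scikit-learn's confusion_matrix; the label encoding (1 = fraud, 0 = no fraud) and the example arrays are illustrative assumptions, not values from this paper.

    from sklearn.metrics import confusion_matrix

    y_true = [0, 0, 1, 1, 0, 1, 0, 0]   # actual labels (1 = fraud)
    y_pred = [0, 1, 1, 0, 0, 1, 0, 0]   # model predictions

    # With labels=[0, 1], ravel() returns the counts in the order TN, FP, FN, TP.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")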
Table 2 presents the performance measures used in the models for this
paper. Accuracy is the proportion of correct predictions over all
predictions. However, accuracy is not the best metric for an imbalanced
dataset. An improved single measure is the Matthews Correlation
Coefficient (MCC) 50. The MCC is a measure of the quality of binary
classification. It considers true and false positives and negatives and
is widely regarded as a balanced measure that can be applied even when
the classes are of very different sizes 51. The MCC lies in the range
[-1, 1]: a coefficient of +1 represents a perfect prediction, a
coefficient of 0 represents a prediction no better than random, and a
coefficient of -1 represents an inverse prediction 52. The MCC has some
valuable properties that make it more suitable than other measures for
some purposes, most notably its ability to work well even when one of
the two classes is much more frequent than the other 50,51. The formulas
for accuracy and the MCC are shown in Table 2.
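As an illustration, the MCC can be computed directly from the four confusion-matrix counts or with scikit-learn's matthews_corrcoef; the counts and label vectors below are hypothetical and only meant to show the calculation.

    from math import sqrt
    from sklearn.metrics import matthews_corrcoef

    def mcc_from_counts(tp, tn, fp, fn):
        # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
        denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

    # Hypothetical counts for an imbalanced fraud problem.
    print(mcc_from_counts(tp=40, tn=900, fp=30, fn=30))

    # Equivalent check on label vectors with scikit-learn.
    y_true = [1, 1, 0, 0, 0, 1]
    y_pred = [1, 0, 0, 0, 1, 1]
    print(matthews_corrcoef(y_true, y_pred))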
Practitioner Measures
To understand how well a classification model is performing, we need to
look beyond accuracy. Accuracy may be high simply because the model
predicts the most frequent class all the time, which produces misleading
and unreliable results for an imbalanced dataset. To get a better idea
of model performance, other metrics can be calculated from the confusion
matrix: precision, recall, and F1-score. Precision is the number of
correctly predicted fraud cases divided by the total number of cases
predicted as fraud, TP / (TP + FP). Recall is the number of correctly
predicted fraud cases divided by the total number of actual fraud cases,
TP / (TP + FN). Generally, a classifier with higher precision but lower
recall will miss some fraudulent items but will not incorrectly flag
many legitimate items as fraud. A classifier with higher recall but
lower precision will correctly identify more of the fraudulent items but
will also incorrectly flag more legitimate items as fraud. The ideal
classifier would have perfect precision and recall, but this is usually
impossible in practice. Instead, the goal is usually to find a balance
between precision and recall that gives the best overall results. The
F1-score, the harmonic mean of precision and recall, achieves this
objective. A good classification model will have high precision, recall,
and F1-score 19.
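A brief sketch of computing these metrics with scikit-learn is shown below; the label vectors are illustrative assumptions (1 = fraud), not results from this paper.

    from sklearn.metrics import precision_score, recall_score, f1_score, classification_report

    y_true = [0, 0, 0, 1, 1, 0, 1, 0]   # actual labels (1 = fraud)
    y_pred = [0, 1, 0, 1, 0, 0, 1, 0]   # model predictions

    print(precision_score(y_true, y_pred))  # TP / (TP + FP)
    print(recall_score(y_true, y_pred))     # TP / (TP + FN)
    print(f1_score(y_true, y_pred))         # harmonic mean of precision and recall
    print(classification_report(y_true, y_pred, target_names=["no fraud", "fraud"]))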
A more robust measure is the Receiver Operating Characteristic (ROC)
curve, a graphical tool used to evaluate the performance of a binary
classifier. The curve is generated by plotting the true positive rate
(TPR) against the false positive rate (FPR) at various threshold values.
The area under the ROC curve (AUROC) is a metric that can be used to
compare different classifiers. A classifier with a higher AUROC has
better discrimination, meaning it can better distinguish between
positive and negative observations. A perfect classifier would have a
TPR of 1 and an FPR of 0, corresponding to the upper left corner of the
ROC plot. Generally, the closer the ROC curve is to this corner, the
better the classifier performs. Classifiers that perform similarly to
random guessing have a ROC curve close to the diagonal line 18,19.
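The following sketch shows one way to obtain the ROC curve points and the AUROC with scikit-learn; the logistic regression model, the synthetic imbalanced data, and all parameter values are illustrative assumptions rather than the setup used in this paper.

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_curve, roc_auc_score

    # Synthetic, imbalanced binary data (1 = fraud); weights sets the class ratio.
    X, y = make_classification(n_samples=5000, weights=[0.95], random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    scores = model.predict_proba(X_test)[:, 1]        # predicted fraud probabilities

    fpr, tpr, thresholds = roc_curve(y_test, scores)  # points of the ROC curve
    print("AUROC:", roc_auc_score(y_test, scores))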
Table 2: Performance Metrics and Formulae