Figure 2: Data After SMOTE-ENN Application
Fraud Model Performance Measures
The results were analyzed using the standard confusion matrix. A confusion matrix is a table from which the accuracy, precision, recall, and F1-score of a classification model can be calculated. The table is made up of true positive (TP), false positive (FP), false negative (FN), and true negative (TN) values 15. For a 2x2 binary classification problem, the confusion matrix as it pertains to this project can be interpreted as follows (a short code sketch follows the list):
True Positive (TP): The algorithm predicts fraud, and the outcome is fraud.
True Negative (TN): The algorithm predicts no fraud, and the outcome is no fraud.
False Positive (FP): The algorithm predicts fraud, but there was no fraud (Type I error).
False Negative (FN): The algorithm predicts no fraud, but there was fraud (Type II error).
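As a minimal sketch of how these four counts are obtained in practice, the example below builds a 2x2 confusion matrix with scikit-learn; the label vectors y_true and y_pred are hypothetical placeholders, not outputs of the models evaluated in this paper.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical ground-truth and predicted labels (1 = fraud, 0 = no fraud).
y_true = [0, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [0, 1, 1, 0, 0, 1, 0, 0, 1, 0]

# For binary labels, ravel() unpacks the 2x2 matrix in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")
```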
Table 2 presents the performance measures used in the models for this paper. Accuracy is the proportion of correct predictions over all predictions. However, accuracy is not the best metric for an imbalanced dataset. An improved single measure is the Matthews Correlation Coefficient (MCC) 50. The MCC is a measure of the quality of binary classification. It considers true and false positives and negatives and is widely regarded as a balanced measure that can be applied even when the classes are of very different sizes 51. The MCC lies in the range [-1, 1]: a coefficient of +1 represents a perfect prediction, 0 represents an average random prediction, and -1 represents an inverse prediction 52. The MCC has some valuable properties that make it more suitable than other measures for some purposes, most notably its ability to work well even when one of the two classes is much more frequent than the other 50,51. The formulas for both accuracy and the MCC are shown in Table 2.
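To make the two formulas concrete, the sketch below computes accuracy and the MCC directly from the confusion-matrix counts and cross-checks the MCC against scikit-learn's matthews_corrcoef; the labels are illustrative toy data, not results from this study.

```python
import math
from sklearn.metrics import confusion_matrix, matthews_corrcoef

# Hypothetical labels for illustration only (1 = fraud, 0 = no fraud).
y_true = [0, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [0, 1, 1, 0, 0, 1, 0, 0, 1, 0]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Accuracy: proportion of correct predictions over all predictions.
accuracy = (tp + tn) / (tp + tn + fp + fn)

# MCC from its closed-form definition on the four confusion-matrix counts.
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)

print(f"accuracy = {accuracy:.3f}, MCC = {mcc:.3f}")
print(f"sklearn MCC = {matthews_corrcoef(y_true, y_pred):.3f}")  # should agree
```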
Practitioner Measures
To understand how well a classification model is performing, we need to look beyond accuracy. Accuracy may be high simply because the model predicts the most frequent class all the time, which makes it an unreliable indicator on an imbalanced dataset. To get a better idea of model performance, the confusion matrix can be used to calculate other metrics: precision, recall, and the F1-score. Precision is the number of correctly predicted fraud cases divided by the total number of cases predicted as fraud (TP / (TP + FP)). Recall is the number of correctly predicted fraud cases divided by the total number of actual fraud cases (TP / (TP + FN)). Generally, a classifier with higher precision but lower recall will miss some fraudulent items but will not incorrectly flag too many legitimate items as fraud. A classifier with higher recall but lower precision will correctly identify more of the fraudulent items but will also incorrectly flag more legitimate items as fraud. The ideal classifier would have perfect precision and recall, but this is usually impossible in practice. Instead, the goal is usually to find a balance between precision and recall that gives the best overall results. The F1-score, the harmonic mean of precision and recall, achieves this objective. A good classification model will have high precision, recall, and F1-score 19.
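The following sketch illustrates these three practitioner measures with scikit-learn on the same style of hypothetical labels used above; it is an illustration only, not the evaluation pipeline of this paper.

```python
from sklearn.metrics import precision_score, recall_score, f1_score, classification_report

# Hypothetical labels for illustration only (1 = fraud, 0 = no fraud).
y_true = [0, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [0, 1, 1, 0, 0, 1, 0, 0, 1, 0]

# Precision: TP / (TP + FP); recall: TP / (TP + FN);
# F1: the harmonic mean 2 * precision * recall / (precision + recall).
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
print(f"precision = {precision:.3f}, recall = {recall:.3f}, F1 = {f1:.3f}")

# Per-class breakdown of the same metrics.
print(classification_report(y_true, y_pred, target_names=["no fraud", "fraud"]))
```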
A more robust measure is the Receiver Operating Characteristic (ROC) curve, a graphical tool used to evaluate the performance of a binary classifier. The curve is generated by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold values. The area under the ROC curve (AUROC) is a metric that can be used to compare different classifiers. A classifier with a higher AUROC has better discrimination, meaning it can better distinguish between positive and negative observations. A perfect classifier would have a TPR of 1 and an FPR of 0, corresponding to a point in the upper left corner of the ROC plot. Generally, the closer the ROC curve is to this corner, the better the classifier performs. Classifiers that perform similarly to random guessing will have a ROC curve close to the diagonal line 18,19.
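As a hedged sketch, the example below computes the TPR/FPR pairs across thresholds and the AUROC from hypothetical predicted fraud probabilities; y_score is a placeholder for a model's probability output, not a result from this study.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical labels and predicted fraud probabilities (illustrative only).
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 0, 1, 0])
y_score = np.array([0.10, 0.60, 0.80, 0.30, 0.20, 0.90, 0.15, 0.05, 0.70, 0.40])

# TPR and FPR at every decision threshold, and the area under the curve.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
auroc = roc_auc_score(y_true, y_score)
print(f"AUROC = {auroc:.3f}")  # 1.0 is perfect; 0.5 is equivalent to random guessing
```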
Table 2: Performance Metrics and Formulae