Title: Predicting Mobile Money Transaction Fraud using Machine Learning Algorithms
Author: Mark Lokanan
Corresponding Author: Mark Lokanan
Mark Lokanan is a data scientist, a white-collar criminologist, and an Associate Professor in the Faculty of Management at Royal Roads University. He is a graduate of the School of Criminology, Simon Fraser University, Canada.
Mark E. Lokanan, Ph.D.
Associate Professor | Faculty of Management | Royal Roads University
T 250.391.2600 ext. 4386#
2005 Sooke Road, Victoria, BC  Canada  V9B 5Y2 | royalroads.ca
Mark.Lokanan@royalroads.ca
Introduction
With the increasing popularity of mobile money services, there has been a corresponding increase in fraud and money laundering cases. Mobile money providers must therefore be vigilant in combating such activity. One way to combat fraud is to require users to provide additional information when making transactions, such as a PIN or biometric data. Money laundering can be more difficult to detect, but mobile money providers can look for patterns of suspicious activity, such as unusually large or frequent transactions. The Financial Action Task Force (FATF) noted that mobile money payments present a global money laundering and terrorist financing threat because they can be used to facilitate cross-border payments without the need for a bank account. In response, the FATF released risk-based guidance for countering the threat. The guidance sets out recommendations for identifying and managing risk, including the use of computational technology.
Machine learning (ML) and artificial intelligence (AI) are increasingly seen as tools to combat mobile money fraud and address anti-money laundering (AML) compliance. Computational technology has always played a role in the fight against financial crime, but the rise of ML and AI is giving law enforcement a powerful new tool in the battle against mobile money fraud. AI can help financial institutions identify and flag suspicious behaviour, such as large or unusual transactions, and better understand their customers' needs and risk profiles. By harnessing the power of AI, financial institutions can significantly improve their ability to combat mobile money fraud and address money laundering threats. This paper aims to use ML algorithms to build a fraud detection model that detects red flags of fraud and money laundering in mobile money transactions. More specifically, this paper will use a set of risk-based indicators to predict how likely a transaction is to be fraudulent.
This study makes several significant contributions to the existing body of research on methods for detecting suspicious transactions in mobile money transfers. In theory, ML algorithms can overcome the challenges of identifying illegal transactions with the more conventional rule-based benchmark methodology. In the classic rule-based technique, illegal transactions are identified using predefined criteria based on mathematical conditions. The rule-based approach is time-consuming and costly and has a high rate of false-positive results. ML addresses these issues by enabling computers to learn from the data and make predictions. When applied to mobile money, ML can enable the automated detection of potentially fraudulent transactions. For example, an ML algorithm can be trained on a dataset containing transactions already labelled as fraudulent or legitimate. Based on this learned experience, the algorithm can be tuned to find future fraudulent transactions by looking for patterns similar to those found in the training data.
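To make the contrast concrete, the sketch below shows what a minimal rule-based check might look like in Python. The `Transaction` record, the two rules, and the thresholds (`AMOUNT_LIMIT`, `MAX_TX_PER_WINDOW`, `WINDOW_HOURS`) are hypothetical values chosen for illustration; they are not taken from this study or from any provider's rule set.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration only; not taken from this study.
AMOUNT_LIMIT = 10_000.0   # flag any single transfer above this amount
MAX_TX_PER_WINDOW = 5     # flag an account that sends more than this many transfers
WINDOW_HOURS = 24         # ...within a trailing window of this many hours


@dataclass
class Transaction:
    account: str   # sending account
    hour: int      # hours since the start of the log
    amount: float


def rule_based_flags(transactions):
    """Return the sorted indices of transactions that break a predefined rule."""
    flagged = set()
    for i, tx in enumerate(transactions):
        # Rule 1: unusually large single transaction.
        if tx.amount > AMOUNT_LIMIT:
            flagged.add(i)
        # Rule 2: unusually frequent transactions from the same account
        # within the trailing window that ends at this transaction.
        recent = [
            t for t in transactions
            if t.account == tx.account and tx.hour - WINDOW_HOURS < t.hour <= tx.hour
        ]
        if len(recent) > MAX_TX_PER_WINDOW:
            flagged.add(i)
    return sorted(flagged)


# Hypothetical usage: one large transfer, one account making six transfers in six hours.
txs = [Transaction("A", 0, 12_000.0)]
txs += [Transaction("B", h, 900.0) for h in range(6)]
txs += [Transaction("C", 1, 50.0)]
print(rule_based_flags(txs))  # [0, 6]
```

Every threshold is fixed by hand, so a legitimate customer who happens to make six payments in a day is flagged in the same way as a suspicious account; this rigidity is one source of the false positives that ML aims to reduce.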
In practical terms, we propose a novel data-driven fraud detection method tailored to the distinctive features of mobile money transactions. This strategy uses ML to automatically identify suspicious transactions in real time, eliminating the need for extensive human involvement. Because the ML approach analyzes transactions in real time, it can quickly spot potentially fraudulent transactions and stop them from going through, thereby reducing the number of fraudulent transactions.
The remainder of this paper is structured as follows. Section two thoroughly reviews the literature on ML and mobile money transfers concerning fraud and money laundering. Section three discusses the methodology and the algorithms considered for the ML models. Section four presents and analyzes the results. Section five concludes by discussing the limitations of ML in fraud research and identifying opportunities for further study.
Literature Review on Mobile Money Fraud and Money Laundering
Mobile Money Services (MMS) or Mobile Money Transfer Services (MMTS) are financial services for unbanked users that operate primarily through smartphone apps supported by mobile operators or banking institutions; they are frequently referred to as branchless banking services1,2. They facilitate the transfer of electronic cash using users' mobile phones without involving a bank account in the process 3,4. Common examples of MMS or MMT services include Tigopesa, M-Pesa, Simbanking, and NMB Mobile, offered by Tigo Tanzania Ltd, Vodacom Tanzania, CRDB Bank, and National Microfinance Bank, respectively 5(p4). In essence, MMT enables person-to-person (P2P) payments for customers, and the services supported by the mobile money system involve participation from various stakeholders, such as mobile users, regulators, mobile network operators, telecom retailers, agents, and financial institutions6. Mobile users are the customers of MMT services, while mobile network operators (MNOs) facilitate the entire MMS ecosystem in conjunction with telecom retailers and agents, who are responsible for opening customer accounts, conducting customer due diligence, and performing other compliance activities such as Know Your Customer (KYC) checks. Financial institutions and regulators assist MNOs in establishing financial inclusion and risk management mechanisms, whereas MNOs limit banks to processing payment delivery, clearing, and settlement 6. A bank may or may not be involved in the MMS, depending on the MMT model adopted 7. Together, these players enable MNOs to implement the new P2P payment facility for unbanked users.
The number of people using mobile money for small or large transactions has increased drastically in the last decade 1. Research suggests that this number will continue to rise with increasing dependency on and usage of mobile phones8,9. Because of their success and popularity, mobile money systems are likely to attract the attention of fraudsters interested in laundering the proceeds of crime 10. Fraudsters can launder money by intercepting the details of mobile money transfers during transmission or creation, or by stealing data stored on servers through phishing attacks or viruses; this information can then be misused to launder illicit funds 8,11. Similarly, dishonest end users of MMS can launder illicit money through the system by smurfing, that is, splitting a large sum of illegitimately sourced income into many small mobile money transactions across multiple accounts and phones so that the activity does not appear suspicious 3,6. Indeed, some speculate that this system could be used to fund terrorist activities, and there is evidence that launderers have used mobile transfers to launder funds for terrorist financing 12. These findings highlight the need for more advanced technology to identify and control the risks associated with the mobile money system.
Detecting Mobile Money Fraud and Money Laundering Using Computational Technology
Technological innovation can be useful in mitigating various risks associated with the MMS. Improving technological surveillance by increasing the security, resilience, and scalability of the MNO networks used in MMT can reduce the risks associated with mobile money fraud (MMF) to some extent at the security and procedural levels 13. Implementing a two-factor authentication model to secure SMS communications in MMT has proven very effective14. Beyond new developments in the security information and event management field, the most important contribution technology can make is the design of innovative tools for predicting or detecting MMF and money laundering in the MMS. The following paragraphs focus on using ML algorithms and artificial intelligence to mitigate the risk of MMF and money laundering in mobile money transactions.
Machine Learning and Artificial Intelligence Algorithms
Technology is pivotal in investigating and detecting fraudulent or laundered mobile money transactions15. ML, AI, and data mining have proven effective in detecting MMF and money laundering activities in the MMS 16,17. More specifically, ML algorithms enable computers to learn from data on human behaviour and to detect patterns in it 17,18. Supervised ML algorithms such as logistic regression, decision trees, gradient descent, and random forest have all been used successfully to detect financial fraud from labelled data 16,18–22. The following sections review the literature on these algorithms.
Logistic Regression
Logistic regression will be used as the baseline algorithm against which the other models are compared. Logistic regression uses a linear combination of input variables (x) to predict an output variable (y)20. The output variable is binary (0 or 1), representing the two possible outcomes of a binary classification task (e.g., fraud or not fraud). The coefficients of the input variables (β) are estimated using maximum likelihood estimation. The sigmoid function, the mathematical foundation of logistic regression, takes any real number and maps it to a value between 0 and 1. This mapping is important for ML classification tasks because it allows the algorithm to separate data points into different classes easily 23. The sigmoid function is denoted in equation 1.
\(f(x) = \frac{1}{1 + e^{-x}}\) eq. 1
Where
f(x) is the value bounded between 0 and 1,
x is the input to the sigmoid function (any real number), and
e is the mathematical constant (Euler's number, approximately 2.718).
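Equation 1 can be sketched in a few lines of Python. This is an illustrative implementation using only the standard library; the sign branch is our own numerical-stability choice (it keeps `math.exp` from overflowing for large negative inputs) and is not part of the paper's method.

```python
import math


def sigmoid(x):
    """Equation 1: map any real number x to a value between 0 and 1."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use the algebraically equivalent form e^x / (1 + e^x)
    # so that math.exp never receives a large positive argument.
    z = math.exp(x)
    return z / (1.0 + z)


for x in (-5.0, 0.0, 5.0):
    print(x, round(sigmoid(x), 4))  # approximately 0.0067, 0.5 and 0.9933
```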
The output of the sigmoid function can be interpreted as a probability. For example, if the output of the sigmoid function is 0.8, this can be interpreted as an 80% chance that the data point belongs to one class and a 20% chance that it belongs to the other class24,25. Using the sigmoid function, a logistic regression model can be trained to predict the class to which a new data point belongs. The logistic regression classifier uses the sigmoid function to estimate the probability that y = 1, given the value of x. Equation 2 denotes the logistic regression model.
\(\Pr(Y = 1 \mid X = x) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}\) eq. 2
Where
Pr(Y = 1 | X = x) is a value between 0 and 1,
x is the value of the independent feature X, and
β0 and β1 are the coefficients estimated from the data; different values of β0 and β1 give different estimates of Pr(Y = 1 | X = x).
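As a worked illustration of equation 2, the following Python sketch computes Pr(Y = 1 | X = x) for a single feature and converts it into a fraud label. The coefficient values (β0 = -4, β1 = 2) and the 0.5 decision threshold are hypothetical; in practice the coefficients come from maximum likelihood estimation on the training data.

```python
import math


def prob_fraud(x, beta0, beta1):
    """Equation 2: Pr(Y = 1 | X = x) for a one-feature logistic model."""
    z = beta0 + beta1 * x  # linear predictor: the log odds that Y = 1
    # Note: math.exp overflows for z above about 709; 1 / (1 + exp(-z)) is safer there.
    return math.exp(z) / (1.0 + math.exp(z))


def classify(x, beta0, beta1, threshold=0.5):
    """Return 1 (fraud) when the estimated probability reaches the threshold, else 0."""
    return int(prob_fraud(x, beta0, beta1) >= threshold)


# Hypothetical coefficients, chosen only to illustrate the arithmetic.
beta0, beta1 = -4.0, 2.0
for x in (0.0, 2.0, 3.0):
    print(x, round(prob_fraud(x, beta0, beta1), 3), classify(x, beta0, beta1))
# prints 0.0 0.018 0, then 2.0 0.5 1, then 3.0 0.881 1
```

With these values, a transaction with x = 2 sits exactly on the decision boundary (probability 0.5), while x = 3 gives a probability of about 0.88.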
Logistic regression is a valuable technique for fraud classification tasks 20,24,25. Research has shown that logistic regression performs relatively well and, in some cases, outperforms other classifiers in fraud classification tasks20,23,25,26. Logistic regression has been used in a variety of domains to predict fraud. For example, it has been used in the financial sector to detect credit card and insurance fraud 23,27, and others have used it to predict medical billing fraud with reliable results28,29. Logistic regression models have several advantages over traditional fraud detection methods. First, they are highly scalable and can be applied to large datasets 29. Second, they are highly effective at detecting fraud, with a success rate generally much higher than that of traditional methods23. Third, logistic regression models are relatively easy to interpret, which makes them valuable tools for fraud analysts 20,25. Finally, they are relatively easy to deploy and use in production systems 20,23. However, logistic regression is not without drawbacks; in particular, it can be susceptible to overfitting if the data are not carefully preprocessed24,27. Despite this risk, logistic regression is an excellent way to build a fraud detection model that can serve as a benchmark for comparison with other classifiers.
Decision Tree
Another useful machine learning algorithm for fraud detection is the decision tree classifier 19. A decision tree uses a tree structure for decision making: the root node represents the initial decision, internal nodes represent the attributes (features) chosen for splitting based on information gain or the Gini index, edges represent the outcomes of each decision, and leaves hold the class labels that convey the final decision18,30. A single decision tree is typically considered a weak learner, and its classification ability can be boosted using the gradient boosting technique 31. Gradient boosting is an ensemble learning technique that improves predictive accuracy by generating decision trees sequentially, with each new tree fitted to correct the errors of the trees built before it16,18(p8). This project employs the Gini index as the criterion for splitting the data. The mathematical formula for the Gini index is shown in equation 4:
\(I_G(f) = \sum_{k=1}^{m} f_k (1 - f_k)\) eq. 4
Where
fk is the fraction of items labeled with class k in the set, m is the number of classes, and ∑ fk = 1.
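The Gini index can be computed directly from the class labels in a node. The following Python sketch is a minimal illustration; the function name `gini_impurity` is our own.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini index I_G = sum over classes k of f_k * (1 - f_k)."""
    n = len(labels)
    if n == 0:
        return 0.0
    # f_k is the fraction of items in the node that carry class label k.
    return sum((count / n) * (1 - count / n) for count in Counter(labels).values())


print(gini_impurity([0, 0, 0, 0]))  # pure node: 0.0
print(gini_impurity([0, 0, 1, 1]))  # evenly mixed two-class node: 0.5
```

A value of 0 means every item in the node has the same label; higher values mean the classes are more mixed.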
Concerning fraud detection, a decision tree model predicts whether an observation (e.g., a transaction) is legitimate or fraudulent32. The decision tree model is based on a series of yes-or-no questions, each narrowing down the possible outcomes (i.e., fraud or no fraud) 33. For example, a decision tree for fraud detection might ask whether a transaction is consistent with the customer's past behaviour; if the answer is no, the transaction could be flagged as potentially fraudulent. Once the model is built, it can be used to classify new data points as either fraudulent or non-fraudulent. Decision tree algorithms are highly effective in identifying fraud and are often combined with other methods, such as rule-based systems and other ML algorithms18,19,33. Classification algorithms based on decision trees are a powerful way to find fraud because they can capture non-linear interactions between features, which helps them uncover even complex kinds of fraud.
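The yes-or-no questions described above are chosen by searching for the split that leaves the child nodes as pure as possible. The Python sketch below performs that search for a single node and a single feature, using the size-weighted Gini index; the transaction amounts and labels are hypothetical toy values, and a full tree would repeat this search recursively on each child node.

```python
from collections import Counter


def gini(labels):
    """Gini index of a list of class labels."""
    n = len(labels)
    return sum((c / n) * (1 - c / n) for c in Counter(labels).values()) if n else 0.0


def best_split(values, labels):
    """Find threshold t for the question 'is value <= t?' that minimises the
    size-weighted Gini index of the two child nodes."""
    best_t, best_score = None, float("inf")
    n = len(values)
    for t in sorted(set(values)):
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        if not left or not right:
            continue  # a useful question must send items down both branches
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Hypothetical toy data: transfer amounts and fraud labels (1 = fraud).
amounts = [20, 35, 50, 60, 900, 1200, 1500]
is_fraud = [0, 0, 0, 0, 1, 1, 0]
print(best_split(amounts, is_fraud))  # threshold 60, weighted Gini about 0.19
```

For this toy data, the best question is "is the amount at most 60?", which sends the four small legitimate transfers left and leaves two fraudulent and one legitimate transfer on the right.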
Gradient Descent
Gradient descent is a first-order iterative optimization algorithm, widely used in machine learning, for finding the minimum of a function. To locate a function's local minimum using gradient descent, one takes steps proportional to the negative of the function's gradient (or approximate gradient) at the current point 18. Conversely, taking steps proportional to the positive gradient approaches a local maximum of the function; this procedure is known as gradient ascent34. In machine learning, gradient descent is used to find the values of the parameters (coefficients) of a function (f) that minimize a cost function (c). The cost function measures how far the predicted values are from the actual values. The algorithm iteratively adjusts the coefficients until it converges on a set of coefficients that minimizes the cost function34,35.
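The update rule described above can be written compactly as follows. The symbols θ (the parameter vector) and η (the step size, or learning rate) are our own notation, and using the mean log loss as the cost function c for the logistic model of equation 2 is an assumption, since the paper does not specify the cost function.

\[
\theta^{(t+1)} = \theta^{(t)} - \eta \, \nabla_{\theta}\, c\!\left(\theta^{(t)}\right)
\]

For θ = (β0, β1) and p_i = Pr(Y = 1 | X = x_i) computed from equation 2 over n training examples:

\[
c(\beta_0, \beta_1) = -\frac{1}{n}\sum_{i=1}^{n}\left[y_i \ln p_i + (1 - y_i)\ln(1 - p_i)\right]
\]

\[
\frac{\partial c}{\partial \beta_0} = \frac{1}{n}\sum_{i=1}^{n}(p_i - y_i), \qquad
\frac{\partial c}{\partial \beta_1} = \frac{1}{n}\sum_{i=1}^{n}(p_i - y_i)\,x_i
\]

Replacing the minus sign in the update with a plus sign gives gradient ascent.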
In this application, gradient descent fits the parameters of a probabilistic model in which the likelihood function p(x, 𝛽0, 𝛽1), which takes the same logistic form as equation 2, predicts the probability of a binary outcome given a set of independent variables. The model is trained to predict whether an instance belongs to class 0 or 1, represented by the labels 𝑦 = 0 and 𝑦 = 1. The coefficients 𝛽0 and 𝛽1 determine the probability that the output y is 1 given x; more precisely, 𝛽0 + 𝛽1x is the log odds that y = 1 given x 18,34. The gradient descent algorithm is popular for machine learning applications, particularly in fraud detection, because it can learn from data quickly and effectively. Additionally, gradient descent can handle very large datasets, which makes it well suited to fraud detection applications that must process large volumes of transactions accurately. Numerous studies have examined the efficacy of the gradient descent algorithm for fraud detection, and most have found that it detects fraudulent activities effectively 36,37. Recent studies have found that the algorithm was very effective in detecting known fraud cases in a dataset of credit card transactions36,38. Another study found that it could identify fraudulent insurance claims with high accuracy37. Furthermore, gradient descent is relatively easy to implement, which makes it a good choice for organizations that lack extensive resources or expertise in fraud detection methods.
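The following Python sketch shows batch gradient descent fitting β0 and β1 for a one-feature logistic model by minimizing the mean log loss (an assumed cost function; the paper does not name one). The toy data, learning rate (`lr`), and number of epochs are hypothetical choices for illustration, not the settings used in this study.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_gd(xs, ys, lr=0.1, epochs=5000):
    """Fit beta0 and beta1 by batch gradient descent on the mean log loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction errors p_i - y_i under the current coefficients.
        errors = [sigmoid(b0 + b1 * x) - y for x, y in zip(xs, ys)]
        # Gradient of the mean log loss with respect to beta0 and beta1.
        g0 = sum(errors) / n
        g1 = sum(e * x for e, x in zip(errors, xs)) / n
        # Step against the gradient; adding it instead would be gradient ascent.
        b0 -= lr * g0
        b1 -= lr * g1
    return b0, b1


def mean_log_loss(xs, ys, b0, b1):
    """Cost function c: average negative log-likelihood of the labels."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(xs)


# Hypothetical toy data: one scaled risk feature and a fraud label (1 = fraud).
xs = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic_gd(xs, ys)
print(round(b0, 2), round(b1, 2), round(mean_log_loss(xs, ys, b0, b1), 3))
```

Because the toy data are not perfectly separable, the log loss has a finite minimum and the coefficients converge; on perfectly separable data, plain gradient descent would keep increasing |β1| without bound.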
Random Forest
Random forest is a machine learning algorithm for classification and regression tasks that is built on the decision tree concept22,23,39. The random forest algorithm generates a series of decision trees, each created using a randomly chosen subset of the training data 40,41. The predictions of the individual trees are then combined to produce the final prediction, by majority vote for classification or by averaging for regression. Although random forest can be used to solve both linear and nonlinear problems, it is especially useful for nonlinear data 16,39. The algorithm is effective because it reduces prediction variance while maintaining the model's accuracy 41,42. Additionally, random forest is relatively resistant to overfitting, so it generalizes well to new data, and it can handle large datasets. Because of these advantages, the random forest algorithm is a powerful tool for machine learning applications.
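The bagging-and-voting idea can be illustrated with a deliberately simplified random forest in plain Python. Each tree here is a single split (a decision stump) trained on a bootstrap sample of the rows, using one randomly chosen feature; real random forests grow deeper trees and draw a random feature subset at every split. The two features (transfer amount and number of transfers in the last 24 hours) and the data are hypothetical, and this is not the configuration used in this study. In practice, a library implementation such as scikit-learn's `RandomForestClassifier` would be used.

```python
import random
from collections import Counter


def gini(labels):
    """Gini index of a list of class labels."""
    n = len(labels)
    return sum((c / n) * (1 - c / n) for c in Counter(labels).values()) if n else 0.0


def majority(labels):
    """Most common label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels, features):
    """Depth-1 tree: pick the (feature, threshold) with the lowest weighted Gini."""
    best = None
    n = len(rows)
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = len(left) / n * gini(left) + len(right) / n * gini(right)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    if best is None:  # no usable split: always predict the majority class
        label = majority(labels)
        return lambda row: label
    _, f, t, left_label, right_label = best
    return lambda row: left_label if row[f] <= t else right_label


def fit_random_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap: sample rows with replacement
        feats = rng.sample(range(d), n_features)     # random subset of feature indices
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Combine the trees' predictions by majority vote."""
    return majority([tree(row) for tree in forest])


# Hypothetical toy data: (transfer amount, transfers in last 24 h) and fraud label.
rows = [(20, 1), (35, 2), (50, 1), (60, 2), (900, 7), (1200, 9), (1500, 8), (1100, 6)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_random_forest(rows, labels)
print(predict(forest, (10, 1)), predict(forest, (2000, 10)))
```

Each stump alone is brittle because it sees only a resampled subset of rows and one feature, but the majority vote across 25 of them is far more stable, which is the variance reduction described above.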
An advantage of the random forest algorithm for classification tasks is that it can help reduce the number of false positives generated by other AI methods, such as neural networks 40. In addition, a random forest is not difficult to construct, and the algorithm can be run on large datasets with accurate performance 42. These two features combine to make the algorithm a powerful instrument for discovering patterns in data. In particular, the random forest classifier is well suited to detecting fraud because it can identify unusual patterns in the data19,41. For example, fraudsters might create multiple accounts with different email addresses and use them to make small purchases to avoid detection. Alternatively, they might try to return items they never bought to receive a refund. By looking for these and other unusual patterns, the random forest algorithm can help detect fraud before it results in significant losses. The random forest classifier has also been used in other domains, such as loan default, credit risk, image recognition, and medical diagnosis19,23,39,41, which shows its versatility.
Research Design and Experimental Setting
Data Generation and Simulation
Currently, there is a lack of data for fraud and money laundering detection research 43. One reason cited for this shortage is the confidentiality and sensitivity of the data. To address this problem, researchers have developed simulators that use algorithms to generate synthetic data from real-world observations. Some of the most prominent simulators used by researchers are the Mobile Money Simulator (PaySim) and the Retail Store Simulator (RetSim)43,44. These simulators allow researchers to generate synthetic transactional data that contain both legitimate and fraudulent transactions. Using Agent-Based Simulation (ABS) and Multi-Agent-Based Simulation (MABS), Lopez-Rojas and colleagues43,44 demonstrated that synthetic transactional data generated by PaySim and RetSim are as useful as real transaction data for detecting MMF and money laundering activities, while retaining the reliability and preserving the confidentiality of the actual transaction data.
The data for this project came from an MABS calibrated on real transactions. The data came from the work of Lopez-Rojas and his colleagues, who used MABS to develop agents representing clients and merchants in PaySim and customers and salesmen in RetSim43,44. The data are simulated and use a real-world scenario based on a well-known fraud scheme to demonstrate the superiority of simulated data over real-world data when establishing adequate controls for fraud detection 45. Normal behaviour was derived from the behaviour observed in the collected data. This behaviour is encoded in the agents' rules governing the transactions and interactions between customers and salespeople or between clients and merchants. Based on patterns of actual fraud43, some of these agents were set up to commit fraud.
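To illustrate the agent-based idea, the Python sketch below runs a toy multi-agent simulation in which client agents follow simple behavioural rules and a few are set up as fraudsters, producing a labelled transaction log. It is not PaySim: the agent types, rules, amounts, and the `fraud_share` parameter are all hypothetical, whereas PaySim calibrates its agents' behaviour on real transaction logs.

```python
import random
from dataclasses import dataclass


@dataclass
class Client:
    name: str
    balance: float
    is_fraudster: bool


def simulate(n_clients=50, n_steps=200, fraud_share=0.1, seed=42):
    """Toy multi-agent simulation; returns (step, source, destination, amount, is_fraud)."""
    rng = random.Random(seed)
    clients = [
        Client(f"C{i}", round(rng.uniform(100, 5000), 2), rng.random() < fraud_share)
        for i in range(n_clients)
    ]
    merchants = [f"M{i}" for i in range(5)]
    log = []
    for step in range(n_steps):
        agent = rng.choice(clients)  # one randomly chosen agent acts per step
        if agent.is_fraudster:
            # Hypothetical fraud rule: empty another client's balance into the fraudster's account.
            victim = rng.choice([c for c in clients if c is not agent])
            amount = victim.balance
            victim.balance = 0.0
            agent.balance += amount
            log.append((step, victim.name, agent.name, amount, 1))
        else:
            # Hypothetical normal rule: pay a random merchant 1-10% of the current balance.
            amount = round(agent.balance * rng.uniform(0.01, 0.1), 2)
            agent.balance -= amount
            log.append((step, agent.name, rng.choice(merchants), amount, 0))
    return log


log = simulate()
fraud_rate = sum(tx[4] for tx in log) / len(log)
print(len(log), round(fraud_rate, 3))
```

The point of the sketch is the structure: behaviour lives in per-agent rules, and the fraud label is known by construction, which is what makes simulated logs usable for supervised learning.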
Data Description and Variables
PaySim is used to simulate the mobile money transactions in this dataset. The simulations are based on a sample of actual mobile money transactions taken from one month of financial logs generated by a mobile money service deployed in an African country. The original logs were provided by a multinational company that supplies a mobile financial service currently operational in more than 14 countries. In total, 1,048,575 rows of data were collected, comprising nine independent features. Table 1 describes the features and target variables in the dataset.
Table 1: Independent Features and Description