Title : Predicting Mobile Money Transaction Fraud using Machine
Learning Algorithms
Author: Mark Lokanan
Corresponding Author : Mark Lokanan
Mark Lokanan is a data scientist and white-collar criminologist and an
Associate Professor in the Faculty of Management at Royal Roads
University. He is a graduate from the School of Criminology, Simon
Fraser University, Canada.
Mark E. Lokanan Ph.D
Associate Professor
Faculty of Management | Royal Roads University
T 250.391.2600 ext. 4386#
2005 Sooke Road, Victoria, BC Canada V9B 5Y2
| royalroads.ca
Mark.Lokanan@royalroads.ca
Introduction
With the increasing popularity of mobile money services, there has been
a corresponding increase in fraud and money laundering cases. Mobile
money providers must therefore be vigilant in combating such activity.
One way to combat fraud is to require users to provide additional
information when making transactions, such as PIN or biometric data.
Money laundering cases can be more difficult to detect, but mobile money
providers can look for patterns of suspicious activity, such as
unusually large or frequent transactions. The Financial Action Task
Force (FATF) noted that mobile money payments present a global money
laundering and terrorist financing threat because they can be used to
facilitate cross-border payments without the need for a bank account. In
response, the FATF released guidance on risk-based approaches to
countering the threat. The guidance sets out recommendations for
identifying and managing risk, including the use of computational
technology.
The use of machine learning (ML) and artificial intelligence (AI) is
increasingly seen as a way to combat mobile money fraud and address
anti-money laundering (AML) compliance. Computational technology has
always played a role in the fight against financial crime, but the rise
of ML and AI is giving law enforcement a powerful new tool in the battle
against mobile money fraud. AI can help financial institutions to
identify and flag suspicious behaviour, such as large or unusual
transactions, and better understand their customers’ needs and risk
profiles. By harnessing the power of AI, financial institutions can
significantly improve their ability to combat mobile money fraud and
address money laundering threats. This paper aims to use ML algorithms
to build a fraud detection model that will detect red flags of fraud and
money laundering from mobile money transactions. More specifically, this
paper will use a set of risk-based indicators to predict how likely a
transaction is to be fraudulent.
This study provides several significant advances to the existing body of
research on methods for detecting suspicious transactions in mobile
money transfers. In theory, ML algorithms can circumvent the challenges
of identifying illegal transactions with the more conventional
rule-based benchmark methodology. In the classic rule-based benchmark
technique, illegal transactions are identified using predefined criteria
based on mathematical conditions. The rule-based approach is
time-consuming and costly and has a high rate of false-positive results.
ML addresses these issues by
enabling computers to learn from the data and make predictions. When
applied to mobile money, ML can be used to enable automated detection of
potentially fraudulent transactions. An example of this would be
training a ML algorithm on a dataset containing transactions that are
known to be fraudulent. Based on this learned experience, the algorithm
can be tuned to find future fraudulent transactions by looking for
patterns similar to those found in the training data.
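To make the contrast concrete, the following minimal sketch shows what a predefined rule-based check might look like. The field names (`amount`, `tx_last_hour`) and both thresholds are hypothetical illustrations; they are not taken from this study's data or from any provider's rule set.

```python
# Hypothetical rule-based benchmark: fixed thresholds chosen by an analyst.
AMOUNT_LIMIT = 10_000.0   # assumed limit, illustration only
HOURLY_TX_LIMIT = 5       # assumed limit, illustration only


def rule_based_flag(transaction: dict) -> bool:
    """Flag a transaction when any predefined condition is met."""
    too_large = transaction["amount"] > AMOUNT_LIMIT
    too_frequent = transaction["tx_last_hour"] > HOURLY_TX_LIMIT
    return too_large or too_frequent


transactions = [
    {"amount": 12_500.0, "tx_last_hour": 1},  # large, but possibly legitimate
    {"amount": 900.0, "tx_last_hour": 9},     # many small transfers
    {"amount": 150.0, "tx_last_hour": 2},     # ordinary activity
]
flags = [rule_based_flag(t) for t in transactions]
print(flags)
```

Every transaction above a fixed limit is flagged regardless of context, which is one way such rules can produce false positives. A learned model instead estimates the decision boundary from labelled examples.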
Practically, we propose a novel data-driven method of fraud detection
that has been precisely tuned to the distinctive features of mobile
money transactions. This strategy uses ML to automatically identify
suspicious transactions in real time, eliminating the need for extensive
human involvement. Because it analyzes transactions in real time, the ML
approach can quickly spot potentially fraudulent transactions and stop
them from going through, thereby reducing the number of fraudulent
transactions.
The remainder of this paper is structured in the following manner.
Section two thoroughly analyzes the literature on ML and mobile money
transfers concerning fraud and money laundering. Section three discusses
the methodology and algorithms considered for the ML models. Section
four examines and analyzes the results. Section five concludes with
limitations in ML for fraud research and identifies opportunities for
further study.
Literature Review on Mobile Money Fraud and Money Laundering
Mobile Money Services (MMS) or Mobile Money Transfer Services (MMTS) are
unbanked financial services that operate primarily through smartphone
apps supported by mobile operators or banking institutions and are
frequently referred to as branchless banking services for their users 1,2. They facilitate the transfer of electronic
cash using the users’ mobile phones while not involving any bank account
in the process 3,4. A few common examples of MMS or
MMT services are: Tigopesa, M-Pesa, Simbanking, and NMB Mobile offered
by Tigo Tanzania Ltd, Vodacom Tanzania, CRDB Bank, and National
Microfinance Bank respectively 5(p4). Ideally, MMT
enables person-to-person (P2P) payments for the customers, and the
services supported by the mobile money system involve participation from
various stakeholders like mobile users, regulators, mobile network
operators, telecom retailers, agents, and financial institutions 6. The mobile users act as the customers for the MMT
services, while mobile network operators (MNOs) completely facilitate
the ecosystem of MMS in conjunction with telecom retailers and agents,
who are responsible for opening accounts for the customers and for
conducting customer due diligence and other compliance activities such
as Know Your Customer (KYC) checks. Financial institutions and
regulators assist MNOs in establishing financial inclusion and risk
management mechanisms, whereas MNOs limit banks to processing payment
delivery, clearing, and settlement 6. A bank may or may not be involved
in the MMS, depending on the adopted model of MMT 7. These
players collectively enable MNOs to implement the new P2P payment
facility for unbanked users.
The number of users using mobile money for small or large transactions
has increased drastically in the last decade 1.
Research estimates that this number is expected to rise with the
increasing dependence on and use of mobile phones in the future 8,9. Due to their success and popularity, mobile money
systems are set to attract the attention of fraudsters interested in
laundering the proceeds of crime 10. Fraudsters can capture the details
of mobile money transfers during creation or transmission, or harvest
server data through phishing attacks or viruses, and then misuse these
details to launder illicit funds 8,11. Similarly, unscrupulous end users
of MMS can launder dirty money by smurfing, that is, splitting a large
sum of illegitimate income into many small mobile money transactions
across multiple accounts and phones to avoid arousing suspicion 3,6.
Indeed, some speculate that this system could be used to fund terrorist
activities, though there is evidence that launderers have used mobile
transfers to launder funds for terrorist financing 12.
These findings have brought the need for more advanced technology to
identify and control the risks associated with the mobile money system.
Detecting Mobile Money Fraud and Money Laundering Using Computational Technology
Technological innovation can be useful in mitigating various risks
associated with the MMS. Improving technological surveillance by
increasing the security, resilience, and scalability of MNO networks
used in MMT can reduce risks associated with mobile money fraud (MMF) to
some extent at the security and procedural levels 13.
Implementing a two-factor authentication model for securing SMS
communications in MMT has proven very effective 14. Beyond new
developments in the security information and event management field, the
most important contribution technology can make is the design of
prediction and detection tools for MMF and money laundering in the MMS.
In the following paragraphs, we focus on using ML algorithms and
artificial intelligence to mitigate the risk of MMF and money laundering
in mobile money transactions.
Machine Learning and Artificially Intelligent Algorithms
Technology is pivotal in investigating and detecting fraudulent or
laundered mobile money transactions 15. ML, AI, and
data mining have proven effective in detecting MMF and money laundering
activities in the MMS 16,17. More specifically, ML
algorithms teach computers to learn human behaviour and detect patterns
in the data 17,18. Supervised ML algorithms like
logistic regression, decision tree, gradient descent, and random forest
have all been successfully used in detecting financial fraud from
labelled data 16,18–22. The following sections are
devoted to reviewing the literature on these algorithms.
Logistic Regression
Logistic regression will be used as the baseline algorithm to compare
with the other models. Logistic regression uses a linear combination of
input variables (x) to predict an output variable (y) 20. The output
variable usually takes the value 0 or 1, representing the two possible
outcomes of a binary classification task (e.g., fraud or not fraud). The
coefficients of the input variables (β) are estimated using maximum
likelihood estimation. The sigmoid function is the foundation of
logistic regression: it takes a real number and maps it to a value
between 0 and 1. This mapping is important for ML classification tasks
because it allows the algorithm to easily separate data points into
different classes 23. The sigmoid function is denoted in equation 1.
\(f(x) = \frac{1}{1 + e^{-x}}\) eq. 1
Where
f(x) is the value bounded between 0 and 1,
x is the input to the sigmoid function (any real number), and
e is the mathematical constant (the base of the natural logarithm).
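As a minimal illustration of equation 1, the sketch below evaluates the sigmoid function for a few inputs. The branch for negative inputs is an implementation detail assumed here for numerical stability; it is algebraically identical to equation 1.

```python
import math


def sigmoid(x: float) -> float:
    """Map a real number to a value between 0 and 1 (equation 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # Equivalent form e^x / (1 + e^x); avoids overflow for very negative x.
    z = math.exp(x)
    return z / (1.0 + z)


for value in (-5.0, 0.0, 5.0):
    print(value, round(sigmoid(value), 4))
```

Large negative inputs map close to 0, large positive inputs map close to 1, and an input of 0 maps to exactly 0.5.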
The output of the sigmoid function can be interpreted as a probability.
For example, if the output of the sigmoid function is 0.8, this can be
interpreted as an 80% chance that the data point belongs to one class
and a 20% chance that the data point belongs to the other class 24,25.
Using the sigmoid function, a logistic regression model can be trained
to predict the class to which a new data point belongs. The logistic
regression classifier uses the sigmoid function to estimate the
probability that y = 1, given the value of x. Equation 2 denotes the
logistic regression model.
\(\Pr(Y = 1 \mid X = x) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}\) eq. 2
Where
\(\Pr(Y = 1 \mid X = x)\) takes values between 0 and 1,
\(\beta_0 + \beta_1 x\) is the linear combination of the independent feature x, and
different values of \(\beta_0\) and \(\beta_1\) give different estimates of \(\Pr(Y = 1 \mid X = x)\).
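As a worked step connecting equations 1 and 2 (a standard identity, written here in the single-feature notation used above), dividing the numerator and denominator of equation 2 by \(e^{\beta_0 + \beta_1 x}\) gives the sigmoid of the linear predictor. Solving for the linear predictor then gives the log odds:

```latex
\begin{align*}
\Pr(Y = 1 \mid X = x)
  &= \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}
   = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}
   = f(\beta_0 + \beta_1 x), \\
\ln \frac{\Pr(Y = 1 \mid X = x)}{1 - \Pr(Y = 1 \mid X = x)}
  &= \beta_0 + \beta_1 x .
\end{align*}
```

Read this way, \(\beta_1\) is the change in the log odds of fraud for a one-unit increase in x, and \(\beta_0\) is the log odds when x = 0.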
Logistic regression is a valuable technique for fraud classification
tasks 20,24,25. Research has shown that logistic regression performs
relatively well and, in some cases, outperforms other classifiers in
fraud classification tasks 20,23,25,26. Logistic regression has been
used in a variety of domains to predict fraud. For example, it has been
used in the financial sector to detect credit card and insurance fraud
23,27. Others have used logistic regression to predict medical billing
fraud with reliable results 28,29. Logistic regression models have
several advantages over traditional fraud detection methods. First, they
are highly scalable and can be applied to large data sets 29. Second,
they are highly effective at detecting fraud, with a success rate that
is generally much higher than that of traditional methods 23. Third,
they are relatively easy to interpret, which makes them valuable tools
for fraud analysts 20,25. Finally, they are relatively easy to deploy
and use in production systems 20,23. However, logistic regression is not
without drawbacks; in particular, it can be susceptible to overfitting
if the data are not carefully preprocessed 24,27. Despite this risk of
overfitting, logistic regression is an excellent way to build a fraud
detection model that can serve as a benchmark against which to compare
other classifiers.
Decision Tree
Another useful machine learning algorithm for fraud detection is the
decision tree classifier 19. A decision tree employs a tree structure
for decision making, where the root represents the initial decision,
edges represent the outcomes of each decision, leaves show the class
labels that convey the final decision, and internal nodes indicate
attributes selected based on information gain or the Gini Index 18,30.
Typically considered a weak learner, the decision tree’s classification
ability is boosted by using the gradient boosting technique 31. Gradient
boosting is an ensemble learning technique that optimizes performance
accuracy by generating decision trees sequentially so that each tree
improves on the previous one 16,18(p8). This project employs the Gini
Index to split the data. The mathematical formula for the Gini Index is
shown in equation 4:
\(I_G(f) = \sum_{k=1}^{m} f_k (1 - f_k)\) eq. 4
Where
\(f_k\) is the fraction of items labeled with k in the set,
m is the number of distinct labels, and
\(\sum_{k=1}^{m} f_k = 1\).
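The Gini Index formula above can be checked on small label sets. The following sketch computes it directly from the label fractions; the example labels are hypothetical (1 = fraud, 0 = not fraud).

```python
from collections import Counter


def gini_impurity(labels) -> float:
    """Return sum_k f_k * (1 - f_k), where f_k is the share of label k."""
    total = len(labels)
    if total == 0:
        return 0.0
    fractions = [count / total for count in Counter(labels).values()]
    return sum(f * (1.0 - f) for f in fractions)


print(gini_impurity([0, 0, 0, 0]))  # pure node: only one label present
print(gini_impurity([0, 0, 1, 1]))  # evenly mixed node
```

A node containing a single class has impurity 0, while an evenly mixed two-class node has impurity 0.5, the maximum for a binary problem.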
Concerning fraud detection, a decision tree involves building a model
that can predict whether an observation is legitimate or not 32. The
decision tree model is based on a series of yes-or-no questions, each
narrowing down the possible outcomes (i.e., fraud or no fraud) 33. For
example, a decision tree for fraud detection might ask whether the
transaction is consistent with the customer’s past behaviour. If the
answer is no, the transaction could be flagged as potentially
fraudulent. Once the model is built, it can be used to classify new data
points as either fraudulent or non-fraudulent. Decision tree algorithms
are highly effective in identifying fraud. They are often used with
other methods, such as rule-based systems and ML 18,19,33.
Classification algorithms based on decision trees are a powerful way to
find fraud because they can help find even the most complex kinds of
fraud.
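A single yes-or-no question of the kind described above can be sketched as a one-split tree. The code below searches one numeric feature for the threshold that minimizes the weighted Gini impurity. The amounts and labels are hypothetical toy values, and a full decision tree would apply the same search recursively to each branch.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return sum((c / n) * (1 - c / n) for c in Counter(labels).values()) if n else 0.0


def best_split(values, labels):
    """Find the threshold on one feature with the lowest weighted Gini impurity."""
    best_threshold, best_score = None, float("inf")
    # Skip the largest value: splitting there would leave the right side empty.
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Hypothetical toy data: transaction amounts and fraud labels (1 = fraud).
amounts = [20, 35, 50, 80, 900, 1200, 1500]
is_fraud = [0, 0, 0, 0, 1, 1, 1]

threshold, impurity = best_split(amounts, is_fraud)
print(threshold, impurity)
```

On this toy data the question "is the amount greater than 80?" separates the two classes completely, so the weighted impurity of the split is 0.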
Gradient Descent
Gradient descent is a first-order iterative optimization algorithm used
in machine learning to find the minimum of a function. To locate a
function’s local minimum using gradient descent, one takes steps
proportional to the negative of the function’s gradient (or approximate
gradient) at the current point 18. Conversely, if one takes steps
proportional to the positive of the gradient, one approaches a local
maximum of that function, a procedure known as gradient ascent 34. In
ML, gradient descent is used to find the values of the parameters
(coefficients) of a function (f) that minimize a cost function (c). The
cost function is a measure of how far the predicted values are from the
actual values. The algorithm iteratively adjusts the coefficients until
it converges on a set of coefficients that minimizes the cost function
34,35.
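The update rule described above can be sketched in a few lines. The quadratic cost, learning rate, and step count below are hypothetical choices made only to show the iteration; they are not settings used in this study.

```python
def gradient_descent(gradient, start, learning_rate=0.1, steps=200):
    """Repeatedly step against the gradient, starting from `start`."""
    w = start
    for _ in range(steps):
        w -= learning_rate * gradient(w)  # move in the negative gradient direction
    return w


# Hypothetical cost c(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
minimum = gradient_descent(lambda w: 2.0 * (w - 3.0), start=0.0)
print(round(minimum, 6))
```

Each step shrinks the distance to the minimizer w = 3 by a constant factor, so the iterate converges to it. Flipping the sign of the update would instead perform gradient ascent.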
The algorithm is represented by a probabilistic formula in which the
likelihood function \(p(x, \beta_0, \beta_1)\) predicts the probability
of a binary outcome given a set of independent variables. In this case,
the algorithm is trained to predict whether an instance belongs to class
0 or 1, which are represented by the labels y = 0 and y = 1. The
coefficients \(\beta_0\) and \(\beta_1\) determine the probability that
the output y is 1 or 0 given x. More precisely, \(\beta_0\) is the log
odds of the output being 1 when x = 0, and \(\beta_1\) is the change in
the log odds for a one-unit increase in x 18,34. The gradient descent
algorithm is popular for machine learning applications, particularly in
fraud detection, because the algorithm can learn from data very quickly
and effectively. Additionally, the gradient descent algorithm can handle
very large datasets, making it ideal for fraud detection applications
requiring high accuracy. Numerous studies have been conducted on the
efficacy of the gradient descent algorithm for fraud detection. Most of
these studies have found that the algorithm effectively detects
fraudulent activities 36,37. Recent studies have found that the
algorithm was very effective in detecting known fraud cases in a dataset
of credit card transactions 36,38. Another study found that the
algorithm could successfully identify fraudulent insurance claims with
high accuracy 37. Furthermore, the gradient descent algorithm is
relatively easy to implement, which makes it a good choice for
organizations that do not have a lot of resources or expertise in fraud
detection methods.
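To show how gradient descent can estimate the two coefficients of a single-feature logistic model, the sketch below minimizes the mean log loss on a small hypothetical dataset. The data, learning rate, and epoch count are assumptions for illustration and do not reproduce the configuration used in this paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, epochs=5000):
    """Estimate beta0 and beta1 by gradient descent on the mean log loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error p_i - y_i for every observation.
        errors = [sigmoid(b0 + b1 * x) - y for x, y in zip(xs, ys)]
        grad_b0 = sum(errors) / n
        grad_b1 = sum(e * x for e, x in zip(errors, xs)) / n
        b0 -= learning_rate * grad_b0
        b1 -= learning_rate * grad_b1
    return b0, b1


# Hypothetical, already-scaled feature values with overlapping classes.
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 0.0, 1.0, 1.5, 2.0, -0.2]
ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

b0, b1 = fit_logistic(xs, ys)
print(sigmoid(b0 + b1 * 2.0) > 0.5, sigmoid(b0 + b1 * -2.0) < 0.5)
```

The classes deliberately overlap so that the maximum likelihood estimate is finite. With perfectly separable data the coefficients would grow without bound unless regularization or early stopping were added.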
Random Forest
Random forest is a machine learning algorithm for classification and
regression tasks that is based on the decision tree concept 22,23,39.
The random forest algorithm generates a series of decision trees, each
created using a randomly chosen subset of the training data 40,41. The
predictions of the individual trees are then combined to produce the
final prediction. Even though random forest can be used to solve both
linear and nonlinear problems, it is especially useful for nonlinear
data 16,39. The random forest algorithm is effective because it reduces
the prediction variance while maintaining the model’s accuracy 41,42.
Additionally, the random forest algorithm, when pruned, is resistant to
overfitting, which means it can generalize well to new data, and it can
handle large datasets. Because of these advantages, the random forest
algorithm is a powerful tool for machine learning applications.
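The two ideas above, training each tree on a random subset of the data and combining the trees' predictions, can be sketched with one-split trees. This is a deliberately simplified toy, not the implementation used in this study: the feature values and labels are hypothetical, each tree is only one split deep, and library implementations such as scikit-learn's `RandomForestClassifier` grow much deeper trees.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return sum((c / n) * (1 - c / n) for c in Counter(labels).values()) if n else 0.0


def majority(labels, default=0):
    """Most common label, or `default` for an empty list."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def fit_stump(rows, labels, feature):
    """One-split tree on a single feature, chosen by weighted Gini impurity."""
    fallback = majority(labels)
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        score = len(left) * gini(left) + len(right) * gini(right)
        if best is None or score < best[0]:
            best = (score, threshold, majority(left, fallback), majority(right, fallback))
    _, threshold, left_label, right_label = best
    return lambda row: left_label if row[feature] <= threshold else right_label


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample and a randomly chosen feature."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        feature = rng.randrange(len(rows[0]))           # random feature choice
        trees.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature))
    return lambda row: majority([tree(row) for tree in trees])  # majority vote


# Hypothetical rows: (amount, transactions in the last hour); 1 = fraud.
rows = [(20, 1), (35, 2), (50, 1), (80, 3), (900, 8), (1200, 9), (1500, 7), (1100, 10)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

predict = fit_forest(rows, labels)
print(predict((10, 0)), predict((2000, 12)))
```

Individual stumps can disagree near the class boundary because each sees a different bootstrap sample. Averaging their votes is what reduces the variance of the combined prediction.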
The advantage of the random forest algorithm for classification tasks is
that it can help reduce the number of false positives generated by other
AI methods, such as neural networks 40. In addition, the random forest
technique is not difficult to construct, and it can be run on large
datasets with accurate performance 42. These two features combine to
make the algorithm a powerful instrument for discovering patterns in
data. In particular, the random forest classifier is well-suited for
detecting fraud because it often identifies unusual patterns in the data
19,41. For example, fraudsters might create multiple accounts with
different email addresses and use them to make small purchases to avoid
detection. Alternatively, they might try to return items they never
bought to receive a refund. By looking for these and other unusual
patterns, the random forest algorithm can help to detect fraud before it
results in significant losses. The random forest classifier has also
been used in other domains such as loan default, credit risk, image
recognition, and medical diagnosis 19,23,39,41, which shows that it is a
versatile algorithm.
Research Design and Experimental Setting
Data Generation and Simulation
Currently, there is a lack of data on fraud and money laundering
detection 43. One of the reasons cited for this shortage is the
confidentiality and sensitivity of the data. To address this problem,
researchers have developed simulators that use algorithms to generate
synthetic data from real-time observations. Some of the most prominent
simulators used by researchers are the Mobile Money Simulator (PaySim)
and the Retail Store Simulator (RetSim) 43,44. These simulators allow
researchers to generate synthetic transactional data that contains both
legitimate and fraudulent transactions. Using Agent-Based Simulation
(ABS) and Multi Agent-Based Simulation (MABS), Lopez-Rojas and
colleagues 43,44 demonstrated that synthetic transactional data
developed by PaySim and RetSim are as useful as real transaction data
for detecting MMF and money laundering activities while retaining the
reliability and confidentiality of the actual transaction data.
The data for this project came from an MABS that was used to calibrate
real-time transactions. The data came from the work of Lopez-Rojas and
his colleagues, who use MABS to develop agents representing clients and
merchants in PaySim and customers and salesmen in RetSim 43,44. The data
is simulated and uses a real-world scenario based on a well-known fraud
scheme to demonstrate the superiority of simulated data over real-world
data when establishing adequate controls for fraud detection 45. The
usual behaviour was derived from the behaviour observed in the data
collected. This behaviour is encoded in the agents’ rules governing the
transactions and interactions between consumers and salespeople or
between customers and merchants. Based on patterns of actual fraud 43,
some of these agents were set up to commit fraud.
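The general idea of agent-based generation, with a share of agents configured to commit fraud, can be illustrated with a toy sketch. This is not PaySim or RetSim: the agent counts, amount ranges, and fraud rule below are hypothetical assumptions, and a calibrated simulator derives such rules from observed transaction logs.

```python
import random


def simulate_transactions(n_agents=50, steps=20, fraud_share=0.1, seed=42):
    """Generate toy transactions; a fixed share of agents follow a fraud rule."""
    rng = random.Random(seed)
    n_fraud_agents = round(n_agents * fraud_share)
    fraud_agents = set(rng.sample(range(n_agents), n_fraud_agents))
    records = []
    for step in range(steps):
        for agent in range(n_agents):
            if agent in fraud_agents:
                # Assumed fraud rule: unusually large transfers.
                amount = rng.uniform(5_000, 20_000)
            else:
                # Assumed normal rule: small everyday payments.
                amount = rng.uniform(1, 500)
            records.append({
                "step": step,
                "agent": agent,
                "amount": round(amount, 2),
                "is_fraud": int(agent in fraud_agents),
            })
    return records


records = simulate_transactions()
fraud_count = sum(r["is_fraud"] for r in records)
print(len(records), fraud_count)
```

Because every record carries its own label, the output can be used directly as labelled training data without exposing any real customer's transactions.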
Data Description and Variables
PaySim is used to simulate mobile money transactions in this dataset.
The simulations are based on a sample of actual mobile money
transactions taken from one month’s worth of financial logs generated by
a mobile money service deployed in an African nation. The original logs
were provided by a global firm that supplies the mobile financial
service, which is presently operational in more than 14 countries. In
total, 1,048,575 rows of data were collected, comprising nine
independent features. Table 1 depicts the features and target variable
that represent the dataset.
Table 1: Independent Features and Description