The Simility Blog

A Primer on Machine Learning Models for Fraud Detection
Jayan Tharayil
June 28, 2017

Machine learning is the new black—among tech companies, everyone is trying to figure out how they can associate themselves with this sexy new technology. At Simility, we’ve been working with machine learning for years, applying it to the real-world problem of online fraud – and it’s working.

Old-school fraud detection approaches, typically rules-based, aren’t enough to fend off the bad guys anymore. Fraudster techniques are becoming increasingly sophisticated, and because there’s big money to be made, many fraud rings have invested in their own machine learning–based technology. You need to fight fire with fire.

Machine learning can process large datasets easily and detect patterns that humans can’t. The models continually learn and adapt, using outcome data (such as confirmed fraud cases) that is fed back into the system. They also scale beautifully.

Here’s what you need to know about machine-learning models if you’re just getting started.

Choosing the Right Model, Part 1: Unsupervised or Supervised Models

At a high level, there are two types of machine-learning models: unsupervised and supervised. Unsupervised models learn from data that has no labels, while supervised models learn from data that has been clearly labeled (for example, as fraudulent or legitimate).

Unsupervised models are used primarily to identify anomalies (outliers). Supervised models can then be used to determine which of those anomalies are fraudulent and which are just unusual. When building a machine-learning model suite for fraud detection, it is very important not only to identify bad activity but also to allow genuinely good transactions to go through. Our systems should only ever have a positive impact on the user experience.

So the typical flow is:

  • Use unsupervised models to find clusters
  • Manually review and label them
  • Train a supervised model using the labels
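
To make the three steps concrete, here is a minimal Python sketch. It assumes the unsupervised step has already assigned each transaction to a cluster; the feature names, cluster ids, and analyst verdicts are all hypothetical, and a simple nearest-centroid classifier stands in for whatever supervised model you would actually train.

```python
import math

# Hypothetical output of step 1 (an unsupervised clustering run):
# (features, cluster_id) pairs. Features are illustrative and pre-scaled
# to [0, 1]: (order_amount, account_age).
clustered = [
    ((0.20, 0.90), 0), ((0.30, 0.80), 0), ((0.25, 0.85), 0),
    ((0.90, 0.10), 1), ((0.95, 0.05), 1),
]

# Step 2: an analyst reviews each cluster once and records a verdict (hypothetical).
analyst_verdicts = {0: "good", 1: "fraud"}


def label_transactions(clustered, verdicts):
    """Copy each cluster's verdict onto every transaction in that cluster."""
    return [(features, verdicts[cluster]) for features, cluster in clustered]


def train_nearest_centroid(labeled):
    """Step 3: a deliberately simple supervised model, one mean vector per label."""
    sums, counts = {}, {}
    for features, label in labeled:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in acc) for label, acc in sums.items()}


def predict(model, features):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(model, key=lambda label: math.dist(features, model[label]))


model = train_nearest_centroid(label_transactions(clustered, analyst_verdicts))
print(predict(model, (0.85, 0.20)))  # a new, unreviewed transaction; prints fraud
```

The useful property here is that analysts review clusters rather than individual transactions, so a handful of verdicts labels many rows at once.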

Here’s an example: for one client (a hotel listing app), our unsupervised models flagged as anomalous certain cases where customers had their phones in flight mode but had turned on Wi-Fi and were trying to book rooms. But when our human analysts reviewed the data, we found that the customers had traveled to foreign countries and were simply booking hotel rooms from airports. They probably had their phones in flight mode to avoid high data-roaming costs. We were then able to train our supervised models to allow such transactions, thereby avoiding these false positives. This combination of machine learning and human analysis is key, as you’ll see later.

You use unsupervised models when you don’t have labeled data. This is typically the case when a company is first starting out with machine learning–based fraud detection. Unsupervised models help identify the distribution of data and establish baselines such as the typical settings for an iPhone user in the U.S. or Brazil.

Once data is grouped into clusters, you train supervised models using transactions that have been confirmed as good or bad. Let’s say you have data on credit-card chargebacks. You now know, in hindsight, that those transactions were bad. You can feed that now-labeled data into the supervised model to train it to recognize bad transactions of that type more accurately. Over time, as the supervised models improve, more and more transactions go straight from the unsupervised to the supervised models without manual review.

Choosing the Right Model, Part 2: Clustering, Tree-based and Deep-learning Models

People often ask me, “What’s the best type of machine-learning model for fraud detection?” There is no single best model, nor would a company ever have to pick just one. Multiple algorithms can be used to build both unsupervised and supervised models, chosen according to criteria like data size, model size, training time, etc.

For unsupervised models, we typically use k-means or Markov models. k-means is a clustering algorithm that tries to group observations into neighborhoods that share similar characteristics. Observations (transactions, accounts, etc.) that don’t belong to large neighborhoods generally are treated as anomalous. Markov models, more specifically Hidden Markov models, are great at learning sequences and predicting what event would most likely occur next. These can be used to identify scenarios where events are occurring in an unexpected order.
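
A minimal sketch of the k-means idea in plain Python, assuming two hypothetical, already-scaled numeric features per observation. It runs Lloyd’s algorithm with a deterministic farthest-point initialization (chosen only so the example is reproducible; library implementations usually use randomized k-means++ seeding) and then flags every observation whose cluster holds less than a chosen share of the data. The 10% cut-off is an illustrative assumption, not a recommended value.

```python
import math
from collections import Counter


def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid.
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))

    def assign(cents):
        return [min(range(k), key=lambda j: math.dist(p, cents[j])) for p in points]

    for _ in range(iters):
        labels = assign(centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Move the centroid to the mean of its members; keep it if the cluster is empty.
            updated.append(tuple(sum(xs) / len(members) for xs in zip(*members)) if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(centroids)


def small_cluster_anomalies(points, k, min_share=0.10):
    """Indices of points whose cluster holds less than `min_share` of all points."""
    _, labels = kmeans(points, k)
    sizes = Counter(labels)
    return [i for i, lab in enumerate(labels) if sizes[lab] / len(points) < min_share]


# Hypothetical, pre-scaled features, e.g. (order_amount, orders_last_hour).
observations = [
    (1.0, 1.0), (1.1, 0.9), (0.9, 1.2), (1.2, 1.1), (0.8, 0.8),
    (5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (5.1, 5.2), (4.9, 4.8),
    (20.0, 0.0),
]
print(small_cluster_anomalies(observations, k=3))  # [10]
```

The lone point at index 10 ends up in a one-member neighborhood, so it is the only observation treated as anomalous.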

The supervised side offers many options to try out. Logistic or linear regression models are generally the simplest, but you can go a bit further with decision-tree–based models, which provide a lot of explainability. You can clearly understand which criteria (“features”) are weighted more highly in the fraud/not-fraud decision. Let’s consider an example where a U.S. customer is buying a Louis Vuitton bag for $100 from China at midnight, paying with a credit card that has not been used within the last 60 days. A trained decision-tree model could read something like: if the credit card has not been used within the last 60 days, go left (toward “possibly fraudulent”). If the price is greater than $50, go left. And so on, until the model determines whether or not the transaction is potentially fraudulent and should be flagged for human review.
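
The tree described above can be written out as plain data to show why such models are easy to explain: every decision comes with the exact path of feature tests that produced it. This is a hand-built illustration, not a trained model; the feature names, the 60-day and $50 thresholds, and the third `cross_border` split are assumptions that mirror the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    feature: Optional[str] = None    # feature tested at an internal node
    threshold: float = 0.0
    left: Optional["Node"] = None    # branch taken when value > threshold
    right: Optional["Node"] = None   # branch taken when value <= threshold
    decision: Optional[str] = None   # set only on leaves


ALLOW = Node(decision="allow")

# Hand-built tree mirroring the example in the text (all values hypothetical).
TREE = Node("days_since_card_last_used", 60,
            left=Node("price_usd", 50,
                      left=Node("cross_border", 0.5,
                                left=Node(decision="flag for review"),
                                right=ALLOW),
                      right=ALLOW),
            right=ALLOW)


def classify(node, tx):
    """Walk the tree and return (decision, list of tests that led there)."""
    path = []
    while node.decision is None:
        value = tx[node.feature]
        go_left = value > node.threshold
        path.append(f"{node.feature}={value} {'>' if go_left else '<='} {node.threshold}")
        node = node.left if go_left else node.right
    return node.decision, path


decision, path = classify(TREE, {"days_since_card_last_used": 75, "price_usd": 100, "cross_border": 1})
print(decision)  # flag for review
for step in path:
    print(" ", step)
```

Handing `path` to an analyst gives the kind of explanation that is hard to extract from a deep neural network.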

You can go a step further by using an ensemble of models employing algorithms like boosting or random forest, or try Support Vector Machines (SVMs), especially when you have more nuanced criteria for fraud and notice that simpler models are unable to cleanly classify the observations.
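
Ensembles combine many weak models into a stronger one. The sketch below illustrates bagging, the bootstrap-and-vote idea that random forests build on, using one-split decision trees (“stumps”) in plain Python. It is a simplification: real random forests grow deeper trees and also sample a random subset of features at each split, and boosting instead trains models sequentially on the previous models’ mistakes. The data, feature names, and the assumption that higher feature values mean higher risk are all hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows, labels):
    """Pick the single (feature, threshold) split with the fewest training errors.

    Assumes features are oriented so that larger values are riskier:
    the stump predicts 1 (fraud) when row[feature] > threshold.
    """
    best = None
    for f in range(len(rows[0])):
        for t in [float("-inf")] + sorted({r[f] for r in rows}):
            errors = sum((r[f] > t) != bool(y) for r, y in zip(rows, labels))
            if best is None or errors < best[0]:
                best = (errors, f, t)
    return best[1], best[2]


def fit_bagged_stumps(rows, labels, n_estimators=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(rows)
    stumps = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return stumps


def predict(stumps, row):
    """Majority vote; an odd number of stumps avoids ties."""
    votes = Counter(int(row[f] > t) for f, t in stumps)
    return votes.most_common(1)[0][0]


# Hypothetical features: (device_risk, velocity).
rows = [(0.1, 0.5), (0.2, 0.1), (0.3, 0.9), (0.4, 0.3),
        (0.7, 0.2), (0.8, 0.8), (0.9, 0.4), (0.95, 0.6)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
stumps = fit_bagged_stumps(rows, labels)
print(predict(stumps, (0.99, 0.1)), predict(stumps, (0.05, 0.5)))
```

Each stump sees a slightly different resample of the data, and the vote smooths out the quirks of any single one.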

One area of machine learning that has been getting a lot of buzz in recent years is artificial neural networks (ANNs), aka “deep learning” models, which try to simulate how layers of neurons act together in the brain to make a decision. ANN models are highly versatile and can be used to solve highly complex problems like identifying account takeover using a device’s sensor data. While other techniques often require limiting the number of features, multi-layer ANNs can train on thousands of features and scale easily. You may be thinking, “Why not just use deep-learning models all the time?” Training such models requires massive amounts of data (typically, millions of labeled transactions), so deep-learning models are really only practical for large companies or those that generate a lot of data points. Also, since these models are so complex, they generally become a “black box” and lose the explainability that simpler models provide. That means it is hard to figure out why the model classified a transaction as fraud or not fraud.

Here’s an example of unsupervised and supervised models in action. For two financial services clients, we developed unsupervised (Markov) models that estimate how likely a sequence of actions is to be fraudulent, based on how unusual that sequence is. A typical sequence for an online banking customer might be “login, check account balance, transfer money to a known beneficiary, logout.” Since that sequence would appear frequently, it would have a low likelihood of being fraudulent and would receive a low score. But what if the sequence was instead “login, password change, email change, add a new beneficiary, transfer money, delete beneficiary, logout”? That is an atypical sequence and would receive a high score indicating possible fraud.
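
A minimal sketch of sequence scoring in Python. Note that it swaps in a plain first-order Markov chain (every event is observed directly) rather than the hidden Markov models mentioned earlier, because that is enough to show the idea: learn transition frequencies from past sessions, then score a new session by its average negative log-likelihood, so rare or never-seen transitions push the score up. The event names, training sessions, and add-one smoothing scheme are illustrative assumptions.

```python
import math
from collections import defaultdict

START, END = "<start>", "<end>"


class MarkovSequenceScorer:
    """First-order Markov chain over session events, trained on past sessions."""

    def __init__(self, sessions):
        self.counts = defaultdict(lambda: defaultdict(int))
        self.states = set()
        for session in sessions:
            path = [START] + session + [END]
            self.states.update(path)
            for a, b in zip(path, path[1:]):
                self.counts[a][b] += 1

    def transition_prob(self, a, b):
        # Add-one smoothing over the known states plus one bucket for unseen
        # states, so a never-seen transition gets a small non-zero probability.
        row = self.counts.get(a, {})
        return (row.get(b, 0) + 1) / (sum(row.values()) + len(self.states) + 1)

    def score(self, session):
        """Average negative log-likelihood per transition: higher = more unusual."""
        path = [START] + session + [END]
        nll = -sum(math.log(self.transition_prob(a, b)) for a, b in zip(path, path[1:]))
        return nll / (len(path) - 1)


# Hypothetical history: mostly routine online-banking sessions.
history = (
    [["login", "check_balance", "transfer_known", "logout"]] * 50
    + [["login", "check_balance", "logout"]] * 10
)
scorer = MarkovSequenceScorer(history)

routine = ["login", "check_balance", "transfer_known", "logout"]
suspicious = ["login", "password_change", "email_change", "add_beneficiary",
              "transfer", "delete_beneficiary", "logout"]
print(round(scorer.score(routine), 2), round(scorer.score(suspicious), 2))
```

No fraud labels are needed to fit this model; it only learns what normal sessions look like, which matches the unsupervised setup described in the text.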

The model was trained here without any explicit labels provided by the customer: we measured the likelihood of a sequence by looking at past sequences. As our analysts started reviewing flagged sequences and labeling the user sessions as good or bad, we started training a supervised model that confirmed whether an anomalous sequence was indeed fraudulent. This model was tuned for high precision so that flagged cases could be actioned automatically (e.g., a likely fraudulent sequence results in the transaction being canceled before it is completed).
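
Tuning for high precision usually comes down to choosing the score threshold above which a case is auto-actioned. A minimal sketch, assuming you have model scores and analyst labels (1 = fraud) for a set of reviewed sessions: scan candidate thresholds from high to low and keep the lowest one whose flagged set still meets the target precision, which catches as much fraud as possible while keeping precision at or above the target. The function name and sample numbers are hypothetical.

```python
def threshold_for_precision(scores, labels, target=0.95):
    """Lowest threshold t such that cases with score >= t have precision >= target.

    Returns None if no threshold reaches the target.
    """
    best = None
    for t in sorted(set(scores), reverse=True):
        flagged = [y for s, y in zip(scores, labels) if s >= t]
        precision = sum(flagged) / len(flagged)
        if precision >= target:
            best = t  # a lower threshold flags more cases, so keep scanning
    return best


# Hypothetical scores from the sequence model and the analysts' verdicts.
scores = [0.99, 0.95, 0.90, 0.80, 0.70, 0.40, 0.20]
labels = [1, 1, 1, 0, 1, 0, 0]
print(threshold_for_precision(scores, labels, target=1.0))  # 0.9
```

Sessions scoring at or above the returned threshold could be canceled automatically, while everything below it would still go to human review.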

One newer approach with which we’ve been having success is micro-models. As the name implies, micro-models break the fraud-detection problem down into small areas for analysis. For instance, we’ll build a binary classification (supervised) model just for identifying whether an email address is bad, or whether an IP address belongs to a botnet. Then we build an ensemble model comprising many such micro-models. The micro-models are easy to build, train, and update, and together, the ensemble is highly effective.
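
Here is a minimal sketch of the micro-model pattern, assuming two tiny scorers: one for email addresses and one for IP addresses. Both are hand-written stand-ins for what would really be separately trained binary classifiers, and the disposable-domain list, botnet IP set, weights, and bias are all hypothetical. The ensemble combines their outputs with a logistic function; in a stacked ensemble those weights would themselves be learned from labeled outcomes.

```python
import math

# Illustrative reference data; a real system would maintain these feeds.
DISPOSABLE_DOMAINS = {"mailinator.com", "10minutemail.com"}
BOTNET_IPS = {"203.0.113.7"}  # address from a documentation-only range


def email_micro_model(email: str) -> float:
    """Stand-in for a trained 'is this email bad?' classifier; returns 0..1."""
    local, _, domain = email.lower().partition("@")
    digit_ratio = sum(ch.isdigit() for ch in local) / max(len(local), 1)
    return min(0.6 * (domain in DISPOSABLE_DOMAINS) + 0.4 * digit_ratio, 1.0)


def ip_micro_model(ip: str) -> float:
    """Stand-in for a trained 'is this IP part of a botnet?' classifier."""
    return 1.0 if ip in BOTNET_IPS else 0.0


# Hypothetical combiner parameters; stacking would learn these from outcomes.
WEIGHTS = {"email": 3.0, "ip": 4.0}
BIAS = -3.5


def ensemble_score(tx: dict) -> float:
    """Logistic combination of the micro-model outputs (probability-like, 0..1)."""
    z = (BIAS
         + WEIGHTS["email"] * email_micro_model(tx["email"])
         + WEIGHTS["ip"] * ip_micro_model(tx["ip"]))
    return 1 / (1 + math.exp(-z))


clean = {"email": "alice@example.com", "ip": "198.51.100.4"}
risky = {"email": "x93817@mailinator.com", "ip": "203.0.113.7"}
print(round(ensemble_score(clean), 3), round(ensemble_score(risky), 3))
```

Because each micro-model is small and self-contained, one of them (say, the email model) can be retrained or replaced without touching the others.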

So, why do we need humans for all of this?

There is a perception that fraud detection can be completely automated today using machine learning, without any need for human intuition or knowledge, but it is false. Human intelligence is still required to detect fraud because human intelligence is still being used to create fraud. The bad guys are coming up with new fraud techniques literally every day. So, while you train your model on past data, there could already be something new that the model hasn’t seen before. In such cases, the fastest way to act is to write customized rules that detect the new fraudulent behavior.
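
A minimal sketch of how a hand-written rule can sit in front of a model, so analysts can react to a brand-new pattern the same day instead of waiting for retraining. The rule, its field names, and the 0.9 model threshold are hypothetical examples, not recommendations.

```python
from typing import Callable, Dict, List, Tuple

Rule = Tuple[str, Callable[[Dict], bool]]

# Hypothetical rule an analyst might add after spotting a new attack pattern:
# many gift cards bought from a device first seen less than an hour ago.
RULES: List[Rule] = [
    ("new_device_gift_card_burst",
     lambda tx: tx["device_age_hours"] < 1 and tx["gift_cards_in_cart"] >= 5),
]


def decide(tx: Dict, model_score: float, threshold: float = 0.9,
           rules: List[Rule] = RULES) -> Tuple[str, List[str]]:
    """Rules are checked first; otherwise fall back to the model's score."""
    fired = [name for name, condition in rules if condition(tx)]
    if fired:
        return "review", fired
    return ("review" if model_score >= threshold else "allow"), []


print(decide({"device_age_hours": 0.5, "gift_cards_in_cart": 6}, model_score=0.1))
```

Here the model alone would have allowed the transaction (score 0.1), but the new rule routes it to review and records which rule fired.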

Will we get to the point where machine-learning models are automatically able to identify new kinds of fraud without regular human intervention? Yes, certainly. But not for a while. For relatively simple verticals like ticketing, where the number of features and the number of actions a user can perform are limited, it’s likely to happen sooner. But more complicated sectors such as banking could be a decade away from being fully AI-driven, because fraud attempts there are far more sophisticated and evolve rapidly. That’s also why unsupervised models always play an important role: they are the key to finding new kinds of fraud.

Edit: Previously, our post indicated that we use kNN as a clustering algorithm, which was incorrect. kNN is generally used to build classification or regression models.
