This is the second installment in a three-part series. The first installment provides a short background on anti-money laundering. In this installment, we examine common AML problems faced by financial institutions today. The third installment introduces an approach that carries AML into the future.
Part II: Current Challenges in AML
There are several key areas in the field of anti-money laundering (AML) that rely heavily on technology. In a general sense, AML processing funnels data through four phases: data integration, Know Your Customer (KYC), transaction monitoring, and investigation and disclosure.
AML starts with the collection and processing of data, which may include transactions, customer profiles, and other relevant information from internal and external data sources. The data is validated, conformed to a canonical form, and often enriched with third-party data such as credit scores and geo-location. The goal is to construct a single view of the customer, which is later consumed by downstream processes for KYC, transaction monitoring, and investigation.
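To make the validate, conform, and enrich steps concrete, here is a minimal sketch in Python. The field names (`cust_id`, `customer_id`, `amount`), the `CustomerView` schema, and the `credit_scores` lookup are all hypothetical, chosen only for illustration. A real pipeline would handle far more sources and far messier data.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CustomerView:
    """One consolidated record per customer (hypothetical schema)."""
    customer_id: str
    name: str = ""
    country: str = ""
    credit_score: Optional[int] = None
    transactions: list = field(default_factory=list)


def conform(record: dict) -> dict:
    """Validate a raw record and map it to a canonical form."""
    # Different sources may name the key differently (assumed variants).
    raw_id = record.get("customer_id") or record.get("cust_id") or ""
    customer_id = str(raw_id).strip()
    if not customer_id:
        raise ValueError("record has no customer identifier")
    conformed = {"customer_id": customer_id}
    if record.get("name"):
        # Collapse repeated whitespace and normalize capitalization.
        conformed["name"] = " ".join(str(record["name"]).split()).title()
    if record.get("country"):
        conformed["country"] = str(record["country"]).strip().upper()
    if "amount" in record:
        conformed["amount"] = float(record["amount"])
    return conformed


def build_views(profiles, transactions, credit_scores):
    """Merge profiles, transactions, and third-party data per customer."""
    views = {}
    for raw in profiles:
        rec = conform(raw)
        view = views.setdefault(rec["customer_id"], CustomerView(rec["customer_id"]))
        view.name = rec.get("name", view.name)
        view.country = rec.get("country", view.country)
    for raw in transactions:
        rec = conform(raw)
        view = views.setdefault(rec["customer_id"], CustomerView(rec["customer_id"]))
        view.transactions.append({"amount": rec.get("amount")})
    for customer_id, view in views.items():
        # Enrichment step: attach third-party data when it is available.
        view.credit_score = credit_scores.get(customer_id)
    return views


profiles = [{"cust_id": " C1 ", "name": "  jane   doe ", "country": "us"}]
transactions = [
    {"customer_id": "C1", "amount": "2500.00"},
    {"customer_id": "C1", "amount": 120},
]
views = build_views(profiles, transactions, credit_scores={"C1": 710})
print(views["C1"])
```

The point of the sketch is that every downstream consumer reads from the same conformed record, rather than re-cleaning data from each source on its own.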
Having clean, validated, conformed historical data in a single repository is critical for an effective AML program. It provides more relevant matches for KYC screening, enables better predictive algorithms for risk scoring, shortens time spent on investigation, and cuts down on costs to support periodic regulatory reviews. Yet, most financial institutions struggle with assembling a single view of the customer, due to the increasing volume and variety of data sources for AML. Current AML systems, typically built on relational databases, have difficulties at scale, and do not have the agility to ingest, process, or serve new quality data to keep up with changes in the business or with regulations.
Know Your Customer, or KYC, is the aspect of AML policy that requires institutions to perform due diligence on their customers. Due diligence starts when the account is onboarded, and is repeated periodically throughout the lifetime of the account. It may include:
- Identity verification. Collecting and evaluating documents to ascertain customer identity. Documents are ideally digitized and stored in a database, such that they may be retrieved on demand for investigations and regulatory reviews.
- Watchlist screening. Checking whether the customer is a politically exposed person (PEP), or appears on any of the lists maintained by regulators (e.g., the sanctions lists from the Office of Foreign Assets Control, or OFAC). Screening may also include media monitoring for adverse news involving the customer. The technical challenge lies in name matching, which needs to take into account name variants and foreign translations.
- Risk assessment. Looking at various risk factors to determine the customer’s propensity for money laundering, terrorist financing, and other scenarios. This is typically done using supervised machine learning techniques such as linear regression, which benefit greatly from having deeper and wider data. More data sets covering longer historical time periods usually mean better predictive models.
- Link analysis. Looking at customer relationships with other entities. In particular, determining the ultimate beneficial owner of shell companies. This requires a system that can store and analyze relationship graphs.
- Behavioral classification. Classifying an account with similar accounts, and determining whether account activity is normal for its cohort. This could be done using k-means and other machine learning techniques for clustering and anomaly detection.
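As an illustration of the name-matching problem in watchlist screening, here is a minimal sketch that uses only the Python standard library. It handles accents, punctuation, and word order, but not transliteration or foreign translations, which need dedicated tooling. The function names and the 0.85 threshold are assumptions made for this example, not values taken from any regulator or product.

```python
import difflib
import unicodedata


def normalize(name: str) -> str:
    """Lowercase, strip accents and punctuation, and sort the name tokens."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() else " " for ch in no_accents.lower())
    # Sorting tokens makes "Garcia, Jose" and "Jose Garcia" compare equal.
    return " ".join(sorted(cleaned.split()))


def similarity(a: str, b: str) -> float:
    """Similarity ratio between 0.0 and 1.0 on the normalized names."""
    return difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def screen(customer_name, watchlist, threshold=0.85):
    """Return watchlist entries scoring at or above the threshold, best first."""
    scored = [(entry, similarity(customer_name, entry)) for entry in watchlist]
    hits = [(entry, score) for entry, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical watchlist entries, for illustration only.
watchlist = ["Garcia, Jose", "Maria Lopez"]
print(screen("José García", watchlist))
```

Even in this toy form, the threshold is a tuning parameter: set it too low and screening floods investigators with near-misses, set it too high and genuine variants slip through.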
The building blocks for effective KYC (name matching, risk scoring, link analysis, and classification) fall in the domain of data science and machine learning. Most mainstream AML solutions were not built with machine learning in mind. And while it is possible to bolt machine learning capabilities onto existing solutions, this approach by itself will not produce positive results. The key to effective machine learning applications is to have lots of good quality data, and the ability to quickly prepare that data for use by data scientists. The way to achieve effective KYC is to have sound big data management and a reliable, single view of the customer.
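Of these building blocks, link analysis is perhaps the easiest to picture in code. The sketch below walks an ownership graph upward from a company and multiplies ownership fractions along each path to estimate who ultimately owns it. The `owners_of` mapping, the entity names, and the rule that "anything without listed owners is a person" are simplifying assumptions for illustration. A production system would use a graph store and much richer entity data.

```python
from collections import defaultdict


def ultimate_owners(entity, owners_of, share=1.0, path=()):
    """Return {owner: effective ownership share} for an entity.

    owners_of maps a company to a list of (owner, fraction) pairs.
    Any owner that has no entry in owners_of is treated as an ultimate
    owner (an assumption made for this sketch).
    """
    path = path + (entity,)
    totals = defaultdict(float)
    for owner, fraction in owners_of.get(entity, []):
        if owner in path:
            continue  # ignore circular ownership chains
        effective = share * fraction
        if owner in owners_of:
            # The owner is itself a company: keep walking up the chain.
            for person, s in ultimate_owners(owner, owners_of, effective, path).items():
                totals[person] += s
        else:
            totals[owner] += effective
    return dict(totals)


# Hypothetical ownership structure, for illustration only.
owners_of = {
    "Shell Co A": [("Holding B", 0.6), ("Alice", 0.4)],
    "Holding B": [("Bob", 0.5), ("Alice", 0.5)],
}
result = ultimate_owners("Shell Co A", owners_of)
print(result)
```

Here Alice holds 40% of the shell company directly and another 30% through the holding company, a combined stake that is not visible from any single ownership record.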
Regulated institutions are required to monitor financial transactions on a regular cycle, typically daily at close of business. Current monitoring systems are rules-based, and generate an alert when a rule, or a combination of rules, is matched. For example, cash transactions over $10,000 are automatically flagged and included in a currency transaction report (CTR). Sets of rules are used to model a suspicious scenario, and they fire alerts that are then triaged and evaluated by a group of investigators. Confirmed alerts go into suspicious activity reports (SARs), which are then sent to regulators.
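A rules-based monitor of this kind can be sketched in a few lines of Python. The `Transaction` and `Rule` types and the rule name below are hypothetical. Only the $10,000 cash threshold comes from the example above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Transaction:
    account: str
    amount: float
    is_cash: bool


@dataclass
class Rule:
    name: str
    matches: Callable[[Transaction], bool]


CASH_THRESHOLD = 10_000  # dollar threshold from the example in the text

RULES = [
    Rule("large_cash", lambda t: t.is_cash and t.amount > CASH_THRESHOLD),
]


def run_daily_batch(transactions, rules=RULES):
    """Evaluate every rule against every transaction in the day's batch."""
    alerts = []
    for txn in transactions:
        for rule in rules:
            if rule.matches(txn):
                alerts.append((rule.name, txn))
    return alerts


# Hypothetical end-of-day batch, for illustration only.
batch = [
    Transaction("ACC-1", 12_000.0, is_cash=True),
    Transaction("ACC-2", 12_000.0, is_cash=False),
    Transaction("ACC-3", 9_900.0, is_cash=True),
]
alerts = run_daily_batch(batch)
print(alerts)
```

Each alert produced this way still has to be triaged by a person, which is where the cost of a poorly tuned rule shows up.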
Rules-based systems are simple to reason through, but are costly in practice. They are coarse-grained, difficult to tune, inflexible, and easily circumvented. They rely on parameterized thresholds, which have to be set at levels sufficient to catch the bulk of suspicious transactions. As a result, rules are often tuned to be over-aggressive and pick up a lot of false positives. Across the industry, it is common to see a false positive rate of 90%, with some institutions reporting numbers as high as 98%. Paradoxically, rules-based systems end up taking this coarse-grained, shotgun approach because individual rules are too fine-grained. Each rule is brittle and static, built to detect a specific known pattern of behavior, so broad coverage is only achieved by tuning thresholds aggressively. For the same reason, rules also do poorly with false negatives: they fail to detect the so-called “unknown unknowns,” emergent behavior that has not yet been codified into a rule.
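Both failure modes are easy to reproduce in a toy example. In the sketch below, a static threshold misses a set of deposits that each sit just under the limit (a false negative), and a small helper computes the share of alerts that investigators closed as not suspicious. The amounts and dispositions are made up for illustration, and the helper name is our own.

```python
THRESHOLD = 10_000  # static dollar threshold, as in the cash example


def flag_large_cash(amounts):
    """Return the amounts that a single static threshold rule would flag."""
    return [amount for amount in amounts if amount > THRESHOLD]


# Hypothetical deposits split to stay under the threshold.
split_deposits = [9_500, 9_800, 9_700]
flagged = flag_large_cash(split_deposits)
total = sum(split_deposits)
print(flagged, total)  # nothing is flagged, although the total is well above the threshold


def false_positive_share(dispositions):
    """Share of alerts closed as not suspicious.

    dispositions holds one boolean per alert: True if the alert was
    confirmed as suspicious, False otherwise.
    """
    return dispositions.count(False) / len(dispositions)


# Hypothetical outcome: 1 confirmed alert out of 10.
dispositions = [True] + [False] * 9
share = false_positive_share(dispositions)
print(share)
```

Closing the first gap means adding yet another rule, for example one that aggregates deposits over a time window, which in turn brings its own thresholds to tune.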
Investigations and Disclosure
Whenever a transaction is flagged as suspicious, a case is opened to investigate the alert. Investigations are labor intensive and make up the bulk of the cost in most AML programs. Due to the high rate of false positives, several of the larger financial institutions employ several hundred investigators to handle the case workload. False positives put an undue burden on the investigative teams, but they are not the only cost factor. Another major factor is the time it takes to fully investigate a case. An investigator typically switches between several information systems in order to access all the data needed to process a single alert. Inefficient access to information leads investigators to spend an inordinate amount of time on each case.
Regulatory disclosure and reviews share a similar burden. Covered institutions struggle to present information to regulators because that information is stored in silos. More seriously, information silos make it harder to trace the provenance of data. This lack of traceability and transparency places an additional burden on covered institutions to prove the soundness of their AML systems and processes.
To summarize, the majority of AML systems deployed today are built on decades-old technology, based on decades-old assumptions, policies, and practices. They suffer from a number of deficiencies that contribute to soaring AML costs and increased regulatory risk. As we enter a new wave of regulatory changes, the time has come to consider a modern technology platform with the agility, capability, and scale to handle the future.
In Part III of this series, we will look at a full end-to-end AML and financial crime solution built for big data and machine learning.