
Global Banking and Big Data: The Challenge of Anti-Money-Laundering Compliance

November 11, 2014 | Article

Although fraud and abuse are often cited as main drivers for the adoption of Hadoop in highly regulated industries, there has been relatively little focus on Big Data to prevent money laundering within commercial verticals. The former White House Deputy Chief Technology Officer, Daniel Weitzner, recently told The Wall Street Journal, “[Companies have] taken it on themselves to spot fraudulent transactions. [They] have invested billions in incredibly sophisticated Big Data techniques …  But the understanding is the government — [and not banks] — will do the analysis to spot money laundering.” 

However, a series of high-profile decisions by the U.S. Department of Justice against BNP Paribas, JP Morgan Chase, Barclays and other large, global banks, resulting in multi-billion-dollar fines, has brought anti-money laundering (AML) to the top of the financial services industry’s priority list. While the first wave of investment in Big Data tools and technology has so far targeted the identification and prevention of nefarious activities that lead to direct costs for banks, payment processors and their customers, near-term spending is likely to be related to compliance with three key pieces of AML regulation:

  • The Bank Secrecy Act (BSA)
  • Know Your Customer (KYC)
  • The Foreign Account Tax Compliance Act (FATCA)

 Big Data lightens the burden of investigation 

Unlike other forms of fraud, which are identified by machine learning algorithms that detect anomalies and outliers, money laundering schemes are designed to closely mimic typical banking behaviors and are, therefore, characteristically less anomalous. The thresholds mandated by reporting policies like BSA and utilized by first- and second-generation AML systems are well known, so criminals have little difficulty modeling their activity on above-board trade and transaction behaviors, making it largely imperceptible, even to specialized software.
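To make the threshold problem concrete, here is a minimal sketch of why a single-transaction rule misses deposits that are deliberately kept under a reporting limit, and how aggregating over a time window can surface them. The $10,000 figure is the BSA currency transaction report threshold; the 7-day window, the function names and the tuple layout are assumptions chosen for illustration, not a description of any production AML system.

```python
from collections import defaultdict
from datetime import date

CTR_THRESHOLD = 10_000  # USD; cash transactions above this trigger a BSA report


def flag_single(transactions):
    """First-generation rule: flag any one transaction above the threshold."""
    return [t for t in transactions if t[2] > CTR_THRESHOLD]


def flag_structuring(transactions, window_days=7):
    """Flag accounts whose sub-threshold deposits add up to more than the
    threshold within a sliding window (window_days is an assumed parameter)."""
    by_account = defaultdict(list)
    for account, day, amount in transactions:
        if amount <= CTR_THRESHOLD:
            by_account[account].append((day, amount))

    flagged = set()
    for account, items in by_account.items():
        items.sort()
        start, total = 0, 0
        for end, (day, amount) in enumerate(items):
            total += amount
            # Drop deposits that have fallen out of the window.
            while (day - items[start][0]).days >= window_days:
                total -= items[start][1]
                start += 1
            # Require at least two deposits so one legal deposit is not flagged.
            if total > CTR_THRESHOLD and end > start:
                flagged.add(account)
    return flagged


# Hypothetical sample data: (account id, date, cash amount in USD).
sample = [
    ("A", date(2014, 11, 3), 9_500),
    ("A", date(2014, 11, 4), 9_000),
    ("B", date(2014, 11, 3), 12_000),
]
print(flag_single(sample))
print(flag_structuring(sample))
```

In this sample, account B is caught by the simple rule, while account A is only visible once its deposits are summed across days. Widening the window or joining in other data sources is where larger historical data sets become useful.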

As a result, these systems must be enriched with much larger and more diverse data sets to isolate signals of possible money laundering. When a signal is detected, human judgment must be applied: a case is opened, which kicks off an inquiry to verify the crime and the extent of the damage. Without Big Data, AML indicators are often not distinct enough to be caught by computational models, which leaves most of the work to time-consuming and expensive investigation.

In fact, respondents to KPMG’s 2014 “Global Anti-Money Laundering Survey” reported they are “increasingly unhappy with their current automated monitoring efforts, [and are] looking for software that can reduce the burden on the compliance department.” 

Apache Hadoop is the ideal platform for AML. As part of an enterprise data hub, it expands on the performance of any entrenched data management infrastructure and fraud detection applications to extend the value of existing investments. Beyond the requirements of bringing larger and more long-term descriptive data sets online to improve the performance of legacy AML solutions with more relevant predictive models, the additional components of the Hadoop stack enable better fraud detection by performing or augmenting the actual exploration, discovery, investigation, and forensics. 

Building an AML solution with an enterprise data hub 

Here’s a brief overview of the enterprise data hub value chain for AML: 

Data collection. Bank data tends to be segregated into silos, and modeling is usually limited to a few weeks or months of history. In contrast, the cost of storing data on Hadoop is typically orders of magnitude lower than the alternatives, meaning data spanning decades can easily and affordably be retained and queried in one place.

Data preparation. Hadoop excels at enriching, transforming and vectorizing data before it is scored for fraud. It enables the heuristic matching algorithms required to prepare certain types of data and integrates with familiar ETL tools, while Hadoop itself handles the heavy data collection, transformation and preparation.
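As an illustration of what heuristic matching can look like during data preparation, the following sketch normalizes customer or counterparty names and compares them with a string-similarity ratio from Python's standard library. The suffix list, the 0.85 threshold and the function names are assumptions made for this example; a real deployment would run comparable logic as a distributed job and would likely use richer attributes than names alone.

```python
import re
from difflib import SequenceMatcher

# Assumed list of legal suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "co", "corp"}


def normalize(name):
    """Lowercase, strip punctuation, drop legal suffixes, sort tokens."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    tokens = [t for t in cleaned.split() if t not in LEGAL_SUFFIXES]
    return " ".join(sorted(tokens))


def similarity(a, b):
    """Similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_entities(names, threshold=0.85):
    """Return pairs of names that probably refer to the same entity."""
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            score = similarity(a, b)
            if score >= threshold:
                pairs.append((a, b, round(score, 2)))
    return pairs


# Hypothetical records drawn from two separate silos.
records = ["Acme Trading, Inc.", "ACME Trading Inc", "Globex Shipping Ltd"]
print(match_entities(records))
```

The pairwise loop is quadratic, which is acceptable for a sketch but is exactly the kind of workload that benefits from being partitioned across a cluster.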

Fraud scoring. Access to a variety of predictive models improves the accuracy of fraud models. Hadoop’s support for multiple frameworks can bring multiple computational techniques to bear on the AML problem including static rules engines, state machines, graph algorithms, natural language processing and machine learning. 
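To give one example of the graph techniques mentioned above, the sketch below looks for "round trips": chains of transfers that return funds to the originating account within a few hops, a pattern sometimes associated with layering. The hop limit, the function names and the sample transfers are assumptions for illustration only, and a cycle in a transfer graph is a signal for review, not proof of laundering.

```python
from collections import defaultdict


def build_graph(transfers):
    """Map each sender to the set of accounts it has sent funds to."""
    graph = defaultdict(set)
    for sender, receiver in transfers:
        graph[sender].add(receiver)
    return graph


def find_round_trips(transfers, max_hops=4):
    """Return accounts whose funds can flow back to them within max_hops
    transfers (max_hops is an assumed tuning parameter)."""
    graph = build_graph(transfers)
    flagged = set()
    for origin in list(graph):
        stack = [(origin, 0)]
        seen = set()  # (account, hops) states already explored
        while stack:
            node, hops = stack.pop()
            if hops == max_hops:
                continue
            for nxt in graph.get(node, ()):
                if nxt == origin:
                    flagged.add(origin)
                elif (nxt, hops + 1) not in seen:
                    seen.add((nxt, hops + 1))
                    stack.append((nxt, hops + 1))
    return flagged


# Hypothetical transfers as (sender, receiver) pairs.
transfers = [("A", "B"), ("B", "C"), ("C", "A"), ("D", "E")]
print(find_round_trips(transfers))
```

A rules engine or machine learning model could consume the flagged set as one feature among many, which is the sense in which multiple techniques are combined.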

Model development. Criminal methods evolve to evade detection, requiring predictive models to be improved over time. While some models are relatively static, others use techniques like linear regression and clustering, which require training on a historical data set. Interactive query tools like Cloudera Search and Impala facilitate the discovery of new patterns and associations, while the availability of more data and processing power in Hadoop allows models to incorporate more parameters, train on a longer historical perspective and iterate more rapidly when back-testing new variations.
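Back-testing can be sketched in a few lines: score historical cases whose outcome is already known, then measure how many alerts a given threshold would have produced and how many of them were false positives. The scores, labels and function name here are hypothetical, and the metrics are standard precision and recall, not a method prescribed by any specific AML product.

```python
def backtest(scored_cases, threshold):
    """Evaluate an alert threshold against labeled history.

    scored_cases: iterable of (model score, confirmed laundering?) pairs.
    """
    cases = list(scored_cases)
    alerts = [(score, label) for score, label in cases if score >= threshold]
    true_pos = sum(1 for _, label in alerts if label)
    total_pos = sum(1 for _, label in cases if label)
    return {
        "alerts": len(alerts),
        "false_positives": len(alerts) - true_pos,
        "precision": true_pos / len(alerts) if alerts else 0.0,
        "recall": true_pos / total_pos if total_pos else 0.0,
    }


# Hypothetical history: (score from a candidate model, investigator verdict).
history = [(0.95, True), (0.80, False), (0.70, True), (0.40, False), (0.20, False)]
for threshold in (0.5, 0.9):
    print(threshold, backtest(history, threshold))
```

Raising the threshold cuts the number of cases investigators must open but risks missing real activity, so a longer labeled history gives a more reliable picture of that trade-off.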

Investigation. Improving model accuracy to eliminate false positives, thereby reducing the time- and resource-intensive caseload for the human element of investigation, is a major way Hadoop decreases the cost of AML. As part of an enterprise data hub, ad hoc interactive query reduces the burden of investigation by providing fast answers to arbitrary questions over large data sets. 

As part of an enterprise data hub, Hadoop’s flexibility, scalability and affordability extend existing investments in dedicated fraud-detection solutions by increasing the volume, age and variety of data that can be examined while speeding up data transformation for faster time to insight. Once data at this scale is consolidated, Hadoop can take on more advanced AML workloads such as entity matching, while Cloudera Search and Impala reduce the complexity of model development, process automation and case investigation.

Ryan Goldman is a director of product marketing at Cloudera, focusing on vertical solutions and monetization services. He spent three years as an enterprise marketing strategist and Asia Region marketing lead at Cisco Systems. He started in technology marketing after seven years working in international development in Washington, D.C., focusing on microfinance in Central Asia, social entrepreneurship in South America, and education policy in sub-Saharan Africa.