Data Quality is the Key to Artificial Intelligence in Compliance

In my last blog, "Is Artificial Intelligence (AI) Ready for Financial Compliance?", I wrote about how machine learning (ML) is paving the way for AI in financial compliance, the key differences between AI and ML, and why today’s financial compliance applications require a hybrid approach that combines ML and human decision-making.

But for all the talk around AI and ML in financial compliance, the conversation is a non-starter if we don’t address data quality first. Data quality is the foundation on which all automated, intelligent applications are built. Without quality data, intelligent applications in financial compliance have no basis for learning what constitutes good (compliant) or bad (non-compliant) behavior.

Centralized Data Management

So how can financial firms ensure data quality? The first step is to make sure that data is centrally managed and accessible.

Unfortunately, this is typically not the case. As communications become more complex (traders use multiple modalities such as voice, email, instant messaging, social media, mobile text, and chat), the resulting unstructured communications data ends up stored in silos. Complicating matters further, firms use separate systems to conduct surveillance on voice and electronic communications, not to mention the additional systems needed to store and monitor the trade data required for investigations.

System silos aren’t the only problem; there are departmental silos too. One group of people may be responsible for monitoring voice communications, another tasked with electronic communications surveillance, and so on.

This makes the trade reconstruction process inherently complex: different analysts must extract data from different systems and then manually piece it all together.

Now imagine a holistic solution in which all communications and trade records are centrally managed, readily accessible, organized with context, and easily correlated and mapped back to specific traders or trade IDs.
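To make that idea concrete, here is a minimal sketch of what such a central store and correlation step might look like. The field names and the simple time-window heuristic are illustrative assumptions, not any particular product’s data model; a real trade reconstruction would use far richer linking logic.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class CommRecord:
    comm_id: str
    trader_id: str
    modality: str          # "voice", "email", "chat", "mobile_text", ...
    timestamp: datetime
    content: str           # transcript or message body

@dataclass
class TradeRecord:
    trade_id: str
    trader_id: str
    timestamp: datetime
    instrument: str

def correlate(comms, trades, window=timedelta(hours=2)):
    """Link each trade to communications from the same trader within a
    surrounding time window -- a naive first pass at assembling a trade
    reconstruction from a single, centrally managed store."""
    reconstruction = {}
    for trade in trades:
        related = [
            c for c in comms
            if c.trader_id == trade.trader_id
            and abs(c.timestamp - trade.timestamp) <= window
        ]
        reconstruction[trade.trade_id] = sorted(related, key=lambda c: c.timestamp)
    return reconstruction
```

The point is less the code than the precondition it assumes: every record, regardless of modality, already lives in one place with consistent trader and trade identifiers.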

Beyond the obvious benefit of having complete data, another advantage of this approach is that the data only needs to be structured, classified, and aggregated once (in one system) rather than multiple times (in multiple systems). This creates efficiencies and ensures consistency across data sources, and it lowers both the up-front and ongoing costs of managing data.

Let’s face it: a lot of work goes into making data usable. According to a recent survey of data scientists, roughly 80% of the effort in any data project goes into curating data so that it’s in a uniform, usable format.

The Human / Data Quality Connection

Data is the ultimate fuel for artificially intelligent applications. But people have a reciprocal role in data quality. Given the right tools, they can improve the quality of data, and in return, that data improves their decision-making.

Take Google Maps, for example. It comes as no surprise that Google Maps is one of the most highly rated map applications. According to an article in The New York Times, Google Maps owes this success to years of hand-tuning, manual effort, and data gathering via Street View cars, satellites and, yes, even human labor! Until recently, ordinary people could submit corrections to Google’s maps through Google Map Maker as they encountered errors. This continuous cycle of feedback and refinement made Google Maps the go-to source for the one billion monthly users who rely on it for reliable geo-data and accurate directions. By providing their feedback, Google Maps users were rewarded with an even better experience.

But can this concept be extrapolated to financial compliance? Absolutely, even if on a smaller scale.

One of the essential ways a compliance analyst can contribute to data quality is by providing feedback in the form of labeled data sets, which are then used to improve supervised machine learning models. In practical terms, the trade reconstructions that analysts create can be fed back into the system to help build predictive models that indicate, with a high level of confidence, whether a particular type of communication, or set of communications, might be risky or suspicious.
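As a rough sketch of that feedback loop, the snippet below trains a simple text classifier on analyst-labeled communications using scikit-learn. The example messages, labels, and model choice are assumptions made for illustration, not a description of any specific surveillance product.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Analyst-reviewed communications: text plus the outcome of the review
# (1 = escalated as suspicious, 0 = cleared). In practice the labels would
# come from completed trade reconstructions and case dispositions.
texts = [
    "let's keep this off the recorded line",
    "confirming the order we discussed, 10k shares at market",
    # ... thousands more labeled examples in a real deployment ...
]
labels = [1, 0]

# Bag-of-words features plus a linear classifier: a deliberately simple
# baseline for turning analyst feedback into a risk-scoring model.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)

# New communications get a risk score; the highest-scoring ones can be
# queued for analyst review, closing the loop.
scores = model.predict_proba(["call me on my cell about that trade"])[:, 1]
```

Each case the analyst then reviews produces another label, so the model keeps improving as the team does its normal work.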

By creating trade reconstructions holistically (the sequence of what a trader said and did, when, and how) across all communication modalities, and doing so repeatedly, firms can begin to build predictive models of what to look for in the future and detect problems earlier than ever before.

In a supervised machine learning environment, analysts can be presented with pre-packaged cases and, using their expert knowledge, review and investigate them. In addition to automating the detection of potentially fraudulent activity, this relieves analysts of manual tasks so they can focus on higher-level decision-making.

Beyond Compliance: Data Quality Lays a Foundation for Competitive Advantage

Once firms have quality data, they can start to layer intelligent applications on top of it, and then begin to extract even more value from that data.

Consider this: in the future, the very same communications that analysts review to ensure regulatory compliance could also be used for business intelligence, transforming the compliance department from a cost center into a business driver.

This is not a new idea. Examples of data-driven success stories abound. Today, leading companies from every sector are using strategic data assets to measure the pulse of business, so they can continuously improve the products and services they offer.

Interactions with customers – whether via voice, text, chat or any other means – contain hidden insights into what customers need, like, and desire. Today, commercial contact centers use speech analytics as a way of mining these communications to unearth trends and customer satisfaction issues.

It’s not so far-fetched, then, to think that these very same concepts could be applied to trading communications and trade data too: for example, to verify that orders for trades made over the phone were actually placed, or to get an early read on trends (like which types of financial instruments are of most interest to customers and why). And that’s just the beginning.
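As a toy illustration of that trend-spotting idea, a few lines of Python can already surface which instruments come up most often in call transcripts. The instrument list and keyword matching here are assumptions for the sake of example; a production system would rely on reference data and proper entity extraction rather than simple string matching.

```python
import re
from collections import Counter

# Hypothetical instrument watch-list; real systems would draw this from
# reference data or an NLP entity-extraction model.
INSTRUMENTS = ["FX swap", "corporate bond", "equity option", "treasury future"]

def instrument_trends(transcripts):
    """Count instrument mentions across call transcripts to get an early
    read on what customers are asking about."""
    counts = Counter()
    for text in transcripts:
        for name in INSTRUMENTS:
            if re.search(re.escape(name), text, flags=re.IGNORECASE):
                counts[name] += 1
    return counts.most_common()
```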

With the help of sophisticated natural language processing applications, ML and (future) AI, today’s compliance managers can be tomorrow’s compliance pioneers.

Chris Jennings, vice president of technology services at Collaborative Consulting, articulated it best in a recent InfoWorld article on data-driven companies. "To become a data-driven company, the belief in the importance of the integrity and quality of information needs to permeate the culture of the company at all levels," he said. "It is not enough to start a formal data governance program; becoming data-driven requires a disciplined shift in the mindset of all employees towards maintaining the integrity and quality of their data."

I couldn’t agree more.