
Fraud Management In The Age Of Big Data

Big data can help financial firms uncover fraud, sometimes even before it happens.

Reference Architecture

The key technology components of a reference architecture include:

1. Information sources are depicted on the left. These encompass a variety of machine and human actors transmitting potentially thousands of real-time messages per second.
2. A highly scalable messaging system that brings these feeds into the architecture, normalizes them, and sends them on for further processing.
3. A Complex Event Processing tier that can process these feeds at scale to understand the relationships among them, where those relationships are defined by business owners in a non-technical language or by developers in a technical one.
4. When specific patterns indicating potential fraud are matched, business process workflows are created that follow a well-defined process predefined and modeled by the business.
5. Data that has business relevance and needs to be kept for offline or batch processing can be handled using a Java data grid and/or a storage platform. The idea is to deploy Hadoop-oriented workloads (MapReduce or machine learning) to understand fraud patterns as they emerge over time.
6. Scale-out is the preferred deployment approach, as it helps the architecture scale linearly as the loads placed on the system increase over time.
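The end-to-end flow implied by these components can be sketched in a few lines. The following Python sketch is purely illustrative: the real tiers are middleware products, and the queue, matching rule, and workflow stub here are stand-in assumptions.

```python
# Minimal sketch of the flow: feeds enter a broker, a CEP rule matches a
# pattern, and a BPM-style workflow is started for each match.
from queue import Queue

def cep_match(event):
    """Toy pattern: flag any transaction over a threshold amount."""
    return event.get("amount", 0) > 10_000

def bpm_workflow(event):
    """Stand-in for kicking off a fraud-review business process."""
    return f"review-case:{event['txn_id']}"

broker = Queue()                      # messaging tier (stand-in queue)
for e in [{"txn_id": 1, "amount": 250},
          {"txn_id": 2, "amount": 50_000}]:
    broker.put(e)                     # feeds arriving from the left of the diagram

cases = []
while not broker.empty():
    event = broker.get()              # CEP tier consumes the normalized feed
    if cep_match(event):              # pattern met: potential fraud
        cases.append(bpm_workflow(event))   # BPM tier spins up a process

print(cases)                          # ['review-case:2']
```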

Here is a high-level depiction of the reference architecture:

Illustration 1: Reference Architecture for a Fraud Detection Application

Messaging Broker Tier

The messaging broker tier is the first point of entry into the system. It fundamentally hosts a set of message queues. The broker tier needs to be highly scalable while supporting a variety of cross-language clients and protocols, from Java, C, C++, and C# to Ruby, Perl, Python, and PHP. Using various messaging patterns to support real-time messaging, this tier integrates applications, endpoints, and devices quickly and efficiently. The architecture of this tier needs to be flexible so that it can be deployed in various configurations to connect to customized solutions at every endpoint, payment outlet, partner, or device.
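Part of this tier's job is normalizing heterogeneous feeds into a common event shape before forwarding them. A minimal sketch, assuming two illustrative input formats and invented field names:

```python
# Normalize raw feed records from different producers into one event shape.
import json

def normalize(raw, fmt):
    """Map a raw feed record onto a common event dictionary."""
    if fmt == "json":                      # e.g. from a Java or Python client
        rec = json.loads(raw)
        return {"txn_id": rec["id"], "amount": float(rec["amt"])}
    if fmt == "csv":                       # e.g. from a legacy payment outlet
        txn_id, amt = raw.split(",")
        return {"txn_id": int(txn_id), "amount": float(amt)}
    raise ValueError(f"unsupported format: {fmt}")

events = [normalize('{"id": 7, "amt": "99.50"}', "json"),
          normalize("8,120.00", "csv")]
```

Downstream tiers then see one schema regardless of which client or protocol produced the message.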

Complex Event Processing Tier

The Complex Event Processing (CEP) portion of the implementation, in this scenario, is an independent software module, yet completely integrated with the rest of the platform while running on a horizontally scaled infrastructure. Typically, the CEP tier has the following capabilities:
  • understand and handle events as first class citizens of the platform
  • select a set of interesting events in a cloud or stream of events
  • detect the relevant relationships (patterns) among these events
  • take appropriate actions based on the patterns detected

CEP allows the architecture to process multiple events with the goal of identifying the meaningful ones. This process involves:

  • Detection of specific events
  • Correlation of multiple discrete events based on causality, event attributes, and timing
  • Abstraction into higher-level (i.e., complex or composite) events

It is this ability to detect, correlate, and determine business relevance that powers a truly active decision-making capability.
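The detect/correlate/abstract sequence can be illustrated with a toy velocity check: discrete card-swipe events are correlated by card and timing, then abstracted into a composite "suspicious velocity" event. The window, threshold, and field names are assumptions for illustration only.

```python
# Correlate swipe events per card within a sliding time window and abstract
# runs of rapid swipes into higher-level composite events.
from collections import defaultdict

WINDOW_SECONDS = 60
MAX_SWIPES = 3

def correlate(swipes):
    """swipes: list of (card_id, timestamp) tuples, timestamps in seconds."""
    by_card = defaultdict(list)
    composites = []
    for card, ts in sorted(swipes, key=lambda s: s[1]):  # detection, in time order
        window = [t for t in by_card[card] if ts - t <= WINDOW_SECONDS]
        window.append(ts)
        by_card[card] = window                           # correlation by timing
        if len(window) >= MAX_SWIPES:
            composites.append({"card": card, "count": len(window)})  # abstraction
    return composites

alerts = correlate([("A", 0), ("A", 20), ("B", 25), ("A", 45), ("A", 300)])
```

Card "A" swipes three times inside one window, so a single composite event is emitted for it; the later swipe at t=300 falls outside the window and starts fresh.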

Business Rules and Process Management System (BRMS)

The BPM tier is invoked for downstream handling as specific events are detected. BPM processes and business rules can be defined by non-technical as well as technical users of the fraud detection platform, as shown in Illustration 2.

The BPM tier essentially spins up new processes that can be entirely automated or can have a human in the loop to process fraudulent events. The result of this process can take many forms. For instance, one result might be a call to a customer by a call center representative. Another could be an update to a datastore that can be queried by a business intelligence application.
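That routing decision (fully automated versus human in the loop) can be sketched as a trivial rule; the score threshold, field names, and action names are illustrative assumptions, not part of any real BPM product.

```python
# Route each detected fraud event either to an automated action or to a
# human reviewer, depending on detection confidence.
def start_process(event):
    if event["score"] >= 0.9:          # high confidence: automate the block
        return {"action": "auto-block", "txn": event["txn_id"]}
    # lower confidence: human in the loop via the call center queue
    return {"action": "call-center-review", "txn": event["txn_id"]}

outcomes = [start_process({"txn_id": 1, "score": 0.95}),
            start_process({"txn_id": 2, "score": 0.60})]
```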

Illustration 2: CEP/BPM Layer for Fraud Detection Application

Storage Tier

Two distinct data tiers, with broadly different needs, can be identified based on business requirements.

1. Some data needs to be pulled in near real time, accessed in a low-latency pattern, and have calculations performed on it. The design principle here needs to be "Write Many and Read Many," with an ability to scale out tiers of servers.

Java-based in-memory data grids (IMDGs) are very suitable for this use case, as they support a very high write rate. JBoss Data Grid (JDG) is a highly scalable and proven implementation of a distributed data grid that gives users the ability to store, access, modify, and transfer extremely large amounts of distributed data. Further, JDG offers a universal namespace from which applications can pull in data from different sources for all of the above functionality. A key advantage here is that data grids can pool memory and scale out across a cluster of servers horizontally. Further, computation can be pushed into the tiers of servers running the data grid, as opposed to pulling data into the computation tier.

As data volumes increase in size, data grids can scale linearly to accommodate them. The standard means of doing so is through techniques such as data distribution and replication. Replicas are simply copies of the same segment or piece of data stored across (i.e., distributed over) a cluster of servers for fault tolerance and speedy access. Smart clients can retrieve data from a subset of servers by understanding the topology of the grid. This speeds up query performance for tools like business intelligence dashboards and web portals that serve the business community. Data grids also provide support for policies that can be used to quiesce data that is no longer needed or is transient (i.e., has passed a certain time window).
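The smart-client idea can be sketched with a toy hash-based topology: each key maps to a primary node plus replicas, and reads go straight to an owning node. Node names and replica count are assumptions; a real grid's distribution algorithm is considerably more sophisticated.

```python
# Toy key-to-node routing: writes land on every replica, and a smart client
# reads directly from the primary owner instead of broadcasting the request.
import hashlib

NODES = ["grid-1", "grid-2", "grid-3", "grid-4"]
REPLICAS = 2

def owners(key):
    """Return the nodes holding copies of `key` (primary first)."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    first = h % len(NODES)
    return [NODES[(first + i) % len(NODES)] for i in range(REPLICAS)]

grid = {node: {} for node in NODES}    # each node's local in-memory store

def put(key, value):                   # write lands on every replica
    for node in owners(key):
        grid[node][key] = value

def get(key):                          # smart client reads from the primary
    return grid[owners(key)[0]][key]

put("txn:42", {"amount": 99.0})
```

Because the client computes `owners(key)` itself, a lookup touches exactly one server, which is what makes dashboard and portal queries fast.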

2. The second data access pattern that needs to be supported is storage for older data. This is typically large-scale historical data. The primary data access principle here is "Write Once, Read Many." This layer contains the immutable, constantly growing master dataset stored on a distributed file system like HDFS. Besides serving as a storage mechanism, the data stored in this layer can be formatted in a manner suitable for consumption by any tool within the Apache Hadoop ecosystem, such as Hive, Pig, or Mahout.
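A batch pass over this layer typically follows the MapReduce shape: map over immutable records, shuffle by key, then reduce. The in-memory sketch below stands in for a Hadoop job over HDFS files; the dataset and field names are invented for illustration.

```python
# MapReduce-style batch pass: count confirmed-fraud records per merchant
# across the append-only historical dataset.
from collections import defaultdict

history = [                            # immutable master dataset (stand-in)
    {"merchant": "m1", "fraud": True},
    {"merchant": "m2", "fraud": False},
    {"merchant": "m1", "fraud": True},
]

def map_phase(record):                 # emit (merchant, 1) for fraud records
    if record["fraud"]:
        yield record["merchant"], 1

shuffled = defaultdict(list)           # shuffle: group emitted values by key
for rec in history:
    for k, v in map_phase(rec):
        shuffled[k].append(v)

fraud_counts = {k: sum(vs) for k, vs in shuffled.items()}   # reduce phase
```

Run over months of history, the same shape surfaces fraud patterns as they occur over a period of time, which is exactly the offline workload described above.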

As Chief Architect of Red Hat's Financial Services Vertical, Vamsi Chemitiganti is responsible for driving Red Hat's technology vision from a client standpoint. The breadth of these areas ranges from Platform, Middleware, and Storage to Big Data and Cloud (IaaS and PaaS).