Commentary
Bill Kantor, Paradigm4

Current Analytical Software Constrains Big Data Solutions in the Financial Industry

What is needed is a new software paradigm that enables analytics to just work and work fast—without having to move the data or worry about size, writes Bill Kantor of Paradigm4.

Many organizations have realized significant competitive advantages with complex analytics. But Big Data is stressing the limits of these techniques for both data access and analysis. When a database is distributed over multiple servers, certain types of queries become too slow. When data is too big to fit in memory on one server, complex math operations fail. Yes, some access and analytics tasks are "embarrassingly" parallel, and for those, organizations can easily sidestep these problems. But many analyses do not decompose that way and cannot be scaled with this approach. Moreover, many organizations separate data management from analytics, forcing them to move massive datasets from one software package to another. If data is big, it shouldn't be moved.

These challenges have driven several workarounds: working with subsets of data, buying expensive servers or appliances, replicating alternative views of data to support faster retrieval, or developing custom software that explicitly manages data distribution and parallel computation. Each of these has its problems, slowing down and driving up the cost of data-driven discovery. What is needed is a new software paradigm that enables analytics to just work and work fast—without having to move the data or worry about size.

The financial industry in particular experiences these challenges because it depends on complex analytics—particularly matrix math, multidimensional selects, and moving window aggregates—which most Big Data architectures cannot accommodate readily. Here’s why:

• Extract Transform Load (ETL) gets in the way of interactive, exploratory analytics. Analytics solutions that separate the storage engine from the analytics engine are impractical for Big Data because they force you to move your data and transform it into the analytical package's format. ETL tools are great at lessening the pain of moving data, but they do not address the fundamental issue—separating data management from the math slows analysts down. Interactive, exploratory "big math" ought to be painless.

• In-memory solutions don't scale for complex analytics. Big Data datasets exceed a single machine's memory. Although some "embarrassingly parallel" problems decompose into multiple smaller independent problems that can be distributed across a cluster, many complex analyses needed by financial institutions don't (see the sketch after this list). Subsetting data produces less accurate models. Even if your data does fit on one machine, performance is limited by the number of cores you have. Analytics ought to scale past the limits of a single machine's memory or core count—up to as much computing power as you have available.

• Hadoop doesn’t do complex math well. Hadoop, SQL-on-HDFS, and databases with embedded MapReduce are challenged by complex analytics that are not embarrassingly parallel. For these problems, such architectures can require a lot of low-level coding, turning data scientists into computer scientists. Big Data chores ought to be invisible and automatic.

• Quant-friendly languages are demoted. Typical Big Data solutions don't let quants and data scientists develop analytical solutions in the languages they prefer, like R and Python. Analytics solutions should promote collaboration and capitalize on contemporary programming languages and analytical tools.
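
The second point is easiest to see with a toy example. Below is a minimal, single-machine sketch in Python with NumPy (synthetic data, no particular vendor's stack) contrasting an embarrassingly parallel calculation with one that is not: per-security volatility depends on one column at a time and shards freely, while a covariance matrix (the input to PCA and many risk models) needs every pair of securities at once.

```python
import numpy as np

rng = np.random.default_rng(0)
returns = rng.normal(0.0, 0.01, size=(252, 1000))  # 252 trading days x 1,000 securities

# Embarrassingly parallel: each security's statistic depends on one column only,
# so the columns could be split across machines with no coordination.
per_security_vol = returns.std(axis=0)

# Not embarrassingly parallel: the covariance matrix needs every pair of columns,
# so naive sharding by security breaks down once the data exceeds one machine's memory.
cov = np.cov(returns, rowvar=False)      # 1,000 x 1,000 here; far larger in practice
eigenvalues = np.linalg.eigvalsh(cov)    # variance carried by each principal component
print(per_security_vol.shape, cov.shape, eigenvalues[-3:])
```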

Analysts want to explore data regardless of its size, iterate rapidly to build models using complex analytic approaches based on all available data, and deploy them. Ask your Big Data vendors if their infrastructure supports these objectives. Here are some awesome things you should be able to do with a Big Data exploratory analytics database.

1. Build the Arca NBBO for one day of all exchange-traded US equities (186 million quotes) in 80 seconds on a four node (eight cores per node) commodity hardware cluster. Run it in about half the time on a cluster twice as large. (A simplified sketch of the NBBO logic follows this list.)

2. Use Principal Components Analysis (PCA) to analyze variance among asset classes and the individual securities within those asset classes. With an array database, it's possible to run a PCA on a 50M x 50M sparse matrix with 4B non-zero elements in minutes.

3. Select data sets (based on complex criteria) in constant time—irrespective of how big your dataset gets.

4. Perform window operations in parallel on distributed data; express these operations easily without worrying about the programming details of parallelizing the work. (A simple and common example is calculating volume-weighted average price, which in many databases only works if all of the data in your window sits on one machine; see the sketch below.)
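
For item 1, the core NBBO logic itself is simple: at each point in time, take the highest bid and the lowest offer across all venues, carrying forward each exchange's latest quote. Here is a deliberately simplified, single-machine pandas sketch for one symbol; the column names and toy values are assumptions for illustration, and nothing about it addresses the 186-million-quote scale or parallelism described above.

```python
import pandas as pd

# Toy quotes for a single symbol; columns and values are made up for illustration.
quotes = pd.DataFrame({
    "time": pd.to_datetime([
        "2014-07-01 09:30:00.1", "2014-07-01 09:30:00.2",
        "2014-07-01 09:30:00.3", "2014-07-01 09:30:00.4",
    ]),
    "exchange": ["ARCA", "NSDQ", "ARCA", "NSDQ"],
    "bid": [10.00, 10.01, 10.02, 10.00],
    "ask": [10.05, 10.04, 10.06, 10.03],
})

# Carry forward each exchange's latest quote, then take the best across venues.
book = (quotes.pivot(index="time", columns="exchange", values=["bid", "ask"])
              .ffill())
nbbo = pd.DataFrame({
    "best_bid": book["bid"].max(axis=1),   # highest bid across exchanges
    "best_ask": book["ask"].min(axis=1),   # lowest offer across exchanges
})
print(nbbo)
```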
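
And for item 4, a minimal single-machine sketch of a trailing-window VWAP in pandas; the column names and window are assumptions. The point of the item is that a distributed engine should let you express exactly this and handle windows that straddle machine boundaries automatically.

```python
import pandas as pd

# Toy trades; the window is a trailing count of trades purely for illustration.
trades = pd.DataFrame({
    "price":  [10.00, 10.02, 10.01, 10.05, 10.04],
    "volume": [200, 100, 300, 150, 250],
})

window = 3
notional = (trades["price"] * trades["volume"]).rolling(window).sum()
vwap = notional / trades["volume"].rolling(window).sum()  # volume-weighted average price
print(vwap)
```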

Computationally intensive matrix math algorithms underpin many pricing, arbitrage and risk calculations used in computational finance. What is needed is a scalable database with native complex analytics, integrated with R and Python. With this infrastructure financial institutions can rapidly implement proprietary algorithms at Big Data scale.
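
As a rough illustration of the kind of matrix math meant here, the sketch below computes portfolio volatility from a covariance matrix of synthetic returns using NumPy on a single machine; at Big Data scale the same linear algebra would need to run inside the database, next to the data, rather than after an export.

```python
import numpy as np

rng = np.random.default_rng(1)
returns = rng.normal(0.0, 0.01, size=(252, 50))  # 252 trading days x 50 assets (synthetic)
weights = np.full(50, 1.0 / 50)                  # equal-weight portfolio

cov = np.cov(returns, rowvar=False)              # 50 x 50 covariance matrix
portfolio_variance = weights @ cov @ weights     # w' * Sigma * w
print(np.sqrt(252 * portfolio_variance))         # annualized portfolio volatility
```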

—By Bill Kantor, VP of Sales and Marketing at Paradigm4
