Data Management

Hadoop 2.0: The Capital Markets Dragon Slayer?

Hadoop 2.0 might help capital markets firms catch up with other industries when it comes to big data adoption.

While industries such as healthcare, e-commerce and even retail financial services are being rapidly transformed by the technology ecosystem of "big data," leading capital markets firms have been in the uncharacteristic position of lagging the innovation curve.

Jennifer L. Costley, Ashokan Advisors

Capital markets technologists give many reasons for the lack of viable implementations: data sets that are not sufficiently "big," difficulty integrating yet another data solution into an already crowded portfolio, and even a desire to keep their activities confidential to preserve market advantage. But the biggest reason these firms have failed to engage aggressively with big data has been a limitation in the architecture of the core big data technology itself.

Apache Hadoop has been optimized for Map/Reduce, an offline ("batch") processing architecture that does not readily support use cases demanding real-time or near real-time results, such as Value-at-Risk (VaR) calculation and algorithmic trading. Although options beyond Map/Reduce are available, they are layered onto a core technology in which Hadoop tightly couples resource management with data processing.
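To make the batch orientation concrete, consider the canonical word-count job written against Hadoop's Java MapReduce API (a minimal sketch; the class and the input/output paths are illustrative). The job reads a complete input set, shuffles intermediate results across the cluster, and only then produces output -- there is no provision for incremental or low-latency answers.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every token in the input split.
  public static class TokenizerMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sum the counts emitted for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    // waitForCompletion blocks until the whole batch finishes --
    // there is no notion of an always-on, low-latency service here.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```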

[For more on how Wall Street organizations are approaching big data challenges, read: Financial Firms Adopt Big Data As Defense Against Cyber Threats.]

As a result, it is not possible to run multiple applications (e.g., SQL, in-memory analytics, and Map/Reduce) simultaneously with any control over how they are prioritized -- and hence with any guarantee of timely results.

Spinning The YARN

That is, until now. The introduction of Hadoop 2.0 and, in particular, its new YARN resource manager component promises to give the capital markets a loosely coupled architecture that allows all of the necessary processing modes -- batch, interactive, online and streaming -- to run simultaneously on Hadoop, with defined quality of service via resource allocation.

To understand how YARN works, it is important to first understand the current Hadoop architecture. The current Hadoop Map/Reduce system is composed of the JobTracker, the master scheduler, and a TaskTracker on each of the nodes (instances) of the application. The JobTracker is responsible for all resource management tasks, including managing the TaskTrackers, tracking resource consumption and availability, and managing the job life cycle. The JobTracker views the cluster as a set of nodes, each managed by an individual TaskTracker with distinct "map" slots and "reduce" slots. These slots cannot be reassigned. It is this locked-in hierarchy that prevents non-Map/Reduce workloads from executing optimally on Hadoop.
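In Hadoop 1.x, those slots are literally static configuration: each TaskTracker's capacity is fixed in mapred-site.xml and cannot shift with the workload. The fragment below is illustrative only; the values shown are arbitrary.

```xml
<!-- mapred-site.xml (Hadoop 1.x): slots are fixed per TaskTracker -->
<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value> <!-- this node can never run more than 4 map tasks -->
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value> <!-- idle reduce slots cannot serve map work -->
  </property>
</configuration>
```

A node configured this way cannot lend idle reduce slots to a map-heavy job, let alone to a non-Map/Reduce framework.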

YARN breaks this limitation by splitting the two major responsibilities of the JobTracker -- resource management and job scheduling/monitoring -- into separate components: a global ResourceManager and a per-application ApplicationMaster. The ResourceManager is the ultimate authority that arbitrates resources among all the applications in the system. The ApplicationMaster negotiates resources from the ResourceManager and works with the NodeManager(s) to execute and monitor the component tasks.
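The division of labor is visible in YARN's client API. The sketch below, loosely modeled on the distributed-shell example that ships with Hadoop 2.x, asks the ResourceManager for an application id, describes the container that will launch an ApplicationMaster, and submits it to a named queue. The class com.example.VarApplicationMaster and the queue name "risk" are assumptions for illustration, not part of Hadoop.

```java
import java.util.Collections;

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SubmitToYarn {
  public static void main(String[] args) throws Exception {
    // Connect to the global ResourceManager.
    YarnConfiguration conf = new YarnConfiguration();
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(conf);
    yarnClient.start();

    // Ask the ResourceManager for a new application id.
    YarnClientApplication app = yarnClient.createApplication();
    ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
    appContext.setApplicationName("var-calculation");

    // Describe the container that launches our (hypothetical)
    // ApplicationMaster, which will then negotiate containers for the
    // actual work with the ResourceManager.
    ContainerLaunchContext amContainer = ContainerLaunchContext.newInstance(
        Collections.emptyMap(),   // local resources (jars, etc.)
        Collections.emptyMap(),   // environment variables
        Collections.singletonList("java com.example.VarApplicationMaster"),
        null, null, null);        // service data, tokens, ACLs
    appContext.setAMContainerSpec(amContainer);

    // Resources for the ApplicationMaster itself: 1 GB, 1 vcore.
    appContext.setResource(Resource.newInstance(1024, 1));

    // Target a queue with guaranteed capacity (see the scheduler
    // configuration sketched below).
    appContext.setQueue("risk");

    ApplicationId appId = yarnClient.submitApplication(appContext);
    System.out.println("Submitted application " + appId);
  }
}
```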

This separation of responsibilities allows resources to be arbitrated among competing applications and provides the flexibility needed for better resource management and for service level guarantees. That, in turn, should allow more business-driven capital markets use cases to be realized in the big data paradigm.
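Those service level guarantees are expressed through the scheduler. With YARN's CapacityScheduler, for example, a firm could reserve a guaranteed share of the cluster for latency-sensitive risk work while batch jobs absorb the remainder. The capacity-scheduler.xml fragment below is an illustrative sketch; the queue names and percentages are hypothetical.

```xml
<!-- capacity-scheduler.xml: illustrative queue layout only -->
<configuration>
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>risk,batch</value>
  </property>
  <property>
    <!-- the "risk" queue (e.g., VaR) is guaranteed 60% of the cluster -->
    <name>yarn.scheduler.capacity.root.risk.capacity</name>
    <value>60</value>
  </property>
  <property>
    <!-- it may elastically borrow idle capacity, up to 80% -->
    <name>yarn.scheduler.capacity.root.risk.maximum-capacity</name>
    <value>80</value>
  </property>
  <property>
    <!-- batch workloads take the remaining guaranteed 40% -->
    <name>yarn.scheduler.capacity.root.batch.capacity</name>
    <value>40</value>
  </property>
</configuration>
```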

Whether or not Hadoop 2.0 proves to be the ultimate "dragon slayer" of big data roadblocks for the capital markets, it will surely be a major step toward eliminating some of the most critical obstacles to success.

About The Author: Jennifer L. Costley, Ph.D., is a scientifically trained technologist with broad multidisciplinary experience in enterprise architecture, software development, line management and infrastructure operations, primarily (although not exclusively) in capital markets. She is also a non-profit board leader recognized for her talent in building strong governance and process. Her current focus is helping companies, organizations and individuals with opportunities related to data, analysis and sustainability. She can be reached at www.ashokanadvisors.com.
