

Hadoop 2.0: The Capital Markets Dragon Slayer?

Hadoop 2.0 might help capital markets firms catch up with other industries when it comes to big data adoption.

While industries such as healthcare, e-commerce and even retail financial services are being rapidly transformed by the technology ecosystem of "big data," leading capital markets firms have been in the uncharacteristic position of lagging the innovation curve.

Jennifer L. Costley, Ashokan Advisors

Capital markets technologists give many reasons for the lack of viable implementations: data sets that are not sufficiently "big," difficulty integrating another data solution into an already crowded portfolio, and even a desire to keep their activities confidential to preserve their market advantage. But the biggest reason these firms have failed to engage aggressively with big data has been limitations in the architecture of the core big data technology.

Apache Hadoop has been optimized for Map/Reduce, an offline ("batch") processing architecture that does not readily support use cases demanding real-time or near real-time results, such as Value-at-Risk (VaR) and algorithmic trading. Although other options beyond Map/Reduce are available, they are layered onto a core technology in which Hadoop closely couples resource management with data processing.


As a result, it is not possible to run multiple applications simultaneously (e.g., SQL, in-memory analytics, and Map/Reduce) with any level of control over how those applications are prioritized, and hence with any guarantee of timely results.

Spinning The YARN

That is, until now. The introduction of Hadoop 2.0 and, in particular, the new YARN resource manager component promises to provide the capital markets with a loosely coupled architecture that will allow all of the necessary processing modes -- batch, interactive, online and streaming -- to run simultaneously on Hadoop, with a defined quality of service enforced through resource allocation.

To understand how YARN works, it is important to first understand the current Hadoop architecture. The current Hadoop Map/Reduce system is composed of the JobTracker, which is the master scheduler, and a TaskTracker on each node of the cluster. The JobTracker is responsible for all resource management tasks, including managing the TaskTrackers, tracking resource consumption/availability, and job life-cycle management. The JobTracker views the cluster as a set of nodes, each managed by its own TaskTracker and each divided into distinct "map" slots and "reduce" slots. These slots cannot be reassigned: a map slot can run only map tasks, and a reduce slot only reduce tasks. It is this locked-in hierarchy that prevents the optimal execution of non-Map/Reduce workloads on Hadoop.
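The cost of fixed slots can be illustrated with a small sketch. This is a toy model, not Hadoop code: the `SlotNode` class, its fields, and the `schedule` helper are hypothetical names invented here, and the slot counts are arbitrary. It only shows that waiting map tasks cannot borrow idle reduce slots.

```python
from dataclasses import dataclass


@dataclass
class SlotNode:
    """Toy model of one worker node with fixed, typed slots (hypothetical, not Hadoop's API)."""
    map_slots: int
    reduce_slots: int
    running_maps: int = 0
    running_reduces: int = 0

    def try_start(self, kind):
        """Start a task only if a free slot of the matching type exists."""
        if kind == "map" and self.running_maps < self.map_slots:
            self.running_maps += 1
            return True
        if kind == "reduce" and self.running_reduces < self.reduce_slots:
            self.running_reduces += 1
            return True
        return False  # slots are typed: no borrowing across map/reduce

    def idle_slots(self):
        """Count slots of either type that are currently unused."""
        return (self.map_slots - self.running_maps) + (
            self.reduce_slots - self.running_reduces
        )


def schedule(node, tasks):
    """Return the tasks that had to wait because no matching slot was free."""
    return [task for task in tasks if not node.try_start(task)]


node = SlotNode(map_slots=2, reduce_slots=2)
waiting = schedule(node, ["map", "map", "map", "map"])
print("waiting tasks:", waiting)
print("idle slots:", node.idle_slots())
```

In this run two map tasks are left waiting while two reduce slots sit idle, which is the kind of stranded capacity the slot model produces.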

YARN breaks this limitation by splitting up the two major responsibilities of the JobTracker, resource management and job scheduling/monitoring, into separate functions -- a global ResourceManager and a per-application ApplicationMaster. The ResourceManager is the ultimate authority that arbitrates resources among all the applications in the system. The ApplicationMaster negotiates resources from the ResourceManager and works with the NodeManager(s) to execute and monitor the component tasks.
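The division of labor described here can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the YARN API: the class and method names below merely borrow the component names from the text, containers are modeled as an untyped integer count, and NodeManagers are left out entirely.

```python
class ResourceManager:
    """Toy global arbiter: hands out generic containers rather than typed slots."""

    def __init__(self, total_containers):
        self.free = total_containers

    def allocate(self, requested):
        """Grant as many containers as are free, up to the number requested."""
        granted = min(requested, self.free)
        self.free -= granted
        return granted

    def release(self, count):
        """Return containers to the shared pool."""
        self.free += count


class ApplicationMaster:
    """Toy per-application master: negotiates containers and tracks its own tasks."""

    def __init__(self, name, tasks):
        self.name = name
        self.pending = tasks  # tasks still waiting for a container
        self.held = 0         # containers currently in use

    def negotiate(self, rm):
        """Ask the ResourceManager for one container per pending task."""
        granted = rm.allocate(self.pending)
        self.held += granted
        self.pending -= granted
        return granted

    def finish_running_tasks(self, rm):
        """Hand back every container this application holds."""
        rm.release(self.held)
        self.held = 0


rm = ResourceManager(total_containers=6)
batch = ApplicationMaster("mapreduce-batch", tasks=4)   # hypothetical workload
query = ApplicationMaster("interactive-sql", tasks=4)   # hypothetical workload

first = batch.negotiate(rm)       # batch job takes 4 of 6 containers
second = query.negotiate(rm)      # query gets the remaining 2
batch.finish_running_tasks(rm)    # batch completes and frees its 4
third = query.negotiate(rm)       # query picks up containers for its last 2 tasks
print(first, second, third, rm.free)
```

Because a container here is generic, capacity freed by the Map/Reduce job is immediately usable by the SQL workload. Each application tracks its own tasks, while the ResourceManager only counts what is free.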

This separation of responsibilities allows the arbitration of resources among the competing applications and provides the flexibility necessary for more optimal resource management and service level guarantees. This in turn should allow more business-driven capital markets use cases to be realized in the big data paradigm.
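One way to picture a service-level guarantee is share-based arbitration: each workload is promised a minimum share of the cluster, and idle capacity is lent to whoever still has demand. The sketch below is an assumption-laden illustration of that idea, not YARN's actual scheduler. The `arbitrate` function, the queue names, and the percentages are all hypothetical.

```python
def arbitrate(total, shares_pct, demand):
    """Split `total` containers among queues.

    shares_pct: guaranteed minimum share per queue, in whole percent (must sum to 100).
    demand:     containers each queue currently wants.
    """
    if sum(shares_pct.values()) != 100:
        raise ValueError("guaranteed shares must sum to 100 percent")

    # Step 1: every queue receives its demand, capped at its guaranteed minimum.
    grant = {
        queue: min(demand.get(queue, 0), total * pct // 100)
        for queue, pct in shares_pct.items()
    }
    spare = total - sum(grant.values())

    # Step 2: lend idle capacity to queues with unmet demand, in declared order.
    for queue in shares_pct:
        extra = min(spare, demand.get(queue, 0) - grant[queue])
        grant[queue] += extra
        spare -= extra
    return grant


# Hypothetical queues: a risk calculation, interactive SQL, and a batch job.
shares = {"risk-var": 50, "interactive-sql": 30, "batch-mapreduce": 20}
demand = {"risk-var": 10, "interactive-sql": 12, "batch-mapreduce": 30}
result = arbitrate(40, shares, demand)
print(result)
```

In this example the risk queue needs only 10 of its guaranteed 20 containers, so the batch job borrows the idle capacity. A production scheduler would also need a way to reclaim lent containers when the guaranteed owner's demand returns, which this sketch omits.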

Whether or not Hadoop 2.0 proves to be the ultimate big data roadblock "dragon slayer" for capital markets, it will surely be a major breakthrough in eliminating some of the most critical obstacles to success.

About The Author: Jennifer L. Costley, Ph.D. is a scientifically-trained technologist with broad multidisciplinary experience in enterprise architecture, software development, line management and infrastructure operations, primarily (although not exclusively) in capital markets. She is also a non-profit board leader recognized for talent in building strong governance and process. Her current focus is in helping companies, organizations and individuals with opportunities related to data, analysis and sustainability. She can be reached at www.ashokanadvisors.com.
