Wall Street is trying its best to harness the 1.2 trillion gigabytes of data currently floating around in cyberspace. But even as the industry has made strides in implementing big data technologies, such as the Hadoop framework, the amount of data available appears to be spiraling out of control, with estimates of 800 percent growth over the next five years. So how can firms meet the big data challenge?
The quest by Wall Street to efficiently tap big data typically is driven by the search for alpha. "Everyone who works as a quant wants massive amounts of data in different areas to see if you can play it out," says Rob Passarella, vice president at Dow Jones Financial Markets. "The cheaper and more distributed [the data] becomes, the greater the returns will be."
Financial firms today are actively trying to extract meaning from an explosion of structured and unstructured data -- statistical data, social media streams and other web content, smartphone data, videos, PDF files, Excel files -- that is not easily digested and deciphered. "We used to think of data sets as a single vector. Now we're thinking of multiple types of data and trying to see if there are any correlations," adds Passarella, who previously worked as a managing director at Bear Stearns and as a VP at JPMorgan Chase. "The real challenge when you accumulate data is being able to line it up so that it's meaningful. So if you have equity price and bond price data, you want to line it up and layer social media or news data on top of that."
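To make Passarella's point about lining data up concrete, here is a minimal Python sketch of an as-of alignment: for each equity price tick, it attaches the most recent bond price and the most recent news or social media sentiment score observed at or before that moment. The series, timestamps, and the idea of a single numeric sentiment score are hypothetical, chosen only for illustration; in practice a dataframe library's as-of join (for example, pandas' merge_asof) would usually do this work.

```python
from bisect import bisect_right
from datetime import datetime


def asof_align(base_times, other):
    """For each timestamp in base_times, return the latest value from `other`
    (a list of (timestamp, value) pairs sorted by timestamp) observed at or
    before that timestamp, or None if nothing has been observed yet."""
    other_times = [ts for ts, _ in other]
    aligned = []
    for ts in base_times:
        i = bisect_right(other_times, ts)
        aligned.append(other[i - 1][1] if i > 0 else None)
    return aligned


def line_up(equity, bond, sentiment):
    """Key rows on equity timestamps and layer the latest bond price and
    sentiment score on top of each equity price."""
    times = [ts for ts, _ in equity]
    bonds = asof_align(times, bond)
    scores = asof_align(times, sentiment)
    return [
        {"time": ts, "equity": px, "bond": b, "sentiment": s}
        for (ts, px), b, s in zip(equity, bonds, scores)
    ]


def at(minute):
    # Hypothetical trading-day timestamps at 9:xx a.m.
    return datetime(2012, 6, 1, 9, minute)


# Hypothetical sample data: (timestamp, value) pairs, each sorted by time.
equity = [(at(30), 100.0), (at(31), 100.5), (at(32), 99.8)]
bond = [(at(29), 98.2), (at(32), 98.1)]
sentiment = [(at(31), -0.4)]  # e.g. a score derived from a news article

rows = line_up(equity, bond, sentiment)
for row in rows:
    print(row)
```

The first row has no sentiment value because no news score had arrived yet. Leaving that gap as None, rather than carrying a later value backward, keeps any correlation study free of look-ahead bias.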
As banks and hedge funds dive deeper into the analysis of social media, PatternBuilders -- a provider of enterprise analytics solutions whose tag line is, "Too much data, not enough information" -- is working with the University of Sydney to research the influence of traditional media sources such as The New York Times and social media such as Twitter on a company's stock price. "If the Wall Street Journal publishes a negative article on IBM, for example," PatternBuilders' CEO Terence Craig ponders, "what is the impact on pricing and volume of that article when it's been magnified by social media?"
Researchers working on the project also are looking at the effect the location of Twitter users can have on real-time stock prices -- for instance, whether Twitter users based in New York who retweet a specific article have a more dramatic impact on stock pricing than users based in Texas. "We can take data from social media as a string from our social media partner and a news article from a partner and produce predictions about their impact on pricing very quickly," Craig says. "Traditionally these were large batch jobs, and you usually got answers in two hours. Now you get these answers in real time."
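The contrast Craig draws between two-hour batch jobs and real-time answers comes down to updating results incrementally as each event arrives. The minimal Python sketch below, which is not PatternBuilders' implementation, keeps a sliding-window count of retweets per user location, the kind of input the New York versus Texas question would need. The window length, the city labels, and the retweet stream are hypothetical, and the sketch assumes events arrive in timestamp order.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Counts events per key (here, the city of a retweeting user) over the
    last `window` seconds. Each event updates the counts incrementally, so an
    up-to-date answer is always available without re-running a batch job.
    Assumes events arrive in non-decreasing timestamp order."""

    def __init__(self, window):
        self.window = window
        self.events = deque()  # (timestamp, key) pairs in arrival order
        self.counts = Counter()

    def add(self, ts, key):
        self.events.append((ts, key))
        self.counts[key] += 1
        self._evict(ts)

    def _evict(self, now):
        # Drop events that are at least `window` seconds older than `now`.
        while self.events and self.events[0][0] <= now - self.window:
            _, old_key = self.events.popleft()
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]

    def snapshot(self):
        return dict(self.counts)


# Hypothetical retweet stream: (seconds since article published, user city).
stream = [(0, "New York"), (10, "Texas"), (30, "New York"), (65, "New York")]
counter = SlidingWindowCounter(window=60)
for ts, city in stream:
    counter.add(ts, city)
print(counter.snapshot())  # {'New York': 2, 'Texas': 1}
```

A batch job would rescan every stored retweet to produce the same numbers; here each event costs amortized constant time, because every event is added once and evicted at most once.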
Because the analysis happens in real time, he adds, "It's a very efficient way for people to make the decision as to whether a human needs to intervene." According to Craig, "grey box" traders -- who use a system that reveals some or all of the decision-making process, as opposed to an automated black-box system that conceals how trade decisions are made -- have lately been particularly interested in PatternBuilders' product, which he describes as "a streaming analytics processor implemented in software with a visualization engine on top."
Big Data Opportunities
But the potential of big data extends far beyond the trading floor. Today, banks and hedge funds also are analyzing big data for risk management, price discovery, industry trend analysis and fraud management, Craig notes.
JPMorgan Chase, for example, is using an operational database from MarkLogic to store and process derivatives contracts. "Derivatives get entered and, on the back end, put in MarkLogic's system and processed and matched," explains David Gorbet, VP of product strategy at MarkLogic, whose clients also include Morgan Stanley and Citi. "They have been able to replace relational databases with one system and add new business process steps to address new compliance issues."
Other big-data applications in the financial industry include customer onboarding and relationship management, as firms look to gain a 360-degree view of their clients and understand cross-selling opportunities, adds Gorbet. "People want to do analytics, but analytics is only half the story," he says. "They want to build it in processes. So the trend is around applications and the ability to build in real time."
Of course, one of the biggest drivers behind the need to come to grips with big data is increased and intensifying regulation and the need to provide granular reporting to regulators. "The variety of risk profiling and stress testing that financial institutions will be subjected to requires more analytical capabilities," says Peter Ognibene, managing director at Berkery Noyes, an independent investment bank that provides mergers and acquisitions consulting services.
"Organizations are scrambling to cope with the changing regulatory landscape," adds S. Ramakrishnan, group VP and general manager of financial services analytical applications at Oracle. "It requires nimbleness on the part of institutions and the need to unify data in one repository."
From a risk and regulatory standpoint, firms need to understand the data within the organization, says Larry Tabb, founder and CEO of Tabb Group. But most firms are still struggling with the challenge of integrating data. "Many large banks have tremendous silos of information that even in 2012 are hard to integrate, and especially given the requirements from Dodd-Frank, there will be increasing demand to understand the information, aggregate, analyze and report it," Tabb says.
Data aggregation is more than a technology challenge. Aggregation requires finding the data in multiple systems, bringing it together, normalizing it and storing it in one place, explains Alberto Corvo, managing principal, financial services, at eClerx, which provides business process outsourcing services. "A lot of our clients think normalizing data is a technical job," he says. "But you also need brains and arms to make sure the data is correct."
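Corvo's steps (find the data, bring it together, normalize it, store it in one place) can be sketched in a few lines of Python. The two source systems, their field names, and their units below are hypothetical; the point is that each silo gets its own mapping into one shared schema, and records that fail simple checks are routed to a person for review rather than silently loaded, which is where the "brains and arms" come in.

```python
from datetime import datetime
from decimal import Decimal

# Hypothetical extracts from two siloed systems with different conventions.
SYSTEM_A = [
    {"cpty": "ACME CORP", "notional_usd": "5000000", "trade_dt": "2012-06-01"},
]
SYSTEM_B = [
    {"counterparty_name": "Acme Corp.", "notional_mm": "2.5", "date": "06/02/2012"},
    {"counterparty_name": "Globex", "notional_mm": "0", "date": "06/03/2012"},
]


def clean_name(name):
    # Canonical counterparty key: upper case, no periods, single spaces.
    return " ".join(name.upper().replace(".", "").split())


def from_system_a(rec):
    return {
        "counterparty": clean_name(rec["cpty"]),
        "notional_usd": Decimal(rec["notional_usd"]),
        "trade_date": datetime.strptime(rec["trade_dt"], "%Y-%m-%d").date(),
        "source": "A",
    }


def from_system_b(rec):
    return {
        "counterparty": clean_name(rec["counterparty_name"]),
        # System B reports notional in millions of dollars.
        "notional_usd": Decimal(rec["notional_mm"]) * 1_000_000,
        "trade_date": datetime.strptime(rec["date"], "%m/%d/%Y").date(),
        "source": "B",
    }


def aggregate(feeds):
    """Normalize every record into one schema. Rows that pass basic checks
    go to `store` (standing in for the single repository); the rest go to
    `review` for a person to examine."""
    store, review = [], []
    for normalize, records in feeds:
        for rec in records:
            row = normalize(rec)
            if row["counterparty"] and row["notional_usd"] > 0:
                store.append(row)
            else:
                review.append(row)
    return store, review


store, review = aggregate([(from_system_a, SYSTEM_A), (from_system_b, SYSTEM_B)])
print(len(store), "stored;", len(review), "flagged for review")
```

After normalization, "ACME CORP" and "Acme Corp." resolve to the same counterparty, so exposure can be totaled across both systems, while the zero-notional Globex trade is held back for a person to check instead of being loaded.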