With slumping profits for high-frequency trading, Wall Street is turning to big data analytics tools to generate revenue and new trading ideas.
The shift to what is called "high-intelligence trading" is setting off a new arms race in capital markets. Financial firms are examining big data flows and filtering them to pinpoint a subset of data they can then feed into a trading platform without adding too much latency.
The so-called race to zero latency is not over, but it has become so expensive that the last microsecond is out of reach for most trading firms, Terry Keene, CEO of the technology integration firm iSys, said at Wall Street & Technology's The Analytics Edge conference in June. "If you shave off five nanoseconds, someone else can be there with you" almost the next day. He sees firms losing interest in the pure latency race.
Citing a March 30 white paper by the Securities Technology Analysis Center, Keene noted that, out of 16 big data analysis projects going on at commercial and investment banks, only one was HFT related, and it dealt with social media processing. "The rest of them were not speed sensitive." They had to do with investment banking, counterparty risk, portfolio investments, and credit risk.
With both volume and volatility down, sell-side firms aren't making money from their trading businesses and are looking at other sources of data to help make decisions. "They're going to get new sources of revenue, and these sources are going to come from the backend," said Keene. Many firms are focusing on news feeds, sentiment analysis, and other forms of unstructured data to generate a signal they can feed into their algorithms. However, analysis of unstructured data will introduce latency into the equation. "You can't do analytics of any real form and plug that into a 40- to 50-microsecond trading cycle."
The question is how much latency is acceptable to a high-speed trading application. "In high-intelligence trading, the key thing is to identify value from a large data set, both structured and unstructured," said Ted Hruzd, vice president at Deutsche Bank who specializes in improving systems performance and testing with market data, and who is exploring big data analysis. Once the value is identified, a firm can rank events that will alter a real-time trading application, he said. This is where complex event processing (CEP) plays a large role in big data analysis. "CEP lets firms write some rules to rank the events and throttle the events, which is critical because there is a demand to trade in real-time and for real-time decision making in trading."
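The rank-and-throttle role Hruzd describes for CEP can be sketched in a few lines. The rules, weights, and event fields below are illustrative assumptions, not anything Deutsche Bank has disclosed; the point is the shape: score each event against a small rule set, then pass only a bounded, top-ranked slice on to the trading application.

```python
import heapq

# Hypothetical rule set: each rule scores an event. The weights and
# field names are made up for illustration.
RULES = [
    lambda e: 3.0 if e.get("type") == "earnings_surprise" else 0.0,
    lambda e: 2.0 * abs(e.get("sentiment", 0.0)),   # news-sentiment weight
    lambda e: 1.0 if e.get("symbol") in {"AAPL", "MSFT"} else 0.0,
]

def rank(event):
    """Score an event by summing the contribution of every rule."""
    return sum(rule(event) for rule in RULES)

def throttle(events, top_n=2, min_score=0.5):
    """Keep only the top-N highest-ranked events above a score floor,
    so the trading application sees a bounded event stream."""
    scored = [(rank(e), i, e) for i, e in enumerate(events)]
    top = heapq.nlargest(top_n, scored)
    return [e for score, _, e in top if score >= min_score]

events = [
    {"type": "earnings_surprise", "symbol": "AAPL", "sentiment": 0.4},
    {"type": "tick", "symbol": "XYZ", "sentiment": 0.1},
    {"type": "news", "symbol": "MSFT", "sentiment": -0.9},
]
print(throttle(events))
```

Here the low-scoring tick is dropped and only the two highest-ranked events reach the trading loop, which is the throttling behavior Hruzd calls critical for real-time decision making.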
Tune the infrastructure
To avoid slowing down the infrastructure, Hruzd suggested it's best for the CEP engine to reside in the "address space" of a firm's trading application with its own thread pool. Then it can send events in a "memory-to-memory transfer of the trading application."
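One way to read that advice is a sketch like the following: the CEP stage runs on its own thread pool inside the same process as the trading loop, and qualifying events move over an in-memory queue, so the handoff is a memory-to-memory transfer with no sockets or serialization. The structure and names are assumptions for illustration, not a description of any firm's actual system.

```python
import queue
from concurrent.futures import ThreadPoolExecutor

# In-memory handoff: events pass between threads in the same address
# space, so no IPC or serialization cost is paid.
event_bus = queue.Queue()

def cep_worker(raw):
    """Hypothetical CEP stage: evaluate a rule, forward qualifying events."""
    if raw.get("score", 0) > 1.0:      # stand-in for a real rule set
        event_bus.put(raw)             # memory-to-memory transfer

# The CEP engine gets its own pool so it never steals the trading thread.
cep_pool = ThreadPoolExecutor(max_workers=2, thread_name_prefix="cep")
for raw in [{"id": 1, "score": 2.5}, {"id": 2, "score": 0.3}]:
    cep_pool.submit(cep_worker, raw)
cep_pool.shutdown(wait=True)

# The trading loop drains the in-memory queue.
consumed = []
while not event_bus.empty():
    consumed.append(event_bus.get())
print("trading app consumes", consumed)
```

The design choice the sketch illustrates is colocation: because the CEP engine shares the trading application's address space, the only cost the trading thread pays is a queue read.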
This type of analysis and transfer can be done in less than 200 microseconds -- the time a trading application will take to analyze an event and act upon it. "My rule of thumb has been to keep your extraneous events under 1% of total processing. If you have a trading app running on six cores in 200 microseconds, that means you are generating 60 events per second. That is three-hundredths of 1% of the six cores." In an extreme scenario, this can go as high as 1,900 events per second. If firms stay under that threshold, "your impact on the trading application is minimal, and your potential return on that is exponential."
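Hruzd's figures imply a per-event processing cost of roughly 30 microseconds, a number the article never states outright but which makes the quoted percentages consistent; the arithmetic, under that assumption, works out as follows:

```python
CORES = 6
US_PER_SECOND = 1_000_000
PER_EVENT_US = 30   # assumed per-event cost; inferred, not given in the text

# Total compute available per second across six cores.
capacity_us = CORES * US_PER_SECOND   # 6,000,000 microseconds

def utilization(events_per_sec):
    """Fraction of total core time spent on extraneous events."""
    return events_per_sec * PER_EVENT_US / capacity_us

print(f"{utilization(60):.4%}")   # prints 0.0300% -- three-hundredths of 1%

# Event rate that fits inside the 1%-of-processing budget:
budget_events = 0.01 * capacity_us / PER_EVENT_US
print(int(budget_events))         # 2000, close to the quoted ~1,900 ceiling
```

Under this reading, 60 events per second consumes three-hundredths of 1% of the six cores, and the 1% rule of thumb caps the stream near the ~1,900-events-per-second figure Hruzd cites.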
However, firms will be "noncompetitive" if they are working with historical data and unstructured data, unless rules are stored in memory. "If a rule is stored in memory, then when real-time events come in… you are ahead of the game, and you can alter your trading program right then and there," Hruzd said.
The next race: Hadoop to zero
But no one can work with big data without mentioning Hadoop, which ReadWrite describes as an open source platform for "storing enormous data sets across distributed clusters of servers and then running 'distributed' analysis applications in each cluster."
In capital markets, "everyone is looking to Hadoop as the answer to big data," Keene said. However, Hadoop is not fast at all. "It's a batch process, and it sits on disk. You need to get [Hadoop] in memory to have any impact."
Speeding up Hadoop has become an obsession for many players in capital markets. An open source program known as Spark, which uses clusters of off-the-shelf x86 processors, could be an alternative. Spark is "showing an improvement in performance of something like 100 times over Hadoop," John W. Verity writes on Storage Acceleration (a sister site). "It achieves that largely by keeping data in high-speed DRAM memory instead of on relatively slow disk drives." Clusters can be applied "in parallel to problems like figuring out what the masses are saying on social media about a particular brand, or how many times certain sets of words get used in a huge corpus of text."
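Spark's speedup, as Verity describes it, comes from keeping partitions in DRAM and mapping work across them in parallel. The same map-then-merge shape can be sketched with the standard library alone (this is not Spark's API, just a toy analogue of its word-count pattern on an in-memory corpus):

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Toy in-memory "corpus" standing in for social-media text held in DRAM.
corpus = [
    "brand X is great great value",
    "never buying brand X again",
    "brand Y beats brand X",
]

def map_chunk(chunk):
    """Map step: count words within one in-memory partition."""
    return Counter(word for line in chunk for word in line.split())

# Partition the data, map in parallel, then reduce by merging counters --
# the shape Spark applies across a cluster's DRAM instead of disk.
chunks = [corpus[:2], corpus[2:]]
with ThreadPoolExecutor() as pool:
    partials = list(pool.map(map_chunk, chunks))

totals = sum(partials, Counter())
print(totals.most_common(2))
```

Because every partition stays in memory from map through merge, no intermediate results touch disk, which is the essence of the 100x claim for workloads like brand-mention counting.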
As a result, Keene said, "we have had a race to zero with latency; we are about to have a race to zero with event processing." Factoring in the pre-processing of raw data, it will take the sell side about 200 to 300 milliseconds to get big data fed through memory into a trading algorithm with a Hadoop client, he said.
In order to analyze events for a trading application, a firm can turn to startups that can help speed up the infrastructure for big data analytics. If firms are going to do intelligent trading in near real-time with social media, they need to figure out what the crowd is doing, Keene said, and in-memory is a critical piece of processing news and sentiment analysis.
Hruzd would not name any startups that Deutsche Bank has identified (since that information is proprietary), but he did cite Guavus as one example of a startup that processes data in-memory. "They profess to have many adapters and noncommon sources of data."
Keene said iSys is working with ScaleOut, a company that's been in the in-memory data business for a long time. "They've basically taken the Hadoop programming model, put it in-memory, and built a query engine against it. They can take in real-time market data into a very fast database… and do analysis while the data is flowing into a large database."
And iSys has been doing a lot of proof-of-concept work with trading firms and with a bank in the UK to move more of the processing in-memory and to pack more cores into a single box. Hruzd said he's looking at kernel-bypass technology, which passes messages directly from application to application and cuts out interprocess overhead.
If the trading industry is moving away from the pure speed race, where is it going to end up in a few years? Just as the prior arms race led brokers to move their algorithms into data centers to be closer to exchange-matching engines, big data analytics could lead to a new battle. If the sell side is going to get access to big data, it will need to do preprocessing and staging.
"Why wouldn't we move our algorithms closer to the big data and then move that big data next to the matching engine?" asked Hruzd. "This race doesn't end."
In fact, it sounds like the big data trading race is just beginning.

Ivy is Editor-at-Large for Advanced Trading and Wall Street & Technology, responsible for in-depth feature articles, daily blogs, and news articles focused on automated trading in the capital markets.