The Internet is an ocean of content, swirling with documents, news, blogs, buzz, speculation and rumor. And hedge funds and proprietary trading firms increasingly recognize the value in mining this web content to predict trading opportunities. But how does a firm harness the web's flood of unstructured data?
Recently, a host of firms, including start-ups as well as established media giants, have been addressing the Big Data challenge, offering tools and services that mine Internet data and provide Wall Street with sentiment analysis, for example. One player that is gaining traction in the financial industry is Recorded Future, a Cambridge, Mass.-based company that sifts through and organizes vast stores of publicly available data on the web -- such as earnings call announcements, government filings, product releases, blogs and social media interactions -- to uncover patterns and relationships that help predict the future of the markets.
Founded in 2009, Recorded Future scans about 300,000 web documents per hour from about 40,000 sources, turning them into millions and millions of data points per day, according to CEO and cofounder Christopher Ahlberg, a Swedish-born computer scientist who in 1996 invented Spotfire, an independent business intelligence and visualization tool that he sold to Tibco in 2007 for $195 million in cash. The data is then stored in a gigantic database hosted in the Amazon AWS cloud. "We realized that we could build a very comprehensive events database around events that happened in the past, events that are happening now and in the future, and make that available with analytics around that to intelligence analysts and investment banks," Ahlberg says of the Recorded Future value proposition. "If you are Goldman Sachs and you like to do this sort of thing, that's very hard to do."
Recorded Future makes this data available to companies for back-testing via a real-time web API. "They take our data and try to use it in trading," relates Ahlberg, who says the firm's main customers are hedge funds, investment banks and intelligence agencies. (In fact, in addition to funding from Google Ventures, IA Ventures and other venture capitalists, Recorded Future received funding from the CIA's venture capital arm, In-Q-Tel, to use the technology to predict acts of terrorism.)
For example, Recorded Future ranks companies in the S&P 500 and the Russell 3000 on two measures: sentiment (how positively or negatively the firms are referred to in the day's press) and price momentum. According to Ahlberg, a trading strategy that went long on the S&P 500 companies ranking in the top decile and short on the firms in the lowest decile would have returned a 12.1 percent gain over the past six months while the market was down 19 percent.
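The mechanics of a decile long/short ranking of this kind can be sketched in a few lines. This is an illustration only, not Recorded Future's method: the function name `decile_long_short`, the score scale and the equal weighting of positions are all assumptions made for the example.

```python
from statistics import mean


def decile_long_short(scores, returns):
    """Rank tickers by score, go long the top decile and short the bottom.

    scores  -- dict mapping ticker to a sentiment score (hypothetical scale)
    returns -- dict mapping ticker to the return over the following period
    Returns the long list, the short list and the equal-weighted spread.
    """
    # Highest-scoring tickers first.
    ranked = sorted(scores, key=scores.get, reverse=True)
    # One decile of the universe, with at least one name per side.
    n = max(1, len(ranked) // 10)
    longs, shorts = ranked[:n], ranked[-n:]
    # Equal-weighted average return of each leg; the spread is long minus short.
    long_ret = mean(returns[t] for t in longs)
    short_ret = mean(returns[t] for t in shorts)
    return longs, shorts, long_ret - short_ret
```

Run against historical scores and subsequent returns, the spread is the figure a back-test would track from one period to the next. A realistic test would also have to account for trading costs and the timing of the data, which this sketch ignores.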
Surveying the Internet Is a Big Job
According to Chris Malloy, an associate professor in the finance department at Harvard Business School who has written a case study on Recorded Future, sentiment analysis tools can be useful for investment strategies. "They're trying to help investment managers design trading signals by giving them feeds or by creating these measures of sentiment or measures of momentum," he explains. "Some of the big hedge funds can do this already because they have hundreds of programmers. [But] this is a low-cost way that anyone can get access to really exhaustive data scraped from the web."
Recorded Future users, Malloy relates, can click on a category -- such as company, person, earnings or management disclosure, or new product releases -- and the platform will plot the time series of an event. "That's the baseline," he says. "On top of that, the text of a discussion is analyzed to determine whether the sentiment is positive or negative, and the momentum score is based on the amount of interest in a topic out there." And unlike Google searches, which don't take into account the time relevance of content beyond posting dates, Malloy adds, Recorded Future looks for words and phrases suggesting the future -- such as "next quarter," "next year" and "2012" -- to identify content to help firms predict events.
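The idea of spotting forward-looking language can be illustrated with a simple pattern match. The sketch below is a toy, not the company's actual text analysis: the pattern `FUTURE_PATTERN`, the function `future_references` and the short list of phrases are assumptions chosen to mirror the examples Malloy cites.

```python
import re

# Hypothetical pattern: "next week/month/quarter/year" or a four-digit year.
FUTURE_PATTERN = re.compile(
    r"\b(next\s+(?:week|month|quarter|year)|20\d{2})\b", re.IGNORECASE
)


def future_references(text, current_year):
    """Return the forward-looking phrases found in text, in order."""
    hits = []
    for match in FUTURE_PATTERN.finditer(text):
        # Normalize case and collapse any internal whitespace.
        token = " ".join(match.group(1).lower().split())
        # A bare year counts only if it lies after the current year.
        if token.isdigit() and int(token) <= current_year:
            continue
        hits.append(token)
    return hits
```

A production system would need far more than this, including dates written out in words and some way of telling a forecast from a passing mention, but the sketch shows why a document's posting date alone says little about the period it discusses.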
But the job of analyzing web content is huge, and there are other companies in the space as well, Malloy points out. "What these companies are doing is exhaustive," he says. "They're trying to source anything out there."
Titan Trading Analytics, a quantitative trading platform, uses proprietary analytics on market data and machine-readable news, as well as social media sentiment data from MarketPsych, a firm specializing in behavioral finance, to produce buy and sell signals. "They are scraping all the social media, such as Twitter, blogs and Yahoo Finance," John Coulter, Titan's CEO, says of MarketPsych.
Delivering More Than the News
Meanwhile, big global news organizations such as Dow Jones and Thomson Reuters also provide sentiment analysis on stocks for use in automated trading. But the next step, according to Rich Brown, global business manager of machine-readable news at Thomson Reuters, is to apply the analytics to a broader set of web content.
At the end of November, Thomson Reuters expects to launch a feed handler for the Thomson Reuters News Analytics system that would plug the Internet into the company's news engine and analyze whatever content was selected, including blogs, social media and news sites, according to Brown. "It's a capital markets play for financial services, very similar to the target audience for our current system," he says.
To pull this off, Thomson Reuters is working with an Internet aggregator whose analytics engine monitors 3 million blogs and 40,000 websites, reports Brown, who declines to name the provider prior to the service's official announcement. "The problem with that," he acknowledges, "is information overload, so you'd want to take that bucket of sources and filter that for the ones that you think are more relevant" to financial services.
In incorporating unstructured web content in their strategies, some firms specifically are focusing on Twitter, notes Harvard's Malloy, pointing to Derwent Capital, the London-based hedge fund started by a professor at Indiana University that uses Twitter sentiment to make investment decisions. According to media reports in August, Derwent Capital beat the market -- and other hedge funds -- in its first full month of trading.
While the existing Thomson Reuters News Analytics product digests the Reuters news feeds and about 50 other third-party services, the new web analytics service will focus on analyzing the Internet to come up with sentiment and contextual information on companies, according to Thomson Reuters' Brown. But while social media is one of the inputs into the new service, Thomson Reuters is not focusing on Twitter alone, Brown reports. "It's hard in 140 characters to get enough context to [measure sentiment] in single tweets," he says.
Rather, Thomson Reuters is looking at a wide variety of web content, including blogs, because, Brown explains, there's more context around what the author is talking about. "If you pick the right information or text sources, then you can find patterns happening in social media, so it allows you to expand significantly the amount of text" you leverage to generate sentiment analysis, he says. "The play is not filtering; it's understanding what's being said."
Though some Wall Street firms are said to be feeding sentiment data directly into algorithms to generate trades based on the signals, Brown insists that high-speed trading is just one use for the data. "News Analytics and automated trading are not only for high-frequency, black-box trading," he says.
Speed Isn't the Only Game in Town
"This is for the ability of humans to make sense of what's going on at a massive scale," Brown continues. A trader or investment manager can create a benchmark of sentiment in the tech sector overall, he illustrates. Or a firm could look at worldwide sentiment data to determine a global asset allocation strategy.
Further, real-time sentiment data is a misnomer, Brown notes, since even on the Internet there are delays in disseminating information. And people are not necessarily trading on every posted item, he adds. With a company such as IBM, for instance, there could be millions of items posted a day, so the investor or analyst actually is weighing the sum of the day's posts.
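Weighing the sum of a day's posts amounts to aggregating item-level scores into one reading per company per day. A minimal sketch follows, assuming each item has already been scored between -1 and 1; the function name `daily_net_sentiment`, the tuple layout and the simple averaging are illustrative assumptions, not a description of the Thomson Reuters product.

```python
from collections import defaultdict


def daily_net_sentiment(items):
    """Average item-level sentiment into one score per (day, ticker).

    items -- iterable of (day, ticker, score) tuples, score in [-1, 1]
    """
    # Running [sum of scores, item count] for each (day, ticker) pair.
    totals = defaultdict(lambda: [0.0, 0])
    for day, ticker, score in items:
        bucket = totals[(day, ticker)]
        bucket[0] += score
        bucket[1] += 1
    # Mean score per pair; every bucket holds at least one item.
    return {key: total / count for key, (total, count) in totals.items()}
```

Averaging is only one choice. Weighting items by source or by readership would give a different daily figure, and the article does not say which approach any vendor uses.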
In terms of delivering the sentiment data to end users, Thomson Reuters offers a hosted model, or the service can be deployed at the customer's site and published through the company's real-time enterprise market data system, according to Brown. "You can sip from the firehose or ingest the whole thing," he says. "Or humans can send the data output to a visualization tool and track the sentiment or price trends over time."
But while companies such as Thomson Reuters are sourcing and analyzing exhaustive amounts of data, Harvard's Malloy asks whether an investment manager can actually harness it in a way that will be useful. Providing that ability, he implies, could be the challenge for Big Data players -- and could mean success for investors. "Any kind of edge you have over anyone else," Malloy says, "is potentially worth a lot of money."