Wall Street & Technology: Blog
June 22, 2007

IBM Previews Ultra-Powerful Stream Processing System

For the past four years, a team of 70 engineers at IBM’s T. J. Watson Research Center in Hawthorne, N.Y., has been working on an ultra-powerful, large-scale (as in petabytes of data) stream processing system that can analyze massive volumes of market data and news in real time (as well as medical, seismic, astronomical or any other type of data). It currently runs on 800 x86 computers with embedded Cell processors. At the SIFMA show this week, IBM talked about this “mature prototype,” called System S, which Wall Street firms will one day be able to use to create a no-holds-barred environment in which their quants can roam free, testing ideas, finding correlations and refining algorithms, using a huge pipeline of streaming data and seeing instant results. A government agency is already using System S, and IBM has filed 400 patents for it. IBM talked this project up at SIFMA because it's interested in working with capital markets firms on pilots to see what System S could do for them.

“Some of our most sophisticated clients on Wall Street are jumping all over this,” says Kevin Pleiter, director of financial services at IBM. “The power of this is it’s able to correlate events from disparate data sources.” For instance, a market data event such as a plunge in the price of certain stocks might trigger an algorithmic trading program to buy some of the stock. But if the price drop were caused by a calamity such as a terrorist attack, such a purchase would be unwise. While it watches the market data, System S could simultaneously take in video feeds from television networks, analyze the news, and send a recommendation to put the trading system into crisis mode.
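
To make the crisis-mode scenario concrete, here is a minimal Python sketch of correlating a price drop with a news event inside a time window. It is an illustration only, not IBM's code: the `Event` type, the `recommend_mode` function, the event labels and the 60-second window are all hypothetical, and the news stream is assumed to have already been reduced to tagged events by some upstream analysis.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """A simplified event from one of two hypothetical streams."""
    source: str        # "market" or "news" (labels assumed for this sketch)
    timestamp: float   # seconds since some common reference point
    kind: str          # e.g. "price_drop" or "crisis_headline"
    symbol: str = ""   # ticker symbol, used only for market events


def recommend_mode(events, window_seconds=60.0):
    """Return "crisis" if any price drop falls within window_seconds
    of a crisis headline; otherwise return "normal"."""
    crisis_times = [
        e.timestamp
        for e in events
        if e.source == "news" and e.kind == "crisis_headline"
    ]
    for e in events:
        if e.source == "market" and e.kind == "price_drop":
            # Correlate across streams: is there a crisis headline nearby in time?
            if any(abs(e.timestamp - t) <= window_seconds for t in crisis_times):
                return "crisis"
    return "normal"
```

A trading program could consult `recommend_mode` before acting on a price drop and hold off on buying whenever it returns "crisis". A real system would work on unbounded streams rather than a list held in memory.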

In another example, a buy-side firm looking at a company could correlate its road show, analyst call, and fundamentals such as earnings per share with other data sources. “If the CEO is saying that orders are strong, imagine being able to correlate that with satellite imagery that tells you whether or not the parking lot is full and whether or not trucks are going to and from the distribution facility,” says Pleiter. “If the parking lot is half empty, the system would recognize that this guy is trying to talk his stock up.”

Pleiter acknowledges that several complex event processing products exist on the market today. “But today it’s an environment where people are taking in structured data, putting it into a fixed format, and events are triggered off that stream,” he says. “System S takes this four generations forward, to a highly distributed, highly scalable stream processing technology that can take in any type of structured or unstructured data without requiring reformatting and allowing anything to be done with it.” For instance, video streams from CNN, Al Jazeera, and BBC News could be analyzed alongside market data feeds from Reuters, Thomson and Bloomberg as well as archived phone calls, emails, HTML pages, research reports, purchase orders, invoices, satellite images and more. System S is said to have parsers and semantic annotation to help analyze each of these streams.

IBM already had many of the pieces required to do this. It has had an enterprise platform for managing structured and unstructured information together for several years, and a couple of years ago it introduced an architecture for mixing and matching various types of search and text analytics technology (called UIMA, for Unstructured Information Management Architecture, it works with most of the best-in-class search products). It has video parsing and searching. It has speech recognition software.

System S contains a brand-new technology layer that Pleiter describes as “an artificial intelligence-like scheduling technology. This is intelligent scheduling, looking at the information streams and steering the hardware when major changes occur, because something important must have happened,” he says.
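
IBM has not said how this scheduler works, so the following Python sketch is only a guess at the general idea: compare each stream's current event rate with its usual rate, and give a larger share of the available workers to any stream whose rate has jumped. The function name `allocate_workers`, the `spike_factor` threshold and the weighting scheme are assumptions made for illustration, not System S behavior.

```python
def allocate_workers(baseline_rates, current_rates, total_workers, spike_factor=3.0):
    """Split total_workers across streams, weighting spiking streams more heavily.

    baseline_rates and current_rates map stream names to events per second.
    A stream counts as "spiking" when its current rate is at least
    spike_factor times its baseline (a hypothetical rule for this sketch).
    """
    weights = {}
    for name, current in current_rates.items():
        baseline = baseline_rates.get(name, current)
        spiking = baseline > 0 and current >= spike_factor * baseline
        weights[name] = spike_factor if spiking else 1.0

    total_weight = sum(weights.values())
    # Proportional share, rounded down to whole workers.
    allocation = {
        name: int(total_workers * weight / total_weight)
        for name, weight in weights.items()
    }

    # Hand any workers lost to rounding to the most heavily weighted stream.
    leftover = total_workers - sum(allocation.values())
    if allocation:
        busiest = max(weights, key=weights.get)
        allocation[busiest] += leftover
    return allocation
```

For example, with a market feed holding steady at 100 events per second and a news feed jumping from 10 to 50, eight workers would be split two to the market feed and six to the news feed.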

In the Watson lab, the System S computers are connected with 20-gigabit InfiniBand, but researchers are experimenting with optical switches and optical networking, aiming for a super-fast 100-gigabit network.

The System S user environment comes in three versions. There’s a simple user interface that lets users query the system the way they would a database, using predefined SQL calls. There’s an intermediate interface that's similar to using Excel macros. For power users, there’s an Eclipse-based development environment for writing custom applications.

The system is meant to be flexible in its use of hardware. “The conceptual picture is that the design of the software control programs is specifically intended to allow aggregation and exploitation of all kinds of hardware,” says Nagui Halim, director of high performance stream processing at IBM. “My thinking was customers get big installations of hardware, they make changes, they buy specialized accelerators. We allow the segmentation of specialized functions to accelerators.” So far the Watson lab is using embedded Cell processors on IBM BladeCenters running Linux, and it is testing FPGAs. System S can run on a system as small as a laptop and scale up to a 100,000-node cluster.

Posted by Penny Crosman at 03:30 PM


