March 09, 2009

Low Latency Special Report
Wall Street's demand for high-performance, low-latency applications is seemingly insatiable. Nonetheless, chip makers continue to do everything they can to provide financial services firms with as much computing power as possible.

AMD, for example, plans to offer its first six-core server processor, code-named "Istanbul," by the second half of 2009, and Intel, which already introduced a six-core processor, plans to launch the eight-core Nehalem EX server chip later this year. And faster servers, boasting 24 and even 100 cores, reportedly are on the chip manufacturers' drawing boards.

The catch is that as firms roll out the new servers, including the current quad-core servers, they won't see performance improvements for their legacy single-threaded programs -- unless they do some finessing. Fortunately the chip manufacturers -- as well as grid, virtualization and multi-thread software vendors -- all have jumped in to offer solutions to this problem, and Wall Street firms, including brokerage AVM, already have begun exploring these options.

The Problem

Older, single-threaded applications are designed to do all their work on one processor. When a single-threaded application is run on a quad-core server, the program still runs on just one core, which will work very hard while the other three cores sit idle. And if two applications are run on separate cores of the same chip, sharing that chip's memory and other resources, bottlenecks can occur.

Yet multicore servers promise increased computing power and inevitably will make their way into every Wall Street data center within the next three years, observers say. With the right program design, according to experts, applications could potentially run eight times faster on an eight-core chip than on a single-core chip. This type of performance begins to seriously compete with the specialized hardware accelerators, such as graphics processing units and field-programmable gate arrays, with which Wall Street has been experimenting in its quest for high-speed, low-latency trading.

The challenge of writing code to run on multiple processors is not brand-new, as multi-CPU servers have existed for several years and the design challenge of spreading work across multiple CPUs is similar to that of spreading it across multiple cores. But the challenge persists because Wall Street is littered with aged yet still-useful programs and there's a dearth of multi-threading programming talent -- in other words, developers who can create elegantly designed programs that run concurrently across many cores are scarce.

In addition, program rewrites typically present a few challenges. One is that some older languages don't lend themselves to multi-threading. "C and C++ are comparatively ancient languages and concurrency isn't part of the language," explains Mike Dunne, CEO of Activ Financial, a Chicago-based market data solutions provider. "People who are developing in Java or C# have an advantage in that concurrency is natively part of the language." Another challenge is that, if a developer isn't careful, various threads can interfere with one another, for instance by competing for the same memory resources.
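The interference problem described above can be illustrated with a minimal sketch in Python (chosen here only for brevity; the article does not tie the issue to any one language). Several threads update the same running total, and a lock ensures that only one of them touches it at a time. The names `PositionTally` and `run_workers` are hypothetical and exist only for this example.

```python
import threading


class PositionTally:
    """Hypothetical shared total that several threads update."""

    def __init__(self):
        self._total = 0
        self._lock = threading.Lock()

    def add(self, quantity):
        # Read-modify-write on shared state is not atomic, so the lock
        # makes sure only one thread performs it at a time.
        with self._lock:
            self._total += quantity

    @property
    def total(self):
        with self._lock:
            return self._total


def run_workers(n_threads, updates_per_thread):
    """Start n_threads threads that each add 1 to a shared tally."""
    tally = PositionTally()

    def worker():
        for _ in range(updates_per_thread):
            tally.add(1)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return tally.total
```

Without the lock, two threads could read the same old total and each write back its own update, silently losing one of them. The lock removes that risk at the cost of making the threads wait for one another, which is one reason careless sharing can erase the gains from extra cores.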

However, Dunne says, for all the challenges, the principles of multi-threading are straightforward. "One obvious way to think about it is 'divide and conquer' -- 'Let's divide the problem into pieces and have a thread work on each piece,' " he says. "How you chop up a program is part of the art of computer programming."

Some Wall Street applications are "embarrassingly parallel" -- in other words, they are easily ported over to a multi-threaded environment. Monte Carlo simulations and options pricing engines are examples of programs that run calculations hundreds or thousands of times, and that work can be easily divided among processors.
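A rough sketch of what "embarrassingly parallel" can look like in practice, assuming a plain Monte Carlo estimate of a European call price under geometric Brownian motion (the model, the parameter values and the function names `simulate_chunk` and `price_call` are illustrative assumptions, not taken from any firm mentioned here). The simulation paths are chopped into independent chunks, each chunk is handed to a separate worker process from Python's standard `concurrent.futures` module, and the partial sums are combined at the end.

```python
import math
import random
from concurrent.futures import ProcessPoolExecutor


def simulate_chunk(job):
    """Return the discounted sum of call payoffs for one batch of paths."""
    seed, n_paths, spot, strike, rate, vol, maturity = job
    rng = random.Random(seed)  # each chunk has its own seeded generator
    drift = (rate - 0.5 * vol * vol) * maturity
    diffusion = vol * math.sqrt(maturity)
    total = 0.0
    for _ in range(n_paths):
        terminal = spot * math.exp(drift + diffusion * rng.gauss(0.0, 1.0))
        total += max(terminal - strike, 0.0)
    return math.exp(-rate * maturity) * total


def price_call(n_paths, n_chunks, map_fn=map, spot=100.0, strike=100.0,
               rate=0.05, vol=0.2, maturity=1.0):
    """Estimate a call price by splitting n_paths across n_chunks jobs.

    map_fn defaults to the built-in serial map; passing the map method of
    an executor spreads the same jobs across worker processes.
    """
    base, extra = divmod(n_paths, n_chunks)
    sizes = [base + (1 if i < extra else 0) for i in range(n_chunks)]
    jobs = [(seed, size, spot, strike, rate, vol, maturity)
            for seed, size in enumerate(sizes)]
    return sum(map_fn(simulate_chunk, jobs)) / n_paths


if __name__ == "__main__":
    # Illustrative run: four worker processes share eight chunks.
    with ProcessPoolExecutor(max_workers=4) as pool:
        print(price_call(200_000, 8, map_fn=pool.map))
```

Because no chunk depends on another, the only coordination needed is the final sum. That is what makes this class of workload easy to divide among cores, in contrast to programs whose threads must constantly share state.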

For those who lack in-house development talent or who don't want their developers to be spending their time parallelizing, several software vendors, including RapidMind and Simtone, offer products that parallelize existing single-threaded apps to run on multicore processors.

Broker-Dealer 'Wraps and Adapts'

One alternative to the time-consuming task of rewriting application code for multi-threading is to use an intermediate layer, such as grid or virtualization software, between legacy applications and new chips. This is sometimes referred to as "wrap and adapt."

At Boca Raton, Fla.-based AVM, "We had a double problem surrounding the issue," relates Paul Algreen, the broker-dealer's chief technology officer. "The first was, we had old, legacy C and C++ applications as well as new .NET apps that had too many processes that were taking too long. We also wanted to throw more computational jobs into the mix and couldn't do it because there weren't enough hours in the day."

Algreen says he considered three options: redesigning applications for multi-threaded programming, buying new hardware (which, he acknowledges, he's done anyway) and using existing hardware that might be idle during the day. "We went through the process of evaluating all those options, including assessing grid vendors, virtualization projects and software recoding projects," he recounts.

Reprogramming, Algreen notes, was deemed to be overly time-consuming and costly. "Finding good, high-quality programmers to take advantage of your horsepower is expensive," he says. While the firm does have two developers who have the expertise to optimize code for efficient multi-threading, "Both those guys are too valuable to be doing that sort of programming," Algreen notes. "I'd rather have them working on quantitative libraries and programming models."

Unfortunately, Algreen concedes, none of the solutions AVM considered offered all of the desired technology elements alongside cost-efficiency. "Many of the products we felt could handle this were at least a $150,000 to $300,000 initial investment," he relates. "For what we were looking to do, that didn't make sense."

After a year of entertaining grid solutions and reprogramming concepts, followed by a primary evaluation of products from August through December 2006, the firm chose grid software from Oakland, Calif.-based Digipede. The initial integration took one month. Phase one -- grid-enabling some proprietary valuation and risk model libraries that about five years earlier had been moved onto a .NET platform (for ease of management and Excel integration) -- was completed in three months, wrapping up in mid-April. While other projects involving the Digipede grid required more integration and effort (for example, some C, C++ and C# code had to be modified to work with the grid software), according to Algreen, the gridification of the valuation and risk processes was a simple matter of adding a handful of lines of code, and the new grid was soon running jobs and tasks across several desktops, servers and virtual servers. AVM took Digipede live in January 2007.

"As a fixed-income fund, throughout the day and through the night we're doing various risk and portfolio valuation processes," Algreen relates. "Those all are somewhat time-consuming, depending on the complexity of the instruments in the portfolio. When you're trying to do a substantial number of risk runs, you run out of time trying to do it serially on one machine. By using Digipede, we were able to shorten the time frame from hours down to a few minutes." He adds that staff have become almost obsessed with throwing applications on the grid because it's so easy to do.