AMD, for example, plans to offer its first six-core server processor, code-named "Istanbul," by the second half of 2009, and Intel, which already introduced a six-core processor, plans to launch the eight-core Nehalem EX server chip later this year. And faster servers, boasting 24 and even 100 cores, reportedly are on the chip manufacturers' drawing boards.
The catch is that as firms roll out the new servers, including the current quad-core servers, they won't see performance improvements for their legacy single-threaded programs -- unless they do some finessing. Fortunately the chip manufacturers -- as well as grid, virtualization and multi-thread software vendors -- all have jumped in to offer solutions to this problem, and Wall Street firms, including brokerage AVM, already have begun exploring these options.
Older, single-threaded applications are designed to do all their work on one processor. When a single-threaded application is run on a quad-core server, the program still runs on just one core, which will work very hard while the other three cores sit idle. And if two applications are run on separate cores of the same chip, sharing that chip's memory and other resources, bottlenecks can occur.
Yet multicore servers promise increased computing power and inevitably will make their way into every Wall Street data center within the next three years, observers say. With the right program design, according to experts, applications could potentially run eight times faster on an eight-core chip than on a single-core chip. This type of performance begins to seriously compete with the specialized hardware accelerators, such as graphics processing units and field-programmable gate arrays, with which Wall Street has been experimenting in its quest for high-speed, low-latency trading.
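The eight-times figure is the ideal case, in which every part of a program can run in parallel. A common back-of-the-envelope model for less ideal cases is Amdahl's law, which the experts quoted here do not cite; the Python sketch below is offered only as an illustration. The function name and the 90 percent figure are assumptions, and the model ignores memory contention and scheduling overhead.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Estimated speedup under Amdahl's law, a simplified model.

    parallel_fraction is the share of run time that can be spread
    across cores; the remainder is assumed to stay serial.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # A fully parallel program reaches the ideal figure on eight cores.
    print(amdahl_speedup(1.0, 8))
    # Hypothetical program in which 90 percent of the work is parallel.
    print(amdahl_speedup(0.9, 8))
```

Under this model, a program that is 90 percent parallel would run a little under five times faster on eight cores, which suggests why program design matters as much as core count.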
The challenge of writing code to run on multiple processors is not brand-new, as multi-CPU servers have existed for several years and the design challenge of spreading work across multiple CPUs is similar to that of spreading it across multiple cores. But the challenge persists because Wall Street is littered with aged yet still-useful programs and there's a dearth of multi-threading programming talent -- in other words, developers who can create elegantly designed programs that run concurrently across many cores are scarce.
In addition, program rewrites typically present a few challenges. One is that some older languages don't lend themselves to multi-threading. "C and C++ are comparatively ancient languages and concurrency isn't part of the language," explains Mike Dunne, CEO of Activ Financial, a Chicago-based market data solutions provider. "People who are developing in Java or C# have an advantage in that concurrency is natively part of the language." Another challenge is that, if a developer isn't careful, various threads can interfere with one another, for instance by competing for the same memory resources.
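As a minimal illustration of threads competing for the same memory, the Python sketch below has several threads update one shared counter. The function name and the counts are hypothetical; the lock shown is one common way to keep the updates from interfering, not the only one.

```python
import threading


def count_with_lock(num_threads: int, increments: int) -> int:
    """Have several threads add to one shared counter, guarded by a lock."""
    total = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal total
        for _ in range(increments):
            # "total += 1" is a read-modify-write sequence. Without the
            # lock, two threads can read the same old value and one of
            # the two updates can be lost.
            with lock:
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return total


if __name__ == "__main__":
    print(count_with_lock(num_threads=4, increments=10_000))
```

The lock makes the result predictable, but every thread now waits its turn at the counter. That trade-off between safety and contention is part of what makes multi-threaded design hard.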
However, Dunne says, for all its challenges, the principles of multi-threading are straightforward. "One obvious way to think about it is 'divide and conquer' -- 'Let's divide the problem into pieces and have a thread work on each piece,' " he says. "How you chop up a program is part of the art of computer programming."
Some Wall Street applications are "embarrassingly parallel" -- in other words, they are easily ported over to a multi-threaded environment. Monte Carlo simulations and options pricing engines are examples of programs that run calculations hundreds or thousands of times, and that work can be easily divided among processors.
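To show why such workloads divide so easily, here is a minimal Python sketch of a Monte Carlo pricer for a European call option. It is not any firm's pricing engine: the lognormal price model, the default parameters and the function names are all assumptions made for illustration. A thread pool is used for brevity; because CPython's global interpreter lock serializes pure-Python arithmetic, a process pool (concurrent.futures.ProcessPoolExecutor) would be the likelier choice for a real speedup.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def simulate_chunk(args):
    """Sum the call payoffs for one independent batch of simulated paths."""
    seed, n_paths, spot, strike, rate, vol, years = args
    rng = random.Random(seed)  # each batch has its own generator
    drift = (rate - 0.5 * vol * vol) * years
    diffusion = vol * math.sqrt(years)
    payoff_sum = 0.0
    for _ in range(n_paths):
        terminal = spot * math.exp(drift + diffusion * rng.gauss(0.0, 1.0))
        payoff_sum += max(terminal - strike, 0.0)
    return payoff_sum


def price_call(total_paths, workers, spot=100.0, strike=100.0,
               rate=0.05, vol=0.2, years=1.0):
    """Split the paths into one batch per worker and combine the results."""
    base, extra = divmod(total_paths, workers)
    chunks = [
        (seed, base + (1 if seed < extra else 0),
         spot, strike, rate, vol, years)
        for seed in range(workers)
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        payoff_sum = sum(pool.map(simulate_chunk, chunks))
    # Discount the average payoff back to today.
    return math.exp(-rate * years) * payoff_sum / total_paths


if __name__ == "__main__":
    print(price_call(total_paths=100_000, workers=4))
```

No batch needs anything from any other batch until the final sum, which is what makes the problem "embarrassingly parallel."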
For those who lack in-house development talent or who don't want their developers to be spending their time parallelizing, several software vendors, including RapidMind and Simtone, offer products that parallelize existing single-threaded apps to run on multicore processors.
Broker-Dealer 'Wraps and Adapts'
One alternative to the time-consuming task of rewriting application code for multi-threading is to use an intermediate layer, such as grid or virtualization software, between legacy applications and new chips. This is sometimes referred to as "wrap and adapt."
At Boca Raton, Fla.-based AVM, "We had a double problem surrounding the issue," relates Paul Algreen, the broker-dealer's chief technology officer. "The first was, we had old, legacy C and C++ applications as well as new .NET apps that had too many processes that were taking too long. We also wanted to throw more computational jobs into the mix and couldn't do it because there weren't enough hours in the day."
Algreen says he considered redesigning applications for multi-threaded programming, buying new hardware (which, he acknowledges, he's done anyway) and using existing hardware that might be idle during the day. "We went through the process of evaluating all those options, including assessing grid vendors, virtualization projects and software recoding projects," he recounts.
Reprogramming, Algreen notes, was deemed to be overly time-consuming and costly. "Finding good, high-quality programmers to take advantage of your horsepower is expensive," he says. While the firm does have two developers who have the expertise to optimize code for efficient multi-threading, "Both those guys are too valuable to be doing that sort of programming," Algreen notes. "I'd rather have them working on quantitative libraries and programming models."
Unfortunately, Algreen concedes, none of the solutions AVM considered offered all of the desired technology elements alongside cost-efficiency. "Many of the products we felt could handle this were at least a $150,000 to $300,000 initial investment," he relates. "For what we were looking to do, that didn't make sense."
After a year of entertaining grid solutions and reprogramming concepts, followed by a primary evaluation of products from August through December 2006, the firm chose grid software from Oakland, Calif.-based Digipede. The initial integration took one month, and AVM took Digipede live in January 2007. Phase one -- grid-enabling some proprietary valuation and risk model libraries that about five years earlier had been moved onto a .NET platform (for ease of management and Excel integration) -- was completed in three months, wrapping up in mid-April. Other projects involving the Digipede grid required more integration and effort; for example, some C, C++ and C# code had to be modified to work with the grid software. But according to Algreen, the gridification of the valuation and risk processes was a simple matter of adding a handful of lines of code, and the new grid was soon running jobs and tasks across several desktops, servers and virtual servers.
"As a fixed-income fund, throughout the day and through the night we're doing various risk and portfolio valuation processes," Algreen relates. "Those all are somewhat time-consuming, depending on the complexity of the instruments in the portfolio. When you're trying to do a substantial number of risk runs, you run out of time trying to do it serially on one machine. By using Digipede, we were able to shorten the time frame from hours down to a few minutes." He adds that staff have become almost obsessed with throwing applications on the grid because it's so easy to do.
Excel-ing on the Grid
That ease of use also led AVM to gridify its Excel spreadsheets using Digipede. "If I have an analyst who wants to price up a bespoke CDO or portfolio of bespokes, he defines those in Excel and would normally run them locally on his desktop," Algreen explains. "Now he'll run them on the grid. By grid-enabling Excel, we've enabled the analysts to do real-time analysis."
Behind the scenes, developers have to identify and declare which types of objects can be operated on in parallel. Then Digipede acts as the workload manager, sending out calculations to myriad compute nodes in parallel, where the math is done many times faster than if it were running on a single machine, according to Algreen. The software then collects the information from all the machines and sends it back into Excel.
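The scatter-and-gather pattern described above can be sketched in a few lines of Python. This is not Digipede's API, which the article does not show; the function names split_into_jobs and run_on_grid are hypothetical, and a local thread pool stands in for the grid's compute nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def split_into_jobs(items: Sequence, node_count: int) -> List[Sequence]:
    """Divide the items into node_count contiguous, near-equal jobs."""
    base, extra = divmod(len(items), node_count)
    jobs, start = [], 0
    for node in range(node_count):
        size = base + (1 if node < extra else 0)
        jobs.append(items[start:start + size])
        start += size
    return jobs


def run_on_grid(items: Sequence, node_count: int,
                calculate: Callable) -> list:
    """Scatter the jobs to workers, then gather results in input order."""
    jobs = split_into_jobs(items, node_count)
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        # Each worker (a stand-in for a compute node) handles one job.
        per_job = list(pool.map(
            lambda job: [calculate(item) for item in job], jobs))
    # Flatten the per-job results back into a single list.
    return [value for job_values in per_job for value in job_values]


if __name__ == "__main__":
    # Hypothetical workload: twelve inputs spread over four workers.
    print(run_on_grid(list(range(12)), 4, lambda x: x * x))
```

The caller sees one list of results in the original order, much as the spreadsheet user sees one set of answers, while the splitting and collecting happen out of sight. In practice, distributing the work and collecting the results adds some overhead, so the speedup is usually somewhat less than the number of nodes.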
"Say we have a portfolio with 3,000 tranches and I want to run a market scenario across 30 nodes on my grid," Algreen says. One job can be created for each node, each handling 100 calculations, dividing the processing time by 30, he explains, noting that Digipede not only manages the distribution of workloads, it also can report on usage by application and by submitting department so that IT can calculate precise charge backs.
This year, according to Algreen, AVM plans to continue to add more asset classes to its grid. It's also looking to use the grid as a way to make real-time data analytics more unified across the firm. "Right now, if there are 30 analysts or traders looking at a given portfolio, if they all hit Shift-F9 at the same time and recalculate their portfolios, they might all get different answers," Algreen relates. Putting all real-time data analytics on the grid would enable faster, more consistent answers, he contends.
Like grid computing software, virtualization software also can act as a bridge between old applications and new servers. Instead of being rewritten to run in parallel, existing applications can be run in virtual machines that a hypervisor schedules across multiple cores. J.P. Morgan Chase is among the financial services firms reportedly using this option.
"I talk to all the major financial services companies in New York City, and what I'm seeing is that a lot of these folks don't have the resources to rewrite these single-threaded apps to work in a multicore environment," notes Mike Rosenstein, senior manager, enterprise development, eastern region, AMD. "I'm seeing more companies take applications that historically they loaded onto a single server that barely got above 20 percent CPU utilization and loading them into a quad-core server with a hypervisor such as VMware on it, taking advantage of the increased consolidation and efficiencies associated with that approach. That seems to be a much quicker and easier solution than trying to recode or rewrite the application."
Rosenstein says that financial firms tapping virtualization typically allocate one core per application, so that 16 applications can run on one four-socket quad-core server (whereas in the past they might have run on 16 separate servers).
There is a drawback to relying on virtualization to span the old software/new server chasm, however, especially for any application that requires high performance: Hypervisors can add a layer of overhead and, with it, latency, Rosenstein concedes. "There are some applications that customers still prefer to run in a native environment," he relates. "But virtualization performance continues to improve, and more and more things that were originally done by the hypervisor are now being done in silicon and therefore faster."
According to Rosenstein, AMD offers virtualization optimization features on its chips, such as Rapid Virtualization Indexing, a method of mapping memory in silicon rather than in the hypervisor. In mid-2009 the company plans to launch a new Fiorano platform that will include a virtualization performance feature called I/O Memory Management Unit that lets users tie a virtual machine to a specific input/output (I/O) device, a function formerly performed in a hypervisor.
Intel also has a few tricks up its sleeve, including products and programs designed to help developers deal with multicore chips. "As we bring out new technology, we have a challenge to make sure there's a path for it to be absorbed in industry," asserts James Reinders, chief evangelist for Intel's software development tools.
For example, Intel created an open source C++ template library, called Threading Building Blocks (TBB), that simplifies the development of software applications that run in parallel. "It's been a big hit around the world," says Reinders, who is the author of a book on TBB. "C++, which is widely used, wasn't designed for parallelism; we try to provide all the things you need to make C++ a worthy way to write parallel programs."
And an Intel product called Parallel Studio, which is still in beta, offers compilers, libraries and analysis tools for parallelism in the C and C++ languages. Reinders says several thousand people have downloaded it already.
"There are types of parallel programming bugs that can't happen in a serial program -- they aren't necessarily more difficult or weird than programming problems people are used to, but they're different," Reinders relates, adding that Parallel Studio is intended to help people figure those out. "None of this is automatic; you need to learn to think parallel, and you need to wrap your head around how to do more than one thing at once." It's not unlike doing a project with 10 friends rather than doing it by yourself, he adds: You have to think differently about how to organize the work among the people.
But as multicore CPUs begin to take over Wall Street's desktops and data centers, this new way of thinking is becoming a prerequisite. In fact, new multi-threading-friendly languages, such as Erlang, have emerged to streamline future application development.