At an Intel event last night, senior research specialist Mikhail Smelyanskiy noted that although standard server hardware is becoming increasingly multi-core, much of the existing code on Wall Street was written for single cores, and IT people on the Street tend to lack parallel programming skills. (So although servers are getting faster, Wall Street applications aren't equipped to take advantage of the better performance.) Specialized accelerators, such as the IBM Cell, the Nvidia Tesla and ClearSpeed's FPGA board, are "hard to program and require algorithm changes and explicit memory management," he said. (Of course, it is in Intel's best interest to dismiss the competing hardware accelerators and push the idea of redesigning applications to take advantage of Intel's multi-core architecture.)
Financial applications can be modified to scale well across multiple cores, Smelyanskiy said. "Financial applications have parallelism at multiple levels," he said. "At the highest level you have multiple financial instruments that you want to price. Each can be executed independently from the others. At the next level, there is parallelism within pricing an individual instrument. For example, Monte Carlo [models] can be parallelized by doing multiple paths simultaneously since they are independent. There are also ways to parallelize the PDE (partial-differential equations) solver for multiple dimensions, used to price multi-asset securities. The combination of both levels of parallelism enables good adaptation to multi-core."
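To make the second level of parallelism concrete, here is a minimal Python sketch of a Monte Carlo pricer that splits independent paths into chunks and hands each chunk to a separate worker. Nothing in it comes from Smelyanskiy's talk: the function names, the European call payoff, the lognormal price model and the per-chunk seeding are all illustrative assumptions. It uses a thread pool so that it runs anywhere; in CPython, real multi-core speedup for this kind of CPU-bound loop would likely require passing `ProcessPoolExecutor` as `executor_cls` instead.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def simulate_chunk(args):
    """Sum discounted call payoffs over one independent batch of paths."""
    seed, n_paths, spot, strike, rate, vol, maturity = args
    rng = random.Random(seed)  # private generator, so chunks share no state
    drift = (rate - 0.5 * vol * vol) * maturity
    diffusion = vol * math.sqrt(maturity)
    total = 0.0
    for _ in range(n_paths):
        # One path: terminal price under the assumed lognormal model.
        terminal = spot * math.exp(drift + diffusion * rng.gauss(0.0, 1.0))
        total += max(terminal - strike, 0.0)
    return math.exp(-rate * maturity) * total


def mc_call_price(spot, strike, rate, vol, maturity,
                  n_chunks=8, paths_per_chunk=25_000,
                  executor_cls=ThreadPoolExecutor):
    """Average the payoffs of n_chunks * paths_per_chunk independent paths."""
    # Hypothetical seeding scheme: chunk i simply uses seed i.
    tasks = [(seed, paths_per_chunk, spot, strike, rate, vol, maturity)
             for seed in range(n_chunks)]
    with executor_cls(max_workers=n_chunks) as pool:
        partial_sums = list(pool.map(simulate_chunk, tasks))
    return sum(partial_sums) / (n_chunks * paths_per_chunk)


if __name__ == "__main__":
    print(mc_call_price(100.0, 100.0, 0.05, 0.2, 1.0))
```

Because the chunks exchange no data until the final sum, adding cores only means adding chunks, which is the property the quote describes.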
Monte Carlo and Black-Scholes calculations are particularly well suited to parallelism because they break down into many independent chunks of work. For example, each of thousands of options can be priced separately using the Black-Scholes formula. "The amount of parallel work far exceeds the number of parallel cores available today," Smelyanskiy said.
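The Black-Scholes case can be sketched the same way: each option is priced by a closed-form formula that depends on nothing but its own inputs, so a book of options can simply be mapped across a pool of workers. The Python below is an illustration under assumptions, not code shown at the event: the tuple layout `(spot, strike, rate, vol, maturity)`, the function names and the choice of European calls are hypothetical. As with any CPU-bound work in CPython, swapping in `ProcessPoolExecutor` for `executor_cls` is the likelier route to an actual multi-core speedup.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def norm_cdf(x):
    """Standard normal cumulative distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(option):
    """Price one European call; option = (spot, strike, rate, vol, maturity)."""
    spot, strike, rate, vol, maturity = option
    sigma_root_t = vol * math.sqrt(maturity)
    d1 = (math.log(spot / strike)
          + (rate + 0.5 * vol * vol) * maturity) / sigma_root_t
    d2 = d1 - sigma_root_t
    return spot * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)


def price_portfolio(options, max_workers=4, executor_cls=ThreadPoolExecutor):
    """Price every option independently; results keep the input order."""
    with executor_cls(max_workers=max_workers) as pool:
        return list(pool.map(black_scholes_call, options))


if __name__ == "__main__":
    # Hypothetical book: same underlying, strikes from 80 to 120.
    book = [(100.0, float(k), 0.05, 0.2, 1.0) for k in range(80, 121)]
    for option, price in zip(book, price_portfolio(book)):
        print(option[1], round(price, 4))
```

With thousands of options and only a handful of cores, every worker stays busy, which is the point of the remark that parallel work far exceeds the cores available.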
What's needed to make the transition to parallel applications easy? Smelyanskiy offered an Intel-friendly list of ingredients: a familiar architecture and instruction set, smart compilers and math libraries that provide good performance under the hood, and performance analysis tools that identify bottlenecks and provide thread profiling.
On a separate note, Intel told the Wall Street Journal yesterday it's developing a faster way to pass data among servers. "The company said it combined silicon -- the low-cost foundation for most computer chips -- with the element germanium to make a device called an avalanche photo detector that achieved record performance," writes WSJ reporter Don Clark. Down the line, an improvement like this could be used to reduce trade data latency.