At a networking conference today, a group of computer scientists from the University of California, San Diego and Purdue University said they have developed an inexpensive way of diagnosing data center networking delays as short as tens of millionths of a second, the kind of delays that can lead to multimillion-dollar losses in algorithmic trading.

According to an article at physorg.com, the new approach, called Lossy Difference Aggregator, can diagnose microsecond delays and packet loss as infrequent as one in a million at every router within a data center network. (One microsecond is one millionth of a second.) "The solution could be implemented in today's router designs with almost zero cost in terms of router hardware and with no performance penalty," the article states.

Wall Street is a major target of the new technology. "This is stuff the big traders will be interested in," said George Varghese, a computer science professor at the UC San Diego Jacobs School of Engineering, in the article. "But more importantly, the router vendors for whom such trading markets are an important vertical."

The approach to diagnosing network delays is designed to operate within data centers, rather than between exchanges and Wall Street firms, where external monitoring systems typically perform this work. Existing techniques for measuring latency within data center networks are inadequate, the academics say in their paper.

"Current routers typically support two distinct accounting mechanisms: SNMP and NetFlow," the paper states. "Neither are up to the task. SNMP provides only cumulative counters which, while useful to estimate load, cannot provide latency estimates. NetFlow, on the other hand, samples and timestamps a subset of all received packets; calculating latency requires coordinating samples at multiple routers (e.g., trajectory sampling). Even if such coordination is possible, consistent samples and their timestamps have to be communicated to a measurement processor that subtracts the sent timestamp from the receive timestamp of each successfully delivered packet in order to estimate the average, a procedure with fundamentally high space complexity. Moreover, computing accurate time averages requires a high sampling rate, and detecting short-term deviations from the mean requires even more. Unfortunately, high NetFlow sampling rates significantly impact routers' forwarding performance and are frequently incompatible with operational throughput demands."

External network latency monitoring systems tend to be either highly inaccurate or overly expensive, according to the paper.

The professors' Lossy Difference Aggregator calculates latency by randomly splitting the data packets moving from one network node to the next into groups, then summing the departure times at the sending node and the arrival times at the receiving node separately for each group. "As long as the number of losses is smaller than the number of groups, at least one group will give a good estimate," the physorg.com article states. "Subtracting these two sums (from the groups that have no loss) and dividing by the number of messages provides an estimate of the average delay, using only a series of lightweight counters."
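The grouping idea described above can be illustrated with a short Python sketch. This is not the researchers' implementation, only a minimal model of the description quoted here. It assumes that both nodes assign a packet to a group by hashing a packet identifier (CRC32 is an arbitrary choice), that a group counts as loss-free when its sender and receiver packet counts match, and that timestamps are integers in microseconds. The names `GroupCounters`, `estimate_delay`, and `simulate` are hypothetical.

```python
import zlib


class GroupCounters:
    """Per-node state: one timestamp sum and one packet count per group."""

    def __init__(self, num_groups):
        self.sums = [0] * num_groups
        self.counts = [0] * num_groups

    def record(self, packet_id, timestamp_us):
        # Assumption: both nodes hash the same packet field, so a given
        # packet always lands in the same group at sender and receiver.
        group = zlib.crc32(packet_id.to_bytes(8, "big")) % len(self.sums)
        self.sums[group] += timestamp_us
        self.counts[group] += 1


def estimate_delay(sender, receiver):
    """Average delay over groups whose packet counts match (no loss seen)."""
    total_diff = 0
    total_packets = 0
    for s_sum, s_cnt, r_sum, r_cnt in zip(
        sender.sums, sender.counts, receiver.sums, receiver.counts
    ):
        if s_cnt == r_cnt and s_cnt > 0:
            total_diff += r_sum - s_sum
            total_packets += s_cnt
    if total_packets == 0:
        return None  # every group lost a packet (or no traffic was seen)
    return total_diff / total_packets


def simulate(num_packets, num_groups, delay_us, dropped_ids):
    """Send packets with a fixed delay, dropping those in dropped_ids."""
    sender = GroupCounters(num_groups)
    receiver = GroupCounters(num_groups)
    for packet_id in range(num_packets):
        sent_at = packet_id * 10  # one packet every 10 microseconds
        sender.record(packet_id, sent_at)
        if packet_id not in dropped_ids:
            receiver.record(packet_id, sent_at + delay_us)
    return estimate_delay(sender, receiver)


if __name__ == "__main__":
    print(simulate(10_000, 16, 50, {7, 4242, 9001}))
```

In this toy run, three dropped packets can spoil at most three of the 16 groups, so the remaining groups still recover the fixed 50-microsecond delay. Each node keeps only two counters per group, which is the sense in which the counters are described as lightweight.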

The academics hope router vendors will implement their technique and thus enable routers to monitor themselves and customers to identify problem routers.