January 02, 2014

Victor Yodaiken, FSMLabs Inc.
Five years ago, the cutting edge of time synchronization technology was focused on accuracy: getting errors below a microsecond where possible and squeezing out performance on less-than-ideal systems. But as precise time synchronization evolved from a cutting-edge competitive advantage into a basic infrastructure requirement, risk mitigation and cost of ownership turned out to be just as important, and addressing them required more sophisticated technology.

Large-scale automated trading systems now depend on accurate time, and it has to work. Time distribution must be resilient and resistant to errors and spoofing, there must be traceable records, and none of this can consume too much of an organization's expensive and scarce IT resources. Start with the simplest problem: what happens when an application server stops receiving "time corrections" from its reference time source? The reference time source may be a device getting time from GPS satellites or a feed from a data center. There are numerous ways that the reference source can fail: a truck with an illegal GPS jammer parking in the wrong place, a lightning strike, a software or hardware failure in a GPS receiver, a network filter turning on, and many more.

Is there a fallback source in place? Is the fallback truly independent? (Network IT admins have been known to set up multiple supposedly independent reference time sources that all turn out to depend on the same upstream source.) Does the failover method work reliably, or will the system oscillate between the failed source and the backup? Is there robust notification of the failure, or can it linger silently for hours? Days? Months? Can IT staff or users easily check the status? Who is responsible for the failover, notification, and diagnosis systems? These questions were rarely asked five years ago, when precise time synchronization was something only the most technically advanced traders cared about.
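One common engineering answer to the oscillation question is hysteresis: leave the primary source only after several consecutive bad checks, and return to it only after a longer run of good ones, raising an alert on every switch so the failure cannot go unnoticed. The Python sketch below is a minimal illustration under stated assumptions, not FSMLabs' method: the class name `FailoverSelector`, the threshold values, and the boolean `primary_ok`/`backup_ok` health checks are all hypothetical, and a real system would judge source health from measured clock offsets rather than a simple flag.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FailoverSelector:
    """Hypothetical primary/backup time-source selector with hysteresis.

    fail_threshold: consecutive bad checks before leaving the primary.
    recover_threshold: consecutive good checks before returning to it.
    """
    fail_threshold: int = 3
    recover_threshold: int = 10
    active: str = "primary"
    alerts: List[str] = field(default_factory=list)
    _bad_streak: int = field(default=0, init=False)
    _good_streak: int = field(default=0, init=False)

    def observe(self, primary_ok: bool, backup_ok: bool) -> str:
        """Record one health check and return the source currently in use."""
        # Track consecutive good/bad checks of the primary source.
        if primary_ok:
            self._good_streak += 1
            self._bad_streak = 0
        else:
            self._bad_streak += 1
            self._good_streak = 0

        if self.active == "primary" and self._bad_streak >= self.fail_threshold:
            if backup_ok:
                self.active = "backup"
                self.alerts.append("primary failed; switched to backup")
            elif self._bad_streak == self.fail_threshold:
                # Alert once when the primary fails with no healthy fallback,
                # instead of letting the failure linger silently.
                self.alerts.append("primary failed and backup unhealthy")
        elif self.active == "backup" and self._good_streak >= self.recover_threshold:
            # Return only after a sustained run of good checks, so a flapping
            # primary cannot make the selector oscillate between sources.
            self.active = "primary"
            self.alerts.append("primary stable again; switched back")
        return self.active
```

With `fail_threshold=3` and `recover_threshold=5`, three bad primary checks move the selector to the backup, and a primary that briefly recovers and then fails again does not pull it back; only five consecutive good checks do. The sketch deliberately leaves out the harder questions raised above, such as detecting a failing backup or verifying that the two sources are actually independent.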

[For more on this topic, see Check the Time: Majority of Firms have Time Synchronization Errors]

Risk management always involves trade-offs against cost, and without care, those costs can spiral. Many companies lack a corporate-wide time synchronization strategy that accounts for the engineering time and software/hardware investment needed to validate time distribution methods, provide management tools and integrate them with other network management tools, and maintain traceable logs. As companies weigh technology investment against the cost of failures, time synchronization needs to become part of that analysis.

Victor Yodaiken, President and founder of FSMLabs Inc., brings to the company more than 20 years of experience designing operating systems and real-time, fault-tolerant, and other system-level software in both industry and academia. FSMLabs is bringing real-time technology to financial trading, simulation, fault tolerance, and other enterprise application areas.