March 02, 2012

In a column a few days ago, I questioned the value of infrastructure-as-a-service (IaaS) offerings based on their lack of adherence to Moore's Law. My thesis: While CPU performance and drive storage capacity continue to climb at exponential rates, IaaS vendors aren't passing those implied cost savings on to their customers. I received two sorts of responses to that column: those thankful for the oversimplified example I provided; and others wanting more concrete numbers applied to real systems.

I took some time to do a back-of-the-napkin calculation for storage, and I'll share my results here. Before jumping into the numbers, however, it's important to know that it's pretty much impossible to do an apples-to-apples comparison between 2006 IaaS prices (the year Amazon first offered EC2 and S3) and 2012 prices. Sure, for storage systems you can compare drive capacity, but that's not the full story. An iSCSI drive array in 2006 would typically come with 2-Gbps to 4-Gbps Ethernet adapters, while today you'll get a few 10-Gbps Ethernet adapters. You'll also get six years of advances in firmware and software. So let me say right from the start: This not only isn't an apples-to-apples comparison, but you probably don't want one.

What we want to understand are the relative improvements in cost, performance, and reliability that you got from IaaS vendors over six years compared to the improvements you'd get from buying systems the old-fashioned way and running them yourself. For no other reason than convenience, I chose to compare storage prices. I was able to find some good historical data that I think makes for a compelling comparison. I decided to compare Amazon's S3 prices from 2006 until now with the prices of a hard drive and an actual storage array over the same period.

I threw in the storage array because while the price of a hard drive is obviously going to change radically over six years, the price of other storage system components won't change that much. Power supplies and other hardware don't adhere to Moore's Law, and certainly there are significant costs in developing firmware and software for drive arrays that also don't drop exponentially. So it would be fair to expect that drive prices would change the most, followed by array prices, followed by the price of the Amazon offering, which must take into account other overhead required to run the storage system. The relative magnitudes of the differences are what's important and telling, and that's what we want to understand.

Since quantity makes a difference, we'll assume that we're storing 50 terabytes of data, and we'll look at the total cost over four years. This is a back-of-the-napkin estimate; there are lots of costs I'm not including in that four-year number, including amortization, failed drives, additional hardware requirements, maintenance contracts, and the time value of money. A more detailed analysis is critical for a buying decision, but I think we can illustrate some fundamentals without hauling out a spreadsheet (if anyone wants to do that, please do, and I'll post it and give you credit for the work).

Once you find the historical data, both the Amazon and raw disk calculations are pretty easy to do. For Amazon, the S3 price in 2006 was $0.15 per gigabyte per month, so the total cost of storing 50 TB (50,000 GB) for four years (48 months), assuming a contract with no clause for reduced prices over its term, is $360,000. This year, the Amazon price is down to $0.108 per gigabyte per month, so a similar four-year contract for 50 TB would now be $259,200. So it cost 39% more in 2006 to store the 50 TB than it does now. Note that we haven't calculated any fees for using data or retrieving it, just for storing it. We'll get to other fees later. Nonetheless, Amazon is lowering prices, which seems like a good thing.
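For anyone who wants to check the napkin math, here is a minimal Python sketch of the Amazon calculation. It assumes decimal terabytes (1 TB = 1,000 GB), which is what the totals above imply, and a flat per-gigabyte price for the whole term. The function and variable names are mine, purely for illustration.

```python
from decimal import Decimal

GB_PER_TB = 1000   # assumption: decimal terabytes, consistent with the totals above
MONTHS = 4 * 12    # four-year term


def storage_cost(price_per_gb_month: str, terabytes: int = 50,
                 months: int = MONTHS) -> Decimal:
    """Flat-rate storage cost; ignores request, transfer, and retrieval fees."""
    return Decimal(price_per_gb_month) * terabytes * GB_PER_TB * months


cost_2006 = storage_cost("0.15")    # S3 price per GB-month in 2006
cost_2012 = storage_cost("0.108")   # S3 price per GB-month in 2012

# How much more the 2006 contract cost, relative to the 2012 contract.
premium_pct = (cost_2006 / cost_2012 - 1) * 100
# The same gap expressed as a price cut from the 2006 level.
decline_pct = (1 - cost_2012 / cost_2006) * 100

print(f"2006 contract: ${cost_2006:,.0f}")
print(f"2012 contract: ${cost_2012:,.0f}")
print(f"2006 cost over 2012: {premium_pct:.0f}%")
print(f"Price cut since 2006: {decline_pct:.0f}%")
```

Running it should reproduce the $360,000 and $259,200 figures. It also shows the same gap from the other direction: paying 39% more in 2006 is equivalent to a price cut of about 28% from the 2006 level.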

To read the entire original IaaS data-filled article, visit InformationWeek.