July 16, 2018
Cloud TCO: The Soft Costs are the Hardest
When the public cloud was first introduced, it was met with skepticism and mistrust. After all, why would we let someone else keep our data and manage our equipment? I don’t want my patients’ data sitting on a server next to an illegal torrent from Ukraine! Since then, the cloud has really proved its mettle with regard to security and compliance.
The real surge came when organizations could demonstrate a rather profound total cost of ownership advantage for cloud adoption. The cloud operated at a scale that allowed for (really, required) tools of modernization, automation, and software-defined everything. After a while, it really became a matter of, “Why aren’t you going cloud?”
Most recently, these tools of modernization and automation have become available in intuitive on-premises solutions. This inexpensive availability, combined with freedom from the public cloud’s “pay as you go” economics, has some in the industry rethinking the cloud vs. on-premises TCO conversation. Even Michael Dell made comments to this effect recently (see excerpts from his interview here). Looking purely at cost, on-premises commonly appears more cost-effective.
It’s been interesting to watch this dialectic unfold, but through it all, I can’t shake the feeling that some aspects of this conversation are being overlooked. Namely, people and downtime.
Possibly the most compelling aspect of cloud ROI is the potential for customer staff to focus on higher-value projects rather than day-to-day IT operations. Now, it’s true that as automation, monitoring, and the software-defined data center become more common and cost-effective even in the small to medium customer base, this delta between on-premises and cloud will diminish.
I doubt very much the delta will ever reach equilibrium, let alone flip. An on-premises customer will always be implementing and managing a single tenant; a cloud provider will always be multi-tenant. Even if we assume parity of resources (which is unlikely) and that both sets of infrastructure are equally easy to manage, the cloud provider can recoup those costs across multiple tenants. Say the infrastructure is automated to the point where you need one admin for every 500 servers. If you only have 250 servers in your environment, you still need a capable FTE, while the cloud provider can charge you for only 50% of one. This is a simplified example, but this efficiency of multi-tenancy at hyper-scale is impossible to replicate on-premises.
“But Jacob, I’ll just have that FTE do other work for half of her time.” Well, okay – you can do that, but finding a qualified compute FTE is hard enough, let alone one who can split her time without any loss of proficiency on either side. The cloud provider has a dedicated X engineer and a dedicated Y engineer, while you have one engineer attempting to do both X and Y, probably at a lower level than either dedicated resource. Sure, you’ve lowered your cost…but at what cost?
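The staffing math above can be sketched in a few lines of Python. This is purely illustrative – the function name and the 500-servers-per-admin figure are my own assumptions from the example, not anyone’s real staffing model:

```python
import math

SERVERS_PER_ADMIN = 500  # assumption: automation lets one admin run 500 servers


def admins_needed(servers: int, shared: bool) -> float:
    """FTEs required to run `servers` machines.

    On-premises (shared=False) you must staff whole FTEs;
    a multi-tenant provider (shared=True) can bill you for
    just your fraction of one.
    """
    fraction = servers / SERVERS_PER_ADMIN
    return fraction if shared else max(1.0, math.ceil(fraction))


admins_needed(250, shared=False)  # on-premises: a full 1.0 FTE on your payroll
admins_needed(250, shared=True)   # cloud: you are billed for 0.5 of an FTE
```

The `ceil`/`max` on the on-premises path is the whole point: single tenants pay for people in whole units, while the provider amortizes fractions across tenants.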
Cloud solutions experience fewer service interruptions, thanks to scale and redundant architectures designed to meet service level agreements. As the technology gets better and cheaper, this difference too will shrink, but dollar for dollar it will never reach parity. How many data centers does AWS or Azure or Google have? How many do you have? This geo-redundancy, on top of everything else, underpins more sustained uptime. Again, at such hyper-scale, any given failure has far less impact than it would in a smaller environment.
Downtime costs money. The number you come across most often is $7,900 per minute of downtime. Unfortunately, this number is so large it beggars belief. The figure is also old – it comes from a 2013 study by the Ponemon Institute, which I’ll link below. As of 2016, the number had risen to about $9,000 per minute. The report samples a gamut of data center sizes and industry verticals.
I don’t want to get bogged down in exact figures, though you can find them here if you’re interested – it’s a good read. While downtime is a significant and relevant aspect of the cloud vs. local TCO decision, I have never had a customer answer “Yes” when I ask, “Have you looked at the cost of downtime?” The follow-up I invariably get is, “How do I calculate that?” The method I use is to take the cloud SLA uptime commitment, compare it to your historical annual uptime, and apply a conservative per-minute downtime cost to that delta.
Example: Cloud SLA = 99.9%. Customer uptime last year = 99.5%. That difference of just 0.4 of 1% works out to about 2,102 minutes a year (0.4% of 525,600). I generally work with small to medium hospitals, so my customers fall well below the $9,000/minute average. Even at 10% of that figure ($900/minute), those 2,102 minutes amount to an annual cost of roughly $1.9 million by not moving to the cloud.

Nearly two million dollars a year.
Maybe even that number is high, but the goal is to think in real numbers. That downtime is not just an angry nurse. That downtime is a rescheduled operation; that downtime is fewer walk-ins; that downtime is money.
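The back-of-the-envelope method above fits in a few lines of Python. Again, this is a hypothetical sketch – the function name is mine, and the $900/minute rate is simply 10% of the Ponemon figure, as in the example:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def downtime_cost_delta(cloud_sla: float, historical_uptime: float,
                        cost_per_minute: float) -> float:
    """Annual cost of the extra downtime carried by staying on-premises.

    cloud_sla and historical_uptime are fractions (e.g. 0.999, 0.995);
    the result is the downtime delta in minutes times the per-minute rate.
    """
    extra_minutes = (cloud_sla - historical_uptime) * MINUTES_PER_YEAR
    return extra_minutes * cost_per_minute


# 99.9% SLA vs. 99.5% historical uptime, at a conservative $900/minute
annual_cost = downtime_cost_delta(0.999, 0.995, 900)
```

Swap in your own historical uptime and a per-minute rate you believe; the point is to put a defensible dollar figure on the delta rather than leave it out of the TCO conversation entirely.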
The decision to go to the cloud is not entirely about money – that was merely the focus of this post. There is much more I want to say about it.
I want to talk about OpEx vs. CapEx. I want to talk about how OpSus is not a public cloud. I want to talk about future-proofing and security and performance and compliance and support and many other things. However, I am told this was supposed to be more of a “Tell-Tale Heart” and less of a “War and Peace” affair.
I’ll leave those for next time – or send me an email; I would love to chat.
Product Manager, CloudWave