Virtualization technologies and reducing admin workload

At work, every new technology we introduce must go through engineering and financial justification and be approved by a standards governance body. This is meant to help corral the proliferation of technology choices we already have across the company worldwide.

My most recent major technology introduction is Solaris Containers for virtualization. A while back the Windows team introduced VMware for Windows virtualization (and our Linux team has never been happy that the Windows team controls VMware ESX, which runs on a Linux base). It is starting to gain traction, and now there is some demand for Unix virtualization.

While working through the financial justifications, I have had my team looking at where we can find cost avoidances and what other kinds of efficiencies we can drive. Capital costs are pretty easy. Assuming you can consolidate some number of servers onto a smaller number of servers, driving up utilization without having to move to larger systems, you take the total number of operating system “instances” (I) you want to end up with, divide it by the compression ratio (C) to get the consolidated server count, subtract that from the instance count, and multiply by your per-server cost (E) to get the savings: S = (I – I/C) × E

For example, if we are talking about 100 instances, at a compression ratio of 4:1, and a server cost of $10,000:
(100 – 100/4) × $10,000 = 75 × $10,000 = $750,000

Similarly, for maintenance costs, you just substitute your maintenance charge (M) for the server cost (E): S = (I – I/C) × M.
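
If you want to play with the numbers yourself, here is a quick back-of-the-envelope sketch in Python (the function name is mine, and the $1,500 annual maintenance figure is made up purely for illustration):

    def consolidation_savings(instances, compression, unit_cost):
        """Implements S = (I - I/C) * E: savings from the servers
        eliminated by consolidating at a compression:1 ratio."""
        servers_after = instances / compression
        return (instances - servers_after) * unit_cost

    # Capital: 100 instances, 4:1 compression, $10,000 per server
    print(consolidation_savings(100, 4, 10000))   # 750000.0

    # Maintenance: same formula, substituting a maintenance charge (M);
    # the $1,500/year figure here is a made-up placeholder
    print(consolidation_savings(100, 4, 1500))    # 112500.0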

What becomes a little more interesting, especially in this day, is talking about space savings (power and cooling are also a bit more interesting, but follow pretty much the same formula, minus a slight increase in load from fuller system utilization). In reality, how do you show space savings on a per-system basis? A system goes in a rack, and whether the rack is full or empty, it still takes up the same amount of space. However, when we talk managerial accounting as opposed to financial accounting, we can make some simplifications. Our data center operations team calculates that a rack takes up 40 ft^2 (including clearances, allocation for PDU, etc.). So, if we can fit 8 servers in a rack, each server takes up 5 ft^2. If we calculate an occupancy cost on a per-ft^2 basis, then the same formula works for us.
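
Extending the same sketch: the 40 ft^2 rack and 8-servers-per-rack figures are the real ones from our data center team, but the $300/ft^2 occupancy cost below is a placeholder I made up.

    def space_savings(instances, compression, rack_sqft=40,
                      servers_per_rack=8, occupancy_per_sqft=300):
        """Space-cost avoidance: each eliminated server frees its
        share of a rack's footprint."""
        sqft_per_server = rack_sqft / servers_per_rack        # 40/8 = 5 ft^2
        servers_removed = instances - instances / compression
        return servers_removed * sqft_per_server * occupancy_per_sqft

    # 100 instances at 4:1 -> 75 servers removed x 5 ft^2 x $300
    print(space_savings(100, 4))   # 112500.0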

Then, my manager asked whether or not we could see any administrator efficiency gain by using virtualization. For every other OS that we are looking at (Windows, Linux, AIX), their teams have responded no. Each virtualized instance is still a full OS instance, so it still takes the same amount of admin workload to manage the same number of instances (oh, another assumption there… hardware takes essentially no “administration” time; it’s handled by operators and vendor technicians). In fact, for them, I would say it actually takes more admin workload, because of the complexity of managing multiple workloads on one system and worrying about resource management.

However, my response was that for Containers, there should be an efficiency gain. The question is how to quantify it. Most administrative tasks are run from the global zone (backups, monitoring, tuning, patching, application installation, most troubleshooting, etc.) but take effect for all zones. Some tasks, such as user administration and application configuration, still have to be run on a per-container basis, so there is still some workload involved. Additionally, since admin tasks at the global level affect all zones, there is added complexity in arranging outage windows for those tasks and in managing workloads and resources.

So, if you haven’t fallen asleep yet, the question becomes: what is that efficiency number? Do we go with a straight number that says virtualization saves 20% (or 40%, or whatever) of your admin time, so that administering 100 containers is really like administering 80 (or 60, or whatever) servers? Does the compression ratio matter? If compression matters, then there should be a 100% workload for each global zone and an 80% (or 60%, or whatever) workload for each container. If we choose 40% savings and have 100 instances at 2:1 compression, then (50 × 100%) + (50 × 60%) = 80 server-equivalents; if it is 4:1, (25 × 100%) + (75 × 60%) = 70. That’s fun, but what does it mean? Well, if we look at our current server-to-admin ratio of 50:1, then we go from needing 2 FTE to needing 1.6 or 1.4, or we could say our server-to-admin ratio increases to 62.5:1 or 71:1.
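
One last sketch to make the model concrete (again, the function name is mine; the 40% container savings and the 50:1 ratio are just the working assumptions from above):

    def effective_workload(instances, compression, container_savings):
        """Server-equivalent admin workload: one global zone per
        physical server counted at 100%; the remaining instances are
        containers counted at (1 - container_savings)."""
        global_zones = instances / compression
        containers = instances - global_zones
        return global_zones + containers * (1 - container_savings)

    for ratio in (2, 4):
        work = effective_workload(100, ratio, 0.40)
        fte = work / 50                 # current 50:1 server-to-admin ratio
        print(ratio, work, round(fte, 2), round(100 / fte, 1))
    # 2 80.0 1.6 62.5
    # 4 70.0 1.4 71.4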

My hypothesis, based on 15+ years of SA experience, is that the number is about 40% savings. However, I’d like to test that against real-world results. Is anyone out there running Containers on a fairly large scale, and can you comment on whether you’ve seen an administrative efficiency gain from using them, or whether the added complexity has actually increased your workload?