Evaluating New Config Management Systems

At $WORK, we currently use cfengine 2 to manage on the order of 300 systems. It works, and it does most things we want. We use the [Singlecopy Nirvana|http://cfwiki.org/cfwiki/index.php/Singlecopy_Nirvana] “pattern” to distribute configuration files of all types, have copious shellcommands, and even a few editfiles. In general, it does what I need, but not necessarily the way I want. The general structure of our cfengine configurations is (for the most part) unchanged since I implemented it ~4 years ago. We’ve done the necessary things to support new OSes, new architectures, what have you, but that’s about it.

Now, we have a great opportunity. We’re setting up a new center, completely independent of our current infrastructure. What this means is I have a chance to evaluate new tools. I’d like to evaluate a couple of the “new generation” of configuration management tools (don’t get me started on what a bad term config management is…). Ideally, after we’re done standing up the new center, we’ll be able to implement this new CM system within our current center.

Tools I’d like my team to look at:
* [Puppet|http://reductivelabs.com/products/puppet/]
* [Cfengine 3|http://www.cfengine.org/]
* [bcfg2|http://trac.mcs.anl.gov/projects/bcfg2]
* [Chef|http://wiki.opscode.com/display/chef/Home] (maybe)

We’ve got a few difficulties. My timeline is very compressed due to some purchasing snafus. If I go with cfengine 2, I can essentially just copy our current config structure over, not have to retrain my team, etc. An optimistic estimate is that this will save me 40-80 person-hours. Another problem is other admins. It has been enough of a challenge to get them to accept the “thou shalt use the CM system on all of thy servers” commandment. Now we might throw something different at them, with syntaxes that might be even less straightforward than cfengine 2? Outside of the infrastructure team, people tend to leave the cfengine scripts alone and just put files in the Singlecopy hierarchy for distribution. See the next paragraph for why this might be a problem.

So, why do I even want to look at new CM packages? Well, for one, the old “edit a configuration file and distribute it” mentality is getting long in the tooth. Why should every admin need to know the syntax of “/etc/ldap.conf”? If each host needs a custom “/etc/ldap.conf” because of how we do authorization, I need to have a different file for each host.

Why can’t I just describe LDAP client settings in my CM package and have it generate “/etc/ldap.conf”? I want to be able to express my needs and have the CM package provide the solution. Sure, I can do this now with cfengine 2 and some shellcommands, but that feels kludge-y and non-intuitive. I want to set an LDAP server globally in my CM configs, and add specializations for hosts that need them.
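To make the “global setting plus per-host specialization” idea concrete, here is a minimal Python sketch. It is not the syntax of any of the tools listed above, just an illustration of the model. The hostnames, the LDAP values, and the function names (`ldap_settings`, `render_ldap_conf`) are all made up for the example, and I am assuming the simple `key value` line format used by the pam_ldap/nss_ldap style of ldap.conf.

```python
# Hypothetical site-wide defaults; every value here is a placeholder.
GLOBAL_LDAP = {
    "uri": "ldap://ldap.example.com",
    "base": "dc=example,dc=com",
}

# Hypothetical per-host specializations, e.g. restricting logins by group.
HOST_OVERRIDES = {
    "db01.example.com": {
        "pam_groupdn": "cn=dbadmins,ou=groups,dc=example,dc=com",
    },
}


def ldap_settings(host):
    """Merge the global defaults with any overrides defined for this host."""
    merged = dict(GLOBAL_LDAP)
    merged.update(HOST_OVERRIDES.get(host, {}))
    return merged


def render_ldap_conf(settings):
    """Render settings as 'key value' lines, sorted for stable output."""
    lines = [f"{key} {value}" for key, value in sorted(settings.items())]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the file a host with overrides would receive.
    print(render_ldap_conf(ldap_settings("db01.example.com")), end="")
```

The point is that admins only touch the two dictionaries; nobody has to hand-maintain one copy of the file per host.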

It looks like at least some of these newer tools will empower me to be expressive in this way. I don’t assume they’ll be able to do these things out of the box, but if there’s a reasonable extension mechanism, that’s fine.

What traits should I look at when evaluating new CM systems? My quick list:

* Expressiveness – can I tell it to do what I want, above and beyond distributing files?
* Learning curve – can I get new admins up to speed quickly?
* Execution speed
* For example, a recent ;login: article presented data showing that Puppet is an order of magnitude slower than Cfengine 3. Does this hold for complicated configurations, or is the time difference only representative of the one-time overhead of starting a Ruby interpreter?
* Debugging – do the errors make sense, or are they cryptic like cfengine2’s perpetual messages about key problems?
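One way to approach the execution speed question above is to time the same agent run against a small and a large configuration, then fit a straight line through the two points to separate fixed startup cost from per-resource cost. The sketch below is only an illustration under that linear assumption. The helper names (`time_command`, `split_overhead`) are invented for this example, and the command being timed is a stand-in, not the actual invocation of any of the tools.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, runs=5):
    """Return the median wall-clock seconds over `runs` executions of argv."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            argv,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def split_overhead(n_small, t_small, n_large, t_large):
    """Fit t = startup + per_resource * n through two (n, t) measurements.

    Assumes run time grows linearly with the number of managed resources.
    """
    per_resource = (t_large - t_small) / (n_large - n_small)
    startup = t_small - per_resource * n_small
    return startup, per_resource


if __name__ == "__main__":
    # Stand-in command: replace with the real agent run for each tool.
    baseline = time_command([sys.executable, "-c", "pass"], runs=3)
    print(f"median seconds for the stand-in command: {baseline:.3f}")
```

If the fitted startup term dominates for one tool, the gap is mostly interpreter overhead; if the per-resource term dominates, the gap will grow with the size of the configuration.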

My current thought is to give each member of my team one of the four tools to evaluate. They’ll do a test installation with a client and server, and perform the same list of tasks. After they’re done (and have documented their work), we’ll take a second list of tasks, plus modifications to the first tasks, and implement them.

In the time we have, it will unfortunately be difficult to do an exhaustive evaluation of any of the systems. I hate to give any of them short shrift, but I also hate to miss this opportunity to start fresh.

So, what else would you look at given, say, 16-24 hours to spend on any given software package? Which CM packages have the best community around them these days? Horror stories, success stories?

If you’ve done such an evaluation recently, I’d love to hear about your methodology and results.