Scaling Up

Frustration, not Necessity, is the mother of Invention.

So I’m wont to think of ways I can contribute to that-what-was-once-SAGE and I’m reminded of tal & Christine’s book and the triage/stable/icing approach to operations. For example, triage is “put all your local scripts & software into version control”, stable is “build a software depot”, and icing is “automate the bejeezus out of your system build process.” Something like that.

That presumes that your ops organization evolves faster than the systems it manages, and that’s a fine and reasonable approach. The problem I’m concerned with is coming into a small organization with an ad-hoc design and trying to restructure it so it’s in a position to scale. My frustration comes both from my experience growing my home network and from trying to apply some structure and best practices at my employer – in both cases, I’m the sole sysadmin.

Here’s an example: I have two machines lying around the house and I decide I’m going to install Linux and run web and mail services. Great. I sort out firewalling and DNS and get the servers running. The problems that crop up not long after installation are:

* I have no real backup system.

* My local data is intermingled with my system data, making upgrades and backups difficult to manage.

* I have three different versions of the same operating system since I added hardware gradually. I need to upgrade, but the services each machine provides are intertwined (DNS 1 & 2, mail, web, etc.).

* I want to offer mail services to people without giving them shell accounts.

* I have different logons and uids on the two machines, which complicates services like NFS.

* A user may have several different passwords for web applications, mail, shell access, etc.

* Web and mail applications send passwords in the clear.

* DNSSEC and NTP security are not enabled.

* I have no way of knowing what’s eating all my disk or bandwidth, and I have little indication that a drive or memory module is about to fail or that my webserver or cron has unexpectedly quit.

* (for Windows admins) I really don’t want the hassle of running Active Directory but I don’t want to manually manage password files, etc. either.

* Oh yeah, I want my Windows, Mac, and unix boxes to share files without too much headache.
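
To make one of these concrete, here is a rough sketch of how the mismatched-uid problem could be spotted before it bites NFS. It assumes you have copied each machine's passwd file somewhere you can read it; the function names `parse_passwd` and `uid_conflicts` and the sample data are made up for illustration, not part of any existing tool.

```python
def parse_passwd(text):
    """Map login name -> numeric uid from passwd(5)-style text."""
    users = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 3:
            continue  # skip malformed lines
        try:
            users[fields[0]] = int(fields[2])
        except ValueError:
            continue  # skip lines whose uid field is not a number
    return users


def uid_conflicts(a, b):
    """Compare two name->uid maps (host A, host B).

    Returns (mismatched, collisions):
      mismatched: {login: (uid_on_a, uid_on_b)} for logins whose uid differs
      collisions: sorted [(uid, login_on_a, login_on_b)] where the same uid
                  belongs to different logins on the two hosts
    """
    mismatched = {
        name: (a[name], b[name])
        for name in a.keys() & b.keys()
        if a[name] != b[name]
    }
    by_uid_b = {}
    for name, uid in b.items():
        by_uid_b.setdefault(uid, set()).add(name)
    collisions = sorted(
        (uid, name_a, name_b)
        for name_a, uid in a.items()
        for name_b in by_uid_b.get(uid, ())
        # ignore aliases that host A already maps to the same uid
        if name_b != name_a and a.get(name_b) != uid
    )
    return mismatched, collisions


# Hypothetical sample data standing in for two hosts' passwd files.
HOST_A = """root:x:0:0:root:/root:/bin/sh
alice:x:1000:1000::/home/alice:/bin/sh
bob:x:1001:1001::/home/bob:/bin/sh
"""
HOST_B = """root:x:0:0:root:/root:/bin/sh
alice:x:1001:1001::/home/alice:/bin/sh
carol:x:1000:1000::/home/carol:/bin/sh
"""

if __name__ == "__main__":
    mismatched, collisions = uid_conflicts(parse_passwd(HOST_A), parse_passwd(HOST_B))
    for name, (uid_a, uid_b) in sorted(mismatched.items()):
        print(f"{name}: uid {uid_a} on A, uid {uid_b} on B")
    for uid, name_a, name_b in collisions:
        print(f"uid {uid}: {name_a} on A, {name_b} on B")
```

A report like this only tells you where the two machines disagree; deciding which uid wins, and re-owning the files afterward, is still manual work, which is part of the argument for a central directory.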

More experienced people know to consolidate authentication and authorization with some combination of PAM, LDAP, SASL, Kerberos, NIS/NIS+, etc. The same goes for backup & recovery systems like Bacula, Amanda, or Mindi/Mondo, separating inbound & outbound mail systems, running an authenticated mail submission service, implementing some form of IMAP, managing virtual mail domains via Cyrus, separating the hardware from the OS with PXE-boot or Xen, &c., &c.

So given limited experience and budget, the question is how does one restructure a small ad-hoc system into something that can scale, and where does one start? Or better, how do you lay the foundation for a scalable system starting with your first two servers?

For lack of a better name, I’ll call this project “Scaling up” – an attempt to draft and edit best-practice tutorials that address the core issues of scalability and provide clear, consistent guidance on using available tools to resolve scalability problems.

There are plenty of software packages out there and somewhat fewer clear configuration tutorials. Fewer still are higher-level discussions of scalability – how does someone who does not know or care about (for example) NIS, LDAP, or Kerberos learn that those are potential solutions to their problem, or even find out they exist, choose the best one to implement, and then successfully configure it to resolve the problem?

LOPSA has plenty of technical talent; certainly we should be able to identify key scalability problems and write up best practices using a variety of existing tools. This would provide a complement to higher-end advice. I’m considering the lower end of the scalability spectrum (2 -> 20 -> 200) rather than (200 -> 2000 -> 20000) which might be more appropriate for a USENIX focus.