We have a wonderful old Solaris legacy system that runs our payroll, HR, and accounting systems. Our primary, Unix-savvy DBA left on vacation and left some minimal instructions for the junior, non-Unix-savvy DBA (jDBA) on how to clean out the print queue when a user submits a very large job by accident (which happens way too often). Yesterday jDBA came running over and said he could not ssh into the Unix system. Sure enough, I tried to ssh in and was immediately kicked right off. Soon the helpdesk was swamped with reports of users who could no longer access the system. I hopped onto my trusty console switch and logged in, only to get errors saying it could not find the /bin directory. I then started asking jDBA just what the heck he did (can anyone guess…). To make a long story short, jDBA did the following:
> sudo -i
# cd / var/spool/lp/requests/sun1/
# rm *
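Notice the space between / and var/spool/lp/requests/sun1/. Because of it, cd landed in / instead of the spool directory, and rm * then deleted all the files in the / directory (directories survived only because rm was run without -r). On Solaris 9 this means that the links /lib and /bin, along with several other links used by the application, were deleted. Pretty simple to fix, but it took a while to figure out what had happened.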
Lessons learned:
1. For repetitive tasks, write a script so nobody has to type the command by hand and fat-finger it.
2. I taught jDBA some Unix basics (bash, tab completion).
3. Our secondary server is *not* identical to the primary, having been used for several development branches, so it was no use comparing the primary to the secondary (typically a quick way to figure out what is wrong).
4. Good backups are a lifesaver. Good backups directly to remote disks so you do not have to fumble with tapes are heaven.
5. Leave detailed instructions and have the junior person practice them before you go. Think about what they can do wrong.
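
For lesson 1, here is a rough sketch of what such a script could look like. It is not the script we actually run. clean_queue is a made-up helper name, and it assumes a POSIX-style sh and that the queued requests are plain files sitting directly in the spool directory. The point is that rm is always handed the full path, so it never depends on which directory the shell happens to be in, and the helper refuses to touch / at all.

```shell
#!/bin/sh
# Sketch only: clean_queue is a hypothetical helper, not a Solaris command.
# Removes the regular files directly inside one spool directory.
clean_queue() {
    dir=$1

    # Refuse an empty argument or the root directory itself.
    case "$dir" in
        ''|/) echo "refusing to clean '$dir'" >&2; return 1 ;;
    esac

    # Refuse anything that is not an existing directory.
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 1
    fi

    # Anchor the glob to the full path so rm never relies on the
    # current working directory.
    for f in "$dir"/*; do
        # Remove regular files only; skip symlinks and subdirectories.
        if [ -f "$f" ] && [ ! -h "$f" ]; then
            rm -- "$f"
        fi
    done
    return 0
}
```

With the function loaded, the cleanup from the story would be a single call, clean_queue /var/spool/lp/requests/sun1, and a stray space in the path makes the helper refuse instead of emptying /.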