Classic case of a change hurting you later

So this morning I got into work to find that a database machine had crashed. This particular machine is an old legacy box (Debian 4) that acts as a gateway between multiple systems, so it has mounts to lots of other systems. It came back up cleanly, but two mounts did not remount properly (one NFS and one CIFS). After a bit of investigation I found the following:

1. The NFS mount did not come back because it lives on a multi-protocol file server and the mount point is buried deep in a path. The folks who own the directory decided they did not like BusRouting and renamed it to ‘Bus Routing’. So not only did the path disappear, but NFS really doesn’t like spaces in path names. The simple fix was to remove the space and then educate the folks. The larger issue is that on multi-protocol servers people can easily make an innocuous-looking name change that breaks other systems.

2. The CIFS mount problem was a classic case: the server was rebuilt a month ago, and the admin who rebuilt it did not know which user account was used for the mount, so he did not migrate that account to the new server. In addition, we moved from Windows 2003 to Windows 2008, which has additional security, so on older Linux machines you need to use the IP address instead of the machine name to connect using mount.cifs. The fix was to recreate the mount user account on the Windows machine and change the fstab file on Linux to use the IP.

These problems lasted for a month and were not caught sooner because (a) we do not have good automated monitoring, (b) the mounts are used by processes that do not run very often, and (c) the DBA who normally monitors these items has been on a large project for the last 2 months and has not even been reading his email (with no backfill).
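
A check along the following lines might have flagged both mounts soon after they went missing. This is only a minimal sketch in Python, not something we actually run: it compares the remote entries in an fstab-style table against the currently mounted filesystems and reports anything that should be mounted but is not. The function names, the set of "remote" filesystem types, and the idea of running it from cron are all assumptions made for illustration. It also assumes a reasonably modern Python, which a Debian 4 box would not have out of the box.

```python
import re

# Filesystem types treated as remote mounts worth checking.
# This set is an assumption for the sketch; adjust it for your site.
REMOTE_TYPES = {"nfs", "nfs4", "cifs", "smbfs"}


def unescape(field):
    """Decode octal escapes such as \\040 (space) used in fstab and /proc/mounts."""
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def parse_mount_table(text):
    """Return {mount_point: (fs_type, options)} for an fstab-style table."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 3:
            continue  # not a usable entry
        options = fields[3].split(",") if len(fields) > 3 else []
        table[unescape(fields[1])] = (fields[2], options)
    return table


def missing_remote_mounts(fstab_text, mounts_text):
    """List remote mount points that fstab expects but that are not mounted."""
    expected = parse_mount_table(fstab_text)
    mounted = parse_mount_table(mounts_text)
    missing = []
    for mount_point, (fs_type, options) in expected.items():
        if fs_type not in REMOTE_TYPES or "noauto" in options:
            continue  # local filesystem, or not meant to mount at boot
        if mount_point not in mounted:
            missing.append(mount_point)
    return sorted(missing)


def check_host(fstab_path="/etc/fstab", mounts_path="/proc/mounts"):
    """Read the real tables on a Linux host and return the missing mounts."""
    with open(fstab_path) as f:
        fstab_text = f.read()
    with open(mounts_path) as f:
        mounts_text = f.read()
    return missing_remote_mounts(fstab_text, mounts_text)
```

Something like calling `check_host()` from a cron job and mailing a non-empty result to a shared list (rather than to one person's inbox) would be enough to surface a broken mount even when the processes that use it rarely run. It only proves the mount exists, not that the data behind it is readable, so it is a floor rather than real monitoring.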

The root cause of these issues is a lack of appropriate resources for the amount of work we have. Because of this we are not able to properly document systems and monitor them.
