01 August 2008

Backups Reorganization pt. 6: sysadmin slip-up

Earlier in this project, I described how I condensed the many Logical Volumes on each remote-backup server into one big volume. As part of that process I had to not only carefully rework the LVs but also remove the /etc/fstab entry for each partition that went away. I knew that if /etc/fstab still listed a partition that no longer existed, the machine probably would not come back up after a reboot.

Well, that's exactly what happened with one of the four machines. I decided to reboot each machine since each had over 510 days of uptime. One did not come back online. I had to travel on-site to the secured location to gain access. (Gaining physical access was tricky, as it should be. You don't want to let just anyone walk into your data center!)

Once I got a monitor and keyboard on the target server I saw that it had indeed tried to run fsck against a partition that could not be mounted because it no longer existed. It was waiting for user input to go into maintenance mode. The fix was easy -- simply remove the offending line from /etc/fstab.
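
A check along these lines, run before rebooting, could catch a stale entry like this one. It is only a minimal sketch, not the procedure used on these servers: the function names parse_fstab and missing_devices are made up for illustration, and it only looks at entries whose device field is a plain /dev/ path, so LABEL= and UUID= entries, network mounts, and pseudo-filesystems are skipped.

```python
import os


def parse_fstab(text):
    """Yield (device, mount_point, fs_type) for each real entry in fstab text."""
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # A usable entry has at least device, mount point, and filesystem type
        if len(fields) < 3:
            continue
        yield fields[0], fields[1], fields[2]


def missing_devices(text, exists=os.path.exists):
    """Return (device, mount_point) pairs whose /dev/ path does not exist.

    The `exists` parameter is a hypothetical hook so the check can be
    exercised without touching real devices.
    """
    missing = []
    for device, mount_point, _fs_type in parse_fstab(text):
        # Assumption: only plain /dev/ paths are checked here
        if device.startswith("/dev/") and not exists(device):
            missing.append((device, mount_point))
    return missing


if __name__ == "__main__":
    fstab_path = "/etc/fstab"
    if os.path.exists(fstab_path):
        with open(fstab_path) as f:
            for device, mount_point in missing_devices(f.read()):
                print(f"WARNING: {device} ({mount_point}) is listed in "
                      f"{fstab_path} but does not exist")
```

If the script prints nothing, every /dev/ path listed in /etc/fstab exists. Any warning is worth resolving before the reboot rather than from a console in the data center.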
