[Wikipedia-l] Server back up after partial crash

Brion Vibber brion at pobox.com
Thu Oct 2 21:50:01 UTC 2003


Sorry about the downtime today; everything _should_ now be back up and
running.

On first look this seems to be similar to the crashes we've gotten on
the server all too frequently since we acquired it: one of the disks
froze up and wouldn't respond anymore until the machine was rebooted --
which was complicated by the machine not being willing to reboot until
it had cleanly unmounted the disks, which it couldn't do because it had
to wait until the disk responded...

Anyway, the parts for upgrades should be arriving over the next few days
and the machine can be rebuilt early next week. If the problem is
related to the motherboard, that should resolve it once and for all.

If it's a problem inherent to the disks, well, that won't help, but
it'll reboot faster! ;)  And we'll have a better idea where the problem
lies.

Since it was the secondary disk that halted this time, not the primary,
we did get some info in the logs. If anyone out there is familiar with
decoding SCSI failures on Linux, I posted a log extract on wikitech-l:
http://mail.wikipedia.org/pipermail/wikitech-l/2003-October/006314.html

-- brion vibber (brion @ pobox.com)


