Just a few notes:
Pliny crashes roughly once a month. This must stop.
Not only are the outages annoying, but every crash risks corrupting
the database. Hardware problems have been vaguely suspected, but we've
never really had the chance to swap parts out one at a time until the
crashes stop.
At this point I'd suggest that we:
* replace pliny as the database server with a slightly beefier machine
(e.g., the Altus 130 or 140, with the maximum of 4 GB of RAM)
* take the old hard drive _out_ of pliny and set it aside, as it's been
suspected of being a problem
* reassign pliny as the new web server front end
* reassign larousse as the backup server; it could both hold a
replicated database and, if the other machine crashes, take over its
web server duties by claiming its IP address
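The "take over the other machine's IP address" idea boils down to a
heartbeat check on the backup: if the primary stops answering for long
enough, the backup claims its address. Below is a minimal sketch of only
that decision logic, assuming a hypothetical `FailoverMonitor` class and
a hypothetical `take_over_ip` callback that would do the actual work
(adding the address to an interface and announcing it with a gratuitous
ARP). In practice an existing tool such as Linux-HA heartbeat or
keepalived (VRRP) would normally handle this rather than custom code.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FailoverMonitor:
    """Hypothetical heartbeat monitor running on the backup machine.

    It only decides *when* to claim the primary's IP address; the
    takeover itself is delegated to a caller-supplied callback.
    """
    timeout: float = 10.0  # seconds of silence before we assume the primary is dead
    last_heartbeat: float = field(default_factory=time.monotonic)
    active: bool = False   # True once this machine has claimed the IP

    def heartbeat(self, now: float) -> None:
        """Record that the primary answered at time `now`."""
        self.last_heartbeat = now

    def should_take_over(self, now: float) -> bool:
        """Take over only once, and only after `timeout` seconds of silence."""
        return not self.active and now - self.last_heartbeat > self.timeout

    def tick(self, now: float, take_over_ip: Callable[[], None]) -> bool:
        """Run one check; call `take_over_ip` if needed. Returns whether we are active."""
        if self.should_take_over(now):
            take_over_ip()  # hypothetical: add the address, send gratuitous ARP
            self.active = True
        return self.active
```

The timeout is the main design trade-off: too short and a briefly
overloaded primary gets its address stolen while still alive (both
machines then answer for the same IP); too long and the site stays down
longer after a real crash.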
The present larousse is overloaded (typical load average around 8-10
during US daytime hours, or up to 30 if some ass decides to spider the
site for every printable page), and a lot of pliny's CPU goes to its
additional web duties (pushing its load average to around 4).
More RAM for the database should help it.
More CPU for the web server would definitely help it.
More RAM for the web server should help it once we start making more use
of in-memory caching, which should decrease load on the database (and
reduce how much it has to lock, leading to faster response times on
reads while other operations are running).
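The reason more web-server RAM takes load off the database is the
cache-aside pattern: look in memory first and only query the database on
a miss. Here is a minimal sketch, assuming a hypothetical
`ReadThroughCache` class in which a plain dict stands in for a real
shared cache such as memcached, and `load_from_db` is a hypothetical
function that runs the actual database query.

```python
import time
from typing import Any, Callable, Dict, Tuple


class ReadThroughCache:
    """Cache-aside sketch: serve from memory, hit the database only on a miss."""

    def __init__(self, load_from_db: Callable[[str], Any], ttl: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load_from_db   # hypothetical DB query function
        self._ttl = ttl             # seconds an entry stays valid
        self._clock = clock         # injectable clock, handy for testing
        self._entries: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]         # fresh hit: no database work, no locks taken
        value = self._load(key)     # miss or expired: one database read
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Drop an entry, e.g. after the underlying row is edited."""
        self._entries.pop(key, None)
```

Repeated reads of a popular page then cost one database query per TTL
window instead of one per request; edits should call `invalidate` so
readers don't see stale text for up to `ttl` seconds.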
-- brion vibber (brion @ pobox.com)