How to diagnose causes of site slowness? - MediaWiki-l

20 Apr 2013

rationalwiki.org is currently serving pages very slowly. It's
intermittent, but when it's slow it's a *slug*. Many users are getting
502 errors from Apache or 503 from the Squids.

We have one Linode doing Apache/MySQL/Lucene. It's an 8GB box with 8
cores. (Was 4GB/4 cores, but Linode just doubled everyone's server.)
In front of that are two Squids fed by a load balancer.

* Sometimes the cause is obvious: when the load average is 30 and top
shows a pile of Apaches eating CPU, it's PHP handling complex page
requests. (No, I still haven't moved PHP over to fcgid.)
* Sometimes it isn't, e.g. this afternoon, when the site was running
like a slug and the load average was 0.8 with nothing amiss in top.
* The squids don't show an unusual rate of hits on the site.
* We have plenty of memory free - about 4GB on the main box is just
sitting in file cache.
* php_errors.log only shows some processes hitting the 30-second
execution time limit (which would account for the 502s).
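One way to line these observations up is to snapshot them together during a slow spell. The following is a minimal sketch, not a diagnosis tool. It assumes a Linux host with /proc, and it assumes php_errors.log uses PHP's usual "[20-Apr-2013 15:02:11 UTC]" timestamp prefix. The script and all function names are hypothetical. It prints the 1-minute load per core, free memory versus page cache, and the number of "Maximum execution time ... exceeded" errors per hour. If those timeouts pile up in an hour when load per core stays well below 1, the CPU is probably not what the requests are waiting on. Keep in mind that Linux load average also counts processes blocked in uninterruptible (usually disk) sleep.

```python
import os
import re
import sys
from collections import Counter


def load_per_core(loadavg_text, cores):
    """1-minute load average divided by core count.

    loadavg_text is the contents of /proc/loadavg, e.g. "0.80 0.75 0.70 1/234 5678".
    """
    one_minute = float(loadavg_text.split()[0])
    return one_minute / cores


def meminfo_kb(meminfo_text):
    """Parse /proc/meminfo text into {field name: value in kB}."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


# Assumed log format: "[DD-Mon-YYYY HH:MM:SS optional-TZ] PHP Fatal error: ..."
# Group 1 captures "DD-Mon-YYYY HH" so timeouts can be bucketed by hour.
TIMEOUT_RE = re.compile(
    r"^\[(\d{2}-\w{3}-\d{4} \d{2}):\d{2}:\d{2}[^\]]*\]"
    r".*Maximum execution time of \d+ seconds? exceeded"
)


def timeouts_per_hour(log_lines):
    """Count PHP max_execution_time fatals per hour bucket."""
    counts = Counter()
    for line in log_lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__" and os.path.exists("/proc/loadavg"):
    cores = os.cpu_count() or 1
    with open("/proc/loadavg") as f:
        print("1-min load per core: %.2f" % load_per_core(f.read(), cores))
    with open("/proc/meminfo") as f:
        mem = meminfo_kb(f.read())
    print("MemFree: %d MB, Cached: %d MB"
          % (mem.get("MemFree", 0) // 1024, mem.get("Cached", 0) // 1024))
    # Optional first argument: path to php_errors.log (location is site-specific).
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            for hour, n in timeouts_per_hour(f).items():
                print("%s:00  %d timeouts" % (hour, n))
```

Run it from cron every minute or so and compare its output against the times users report 502s and 503s.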

So where would I start looking to work out what's going on?

- d.