Some of the things I look at when a site is slow, in no particular order:
* Check top. Look at memory usage (make sure nothing is swapping), CPU
usage, and any hung or deadlocked processes.
* Check disk space. Ensure all drives have enough free space.
* Check error logs.
* See which server or component is being slow. Is it just one server? Is it
static or dynamic pages?
* Check the MySQL process list (SHOW PROCESSLIST). Normally it shows no
more than a couple of entries, and more than that can indicate an issue.
* Check hit rates in all caches. Is the cache filling up too quickly,
resulting in a low hit rate and a high refresh rate?
* If there are intermittent issues that are hard to track, I would manually
benchmark services (ex: ApacheBench) to try to spot the issue and its cause.
* See if there's any pattern to when the slowdowns occur. See if they
coincide with any cron scripts (ex: Lucene index updates).
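
For the last point, here is a rough sketch of how I might look for a
pattern, assuming you can pull (timestamp, seconds taken) pairs out of
your access logs one way or another. The function name, the 5 second
threshold and the sample data are all made up for illustration. It just
counts slow requests by minute of the hour, so anything that lines up
with a cron schedule should stand out.

```python
from collections import Counter
from datetime import datetime


def slow_minutes(samples, threshold=5.0):
    """Count slow requests per minute-of-hour (0-59), busiest first.

    samples:   iterable of (datetime, seconds_taken) pairs.
    threshold: hypothetical cut-off in seconds for calling a request slow.
    """
    counts = Counter()
    for when, seconds in samples:
        if seconds >= threshold:
            counts[when.minute] += 1
    return counts.most_common()


# Made-up samples, only to show the shape of the input.
samples = [
    (datetime(2013, 4, 20, 13, 0, 12), 22.0),
    (datetime(2013, 4, 20, 13, 17, 40), 0.4),
    (datetime(2013, 4, 20, 14, 0, 48), 30.0),
    (datetime(2013, 4, 20, 14, 31, 5), 0.3),
    (datetime(2013, 4, 20, 15, 1, 2), 18.5),
]
print(slow_minutes(samples))
```

If most of the slow requests cluster around the same minute or two,
compare that against the crontab on each box.
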
Note that most of the above items should be applied to each individual
server to try to narrow down the source. Since you appear to have 3
"content" servers (2 squids and one content), try directly accessing the
wiki from each one to see if that narrows down the issue.
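
Something like the following could automate that comparison. It is only
a sketch using the Python standard library: the function names, the 35
second timeout and the number of rounds are my own choices, and you
would need to supply the real URL for each box. It fetches the same page
a few times from each server and reports the worst time and the number
of failures per server.

```python
import time
import urllib.request


def time_fetch(url, timeout=35):
    """Return the seconds taken to fetch url, or None if the request failed."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
    except OSError:
        # URLError, HTTPError (e.g. 502/503) and socket timeouts are all
        # subclasses of OSError, so they are counted as failures here.
        return None
    return time.monotonic() - start


def compare_servers(urls, fetch=time_fetch, rounds=3):
    """Fetch each URL `rounds` times; report failures and the worst time."""
    results = {}
    for url in urls:
        times = [fetch(url) for _ in range(rounds)]
        ok = [t for t in times if t is not None]
        results[url] = {
            "failures": len(times) - len(ok),
            "worst": max(ok) if ok else None,
        }
    return results
```

Call compare_servers() with one URL per server (each squid and the
Apache box directly) while the site is slow. If only one of them shows
failures or long times, that tells you where to dig.
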
On 20 April 2013 15:29, David Gerard <dgerard(a)gmail.com> wrote:
rationalwiki.org is currently serving pages very slowly. It's
intermittent, but when it's slow it's a *slug*. Many users are getting
502 errors from Apache or 503 from the Squids.
We have one Linode doing Apache/MySQL/Lucene. It's an 8GB box with 8
cores. (Was 4GB/4 cores, but Linode just doubled everyone's server.)
In front of that are two Squids fed by a load balancer.
* Sometimes the cause is obvious: when the load average is 30 and top
shows a pile of Apaches using up CPU, then it's PHP handling a complex
page request. (No, I still haven't made it PHP via fcgid.)
* Sometimes it isn't, e.g. this afternoon when the site was running
like a slug and load average was 0.8 with nothing amiss in top.
* The squids don't show an unusual rate of hits on the site.
* We have plenty of memory free - about 4GB on the main box is just
sitting in file cache.
* php_errors.log only shows up some processes timing out their 30
seconds (which would be the 502s).
So where would I start looking to work out what's going on?
- d.
_______________________________________________
MediaWiki-l mailing list
MediaWiki-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-l
--
Dave Humphrey -- dave(a)uesp.net
Founder/Server Admin of the Unofficial Elder Scrolls Pages --
www.uesp.net