The problem only occurs on the production systems; I'm not able to
replicate it in my test environment. Beyond that, my test environment
has already been upgraded to MW 1.25 while production is still on 1.24.
That said, even under normal conditions I get pages that take so long to
load that they time out. I'm trying to analyze which pages or types of
pages take so long; they tend to be ones that are very image heavy,
typically pages for crafting materials and recipes, e.g.
. That page will usually load
but a similar one that won't is
.
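
One way to rank which pages are slow, assuming the %D field (request time in microseconds) mentioned further down the thread is appended as the last field of each access-log line, is a small script along these lines. The log layout, the function name slowest_paths, and the sample lines are illustrative assumptions, not the actual production configuration.

```python
import re
from collections import defaultdict

# Assumed layout: Apache common/combined log format with %D (microseconds)
# appended as the final field. Adjust the pattern to the real LogFormat.
LINE_RE = re.compile(
    r'"(?:GET|POST|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+.* (?P<usec>\d+)$'
)


def slowest_paths(lines, top=10):
    """Return (mean_seconds, hits, path) tuples, slowest mean first."""
    stats = defaultdict(lambda: [0, 0])  # path -> [hits, total microseconds]
    for line in lines:
        match = LINE_RE.search(line.rstrip())
        if match is None:
            continue  # skip lines that do not fit the assumed layout
        entry = stats[match.group("path")]
        entry[0] += 1
        entry[1] += int(match.group("usec"))
    ranked = sorted(
        ((total / hits / 1e6, hits, path) for path, (hits, total) in stats.items()),
        reverse=True,
    )
    return ranked[:top]


if __name__ == "__main__":
    # Hypothetical sample lines, only to show the expected input shape.
    sample = [
        '10.0.0.1 - - [18/Nov/2015:15:42:00 -0800] "GET /wiki/A HTTP/1.1" 200 5120 2000000',
        '10.0.0.2 - - [18/Nov/2015:15:42:05 -0800] "GET /wiki/B HTTP/1.1" 200 900 500000',
    ]
    for mean_s, hits, path in slowest_paths(sample):
        print(f"{mean_s:8.2f}s  {hits:6d} hits  {path}")
```

For a real log, pass an open file object instead of the sample list, e.g. slowest_paths(open(path_to_log)). Sorting by mean time rather than hit count keeps rarely requested but very slow pages visible.
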
On Thu, Nov 19, 2015 at 10:11 AM, Dave Humphrey <dave(a)uesp.net> wrote:
Don't enable debug/profiling on a server that gets public hits, at least in
general. Profiling slows down the page load, and if you're already
borderline it will make a bad situation worse. On top of that, you'll
generate so much profiling output that it will be hard to go through it and
extract anything meaningful.

Put it on an unused server or copy the wiki installation to a new folder and
do it from there. Then you can control the page loads being profiled.
On 19 November 2015 at 12:52, Justin Lloyd <jlloyd.wiki(a)gmail.com> wrote:
So I confirmed that $wgTmpDirectory is defaulting to /tmp on my systems, so
that's not the problem. As for profiling, I enabled it on one of the four
web servers and that appears to actually trigger the problem, or a very
similar one. The Apache processes quickly climbed to their MaxClients limit
of 100 and just stayed there, forcing me to restart Apache after first
commenting out the profiling settings in LocalSettings.php, where
$wgProfileLimit was set to 2.
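
A quick way to double-check that a temporary directory really behaves like local disk is to time small file creations in it. The sketch below does that, and also reproduces the back-of-the-envelope arithmetic from the NFS explanation quoted further down (a fixed cost per file access multiplied by the number of accesses). The function names, the sample count, and the figures of 300 images and 4 accesses per image are illustrative assumptions, not measurements from either wiki.

```python
import os
import tempfile
import time


def mean_create_latency_ms(directory, samples=50):
    """Create, write, and delete small files in `directory`.

    Returns the mean wall-clock cost per file in milliseconds.
    """
    start = time.perf_counter()
    for _ in range(samples):
        fd, path = tempfile.mkstemp(dir=directory)
        try:
            os.write(fd, b"x")
        finally:
            os.close(fd)
            os.remove(path)
    return (time.perf_counter() - start) * 1000.0 / samples


def estimated_delay_s(per_access_ms, images, accesses_per_image=1):
    """Total delay in seconds if every access pays a fixed per-access cost."""
    return per_access_ms * images * accesses_per_image / 1000.0


if __name__ == "__main__":
    tmp = tempfile.gettempdir()
    print(f"{tmp}: {mean_create_latency_ms(tmp):.3f} ms per file")
    # Hypothetical figures: 50 ms per access, 300 images, 4 accesses each.
    print(f"estimated delay: {estimated_delay_s(50, 300, 4):.0f} s")
```

Running mean_create_latency_ms once against /tmp and once against the shared upload directory gives a direct comparison of the two paths.
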
On Wed, Nov 18, 2015 at 1:00 PM, Dave Humphrey <dave(a)uesp.net> wrote:
> Another note if that doesn't happen to work for you: We discovered the
> source of our issue by enabling profiling/debugging on the wiki (on a
> non-public server/install so you can control the page loads and profiling
> outputs). You can see pretty quickly what areas/code are taking the longest
> and begin to dig down. Eventually I added custom profile sections to
> further narrow down the issue to a single "open" call.
> On 18 November 2015 at 15:57, Justin Lloyd <jlloyd.wiki(a)gmail.com> wrote:
> > Intriguing! I'll definitely investigate this and report back. Thanks! :)
> >
> > Justin
> >
> > On Wed, Nov 18, 2015 at 12:55 PM, Dave Humphrey <dave(a)uesp.net> wrote:
> > > That actually sounds very close to an issue we had after upgrading to
> > > 1.22 earlier this year. Pages with a lot of images/thumbnails took a
> > > long time to render (100s of images took over a minute). We eventually
> > > tracked it down to having the default $wgTmpDirectory pointing to the
> > > upload/images directory which was on a NFS share. Each file creation (or
> > > access?) on a NFS share takes a fixed 50ms so you multiply that by
> > > multiple accesses and you get the delay.
> > >
> > > We fixed it by simply changing $wgTmpDirectory to point to a path on the
> > > local fixed drive. Since your setup sounds similar to ours it may be
> > > worth trying it out. If this is indeed your issue you can force a "slow"
> > > page load by purging a page with a lot of images on it. Test it before
> > > and after the change.
> > >
> > > On 18 November 2015 at 15:42, Justin Lloyd <jlloyd.wiki(a)gmail.com> wrote:
> > > > My speculation is that it's image heavy pages, not one specific php
> > > > page. This is for the Guild Wars 2 wikis, specifically the English
> > > > wiki at wiki.guildwars2.com. The Game Updates page used to be
> > > > problematic, causing a massive backlog because a game update or hotfix
> > > > was released and people hammered that page to see the list of changes.
> > > > Our main editors changed how the page works, primarily breaking it up
> > > > into subpages, the most recent of which DPL integrates into the main
> > > > page, but also changing the templates that were used for displaying
> > > > trait and skill icons.
> > > >
> > > > Further analysis of the Apache logs, after adding the %D field to the
> > > > log format, showed a lot of pages taking sometimes minutes to complete,
> > > > which ultimately result in 502s. The ones that appear to take the
> > > > longest are those with a lot of these thumbnail images, which is why I
> > > > think it's still a template issue, but it would be really nice to be
> > > > able to back up that hypothesis with actual data from process
> > > > diagnostics, stack traces, etc.
> > > >
> > > > (I really miss DTrace on Solaris. I know it exists for Linux but I'm
> > > > wary of trying it, especially on production systems. Anyone here have
> > > > experience with it?)
> > > >
> > > > On Wed, Nov 18, 2015 at 12:25 PM, Dave Humphrey <dave(a)uesp.net> wrote:
> > > > > My usual strategy is to check server-status and if I need more detail
> > > > > go with debugging tools (gdb etc., see
> > > > > http://serverfault.com/questions/487530/find-out-what-high-cpu-usage-apache…
> > > > > ). It seems you have done this, however, and I'm wondering why you
> > > > > haven't at least been able to narrow down the issue? You should at
> > > > > least be able to know which PHP file is locking up/crashing or the
> > > > > rough area/cause?
> > > > >
> > > > > Once you know roughly where it is you can add temporary PHP logging
> > > > > commands in the code to help narrow down the issue further. If you
> > > > > also know roughly where/how the lockups are you can try
> > > > > testing/replicating the behavior to get a bit more control on it.
> > > > >
> > > > > On 18 November 2015 at 14:59, Justin Lloyd <jlloyd.wiki(a)gmail.com>
> > > > > wrote:
> > > > > > Hey everyone,
> > > > > >
> > > > > > Yesterday I posted this to /r/mediawiki (https://redd.it/3t2apu)
> > > > > > and cross-posted to /r/apache as well, but unfortunately I've still
> > > > > > not received any feedback other than the one request here for
> > > > > > clarification and a couple of suggestions on reddit that I'd
> > > > > > already covered in the post.
> > > > > >
> > > > > > It's possible no one has any suggestions for me regarding this
> > > > > > issue (it is a somewhat complex application stack that could be
> > > > > > requiring configuration and/or tuning in multiple places, for
> > > > > > example), but given how severe of a problem this is for my
> > > > > > production sites, I wanted to bump it once in hopes of possibly
> > > > > > getting at least some pointers of things to consider that I may not
> > > > > > have already, especially with respect to diagnostics I could
> > > > > > perform on the live web servers beyond just server-status and the
> > > > > > collectd apache plugin (which is basically the same thing), for
> > > > > > example.
> > > > > >
> > > > > > On Thu, Nov 12, 2015 at 8:02 AM, Justin Lloyd
> > > > > > <jlloyd.wiki(a)gmail.com> wrote:
> > > > > > > Marcin,
> > > > > > >
> > > > > > > It's the biggest and most heavily trafficked of our wikis because
> > > > > > > it's the English-language version of the wiki. We also have
> > > > > > > German, French, and Spanish, but the English-speaking community
> > > > > > > is by far the largest and most active. There are some tiny
> > > > > > > configuration differences between the wikis (e.g. the value of
> > > > > > > $wgJobRunRate, the specific extensions loaded) but nothing very
> > > > > > > significant I don't believe.
> > > > > > >
> > > > > > > I should also add that all four of these wikis (we have a 5th,
> > > > > > > for 7 total, not 6 as I'd originally said) also use Semantic
> > > > > > > MediaWiki extensively. I believe the other three wikis would run
> > > > > > > into the same problem if they had the same amount of traffic as
> > > > > > > the English one. However, since they all are vhosts within the
> > > > > > > same Apache instances, the English one's problems affect all of
> > > > > > > them.
> > > > > > >
> > > > > > > Justin
> > > > > > >
> > > > > > > On Thu, Nov 12, 2015 at 1:42 AM, Marcin Cieslak
> > > > > > > <saper(a)saper.info> wrote:
> > > > > > >> On 2015-11-12, Justin Lloyd <jlloyd.wiki(a)gmail.com> wrote:
> > > > > > >> > * Six wikis are configured as Vhosts in Apache, load balanced
> > > > > > >> >   by a separate set of front-end servers, where two of the
> > > > > > >> >   wikis are for private internal use and the other four are
> > > > > > >> >   public, though the traffic to one of the public wikis
> > > > > > >> >   dwarfs the rest and it's the wiki giving me problems.
> > > > > > >>
> > > > > > >> (...)
> > > > > > >>
> > > > > > >> > I'm mainly looking right now for how to troubleshoot the
> > > > > > >> > stuck processes, but any advice regarding this architecture
> > > > > > >> > is also welcome, as I feel it could use some improvement but
> > > > > > >> > I'm not sure how just yet.
> > > > > > >>
> > > > > > >> The question that immediately comes to my mind before I start
> > > > > > >> digging any further - how is the wiki making problems special?
> > > > > > >> Is it just getting most of the traffic (it is the "most
> > > > > > >> interesting" one) or is its configuration slightly different?
> > > > > > >>
> > > > > > >> Marcin Cieślak
> > > > > > >> https://www.mediawiki.org/wiki/User:Saper
> > > > > > >>
> > > > > > >> _______________________________________________
> > > > > > >> MediaWiki-l mailing list
> > > > > > >> To unsubscribe, go to:
> > > > > > >> https://lists.wikimedia.org/mailman/listinfo/mediawiki-l
> > > > >
> > > > >
> > > > _______________________________________________
> > > > MediaWiki-l mailing list
> > > > To unsubscribe, go to:
> > > >
https://lists.wikimedia.org/mailman/listinfo/mediawiki-l
> > > >
> > >
> > >
> > >
> > > --
> > > Dave Humphrey -- dave(a)uesp.net
> > > Founder/Server Admin of the Unofficial Elder Scrolls Pages --
> >
www.uesp.net
> > >
www.viud.net - Building the world's toughest USB drive