From: "Mark Christensen"
<mchristensen(a)humantech.com>
To: <wikitech-l(a)wikipedia.org>
Reply-To: wikitech-l(a)wikipedia.org
A quick start might be to temporarily disable all checking of links, and see if that helps much.
This seems to be a helpful suggestion. Without profiling, it's hard to tell where the bottleneck is, but I think link checking is a good guess.
Thanks very much!! I think measuring without link-checking would be great; it would certainly answer many questions. I don't have a machine I can test on, sadly; does someone else?
It's worth noting that link-checking doesn't just cause additional processing by itself: if link-checking is disabled, and user formatting is limited, many OTHER optimizations become easy.
In particular, caching becomes really easy if article text
doesn't depend on other state (i.e., doesn't require link-checking and
processing to support fancy user options). For example, without
link-checking, you don't have to follow lists to invalidate
"related" caches. The most effective optimization is to do
nothing at all :-). In-filesystem caches of HTML fragments would
make sense in such a situation, and Linux's sendfile()
could do a rather impressive job of improving performance
when sending cached article text. Allowing users to select which stylesheet to blast back at them would give them a limited amount of control, while still seriously improving performance.
Another problem with link-checking is that it's not as useful as you'd wish, anyway. After all, it only identifies existence. An article with only a tiny amount of content appears
"complete" to the link-check, but clearly you'd want people to
work on that article too! If disabling link-checking (and enabling the optimizations it currently makes complicated) turns out to seriously improve performance, then I think link-checking is an obvious capability to disable (at least as a configurable option).
My two cents, hope they help.