Neil Harris wrote:
After a very informative exchange with Tim Starling, I've thought a bit
more about my proposal last night about making Wikipedia cacheable,
which in the light of day seems excessively complex. Here's a simpler
version:
[snip]
(a) pages for logged-in users
(b) pages for anon users who have a pending message
(c) pages with auto-generated dynamic content (Special: pages, and any
others with similar behaviour)
[snip]
I'd be interested to hear what others think. Is there an obvious flaw in
my reasoning? Is this worth a try?
I agree that this would be better, but implementing this has one
problem: Squid 2.5 currently does not differentiate cache-control
headers between its role as an accelerator/surrogate, and as a normal
proxy cache. For this to work, Mediawiki should be able to send *two*
kinds of cache control headers: one for our squids, and one for "the
others out there".
Squid 3 (if it'll ever see the light of day) will support this through an
X-Surrogate-Control header, but until then we're stuck with a hack:
Mediawiki sends out Cache-Control headers for our squids, which then
try to recognize normal content pages and replace the Cache-Control
header on those responses with:
Cache-Control: private, s-maxage=0, max-age=0, must-revalidate
We could of course do s/private/public/ here, but there may be pages
that are private/user-specific that Squid cannot really detect reliably,
simply because it doesn't have the same detailed information that
Mediawiki has. Therefore, 'private' is the safest way to go for now.
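
To make the hack above a bit more concrete, here is a rough Python
sketch of the rewrite rule. It is only an illustration, not how Squid
actually implements it: the function names, the URL test for "normal
content pages" and the example s-maxage value are all assumptions made
up for this sketch.

```python
# Illustrative sketch only; names and the URL heuristic are hypothetical.

# The header that replaces Mediawiki's own on recognized content pages.
DOWNSTREAM_CACHE_CONTROL = "private, s-maxage=0, max-age=0, must-revalidate"


def looks_like_content_page(path):
    """Guess from the URL alone whether this is a normal content page.

    Assumed rule: anything under /wiki/ that is not a Special: page.
    This is exactly the kind of crude test a proxy is limited to, since
    it lacks Mediawiki's knowledge of which pages are user specific.
    """
    return path.startswith("/wiki/") and not path.startswith("/wiki/Special:")


def rewrite_for_downstream(path, headers):
    """Return a copy of the headers as they would be sent to clients.

    On recognized content pages the Cache-Control header meant for our
    own squids is replaced; every other response passes through as is.
    """
    out = dict(headers)
    if looks_like_content_page(path):
        out["Cache-Control"] = DOWNSTREAM_CACHE_CONTROL
    return out


if __name__ == "__main__":
    from_mediawiki = {"Cache-Control": "s-maxage=3600, max-age=0"}
    print(rewrite_for_downstream("/wiki/Main_Page", from_mediawiki))
    print(rewrite_for_downstream("/wiki/Special:Recentchanges", from_mediawiki))
```

Because the check only sees the URL, any user-specific page it
misclassifies as normal content still goes out marked 'private', which
is why that value is the conservative choice.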
Another option would be to try to signal Squid when it should and when
it should not replace the Cache-Control header sent by Mediawiki.
However, Squid is not very flexible in this area, so it'd be a dirty hack.
Modifying Squid is of course possible as well, but requires a bit more
effort. Additionally, it has proven to be a nuisance to maintain a
patched up Squid on our cluster, so we'd like to try to keep the set of
patches against upstream as small as possible.
--
Mark
mark(a)nedworks.org