[Wikipedia-l] Downloading Wikipedia (was: please help me)

jol at osoft.biz jol at osoft.biz
Fri Nov 14 19:03:39 UTC 2003


Dear Sir/Madam,

> Brion Vibber wrote:
>
>> Please do *not* use programs such as Webstripper. Especially on
>> dynamic sites like Wikipedia, they create a huge amount of load on the
>> servers by downloading thousands upon thousands of pages extremely
>> rapidly. [...] Use a web browser like everybody else, and please stop
>> attacking the web sites that you enjoy.
>
> Don't we have a nice compressed static HTML tree by now that we could
> offer people under the "Download Wikipedia" heading on the main page?

I haven't looked at the Wikipedia software, but I expect it's two-tier: the
wiki markup is stored in a database and converted to HTML on every request.
That is not very efficient.
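For concreteness, the two-tier path looks roughly like this (a minimal
sketch; the table layout and the wikitext_to_html stub are my own
placeholders, not anything taken from the actual Wikipedia code):

    import re

    def wikitext_to_html(markup):
        # Stand-in parser: handles only '''bold''' so the sketch stays short.
        return re.sub(r"'''(.+?)'''", r"<b>\1</b>", markup)

    def render_page(db, title):
        # db is any DB-API connection, e.g. sqlite3.connect("wiki.db").
        # Tier 1: fetch the stored wiki markup.
        row = db.execute("SELECT markup FROM page WHERE title = ?",
                         (title,)).fetchone()
        if row is None:
            return None
        # Tier 2: convert markup to HTML on *every* request -- the
        # per-request cost that makes this design expensive.
        return wikitext_to_html(row[0])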

I designed a similar piece of software, but with three tiers.  The wiki
markup is converted to XML before being stored in the database, and back
again for editing.  This conversion may be time-consuming, but it doesn't
happen very often.  It also lets me change my wiki markup language (though I
don't use that term) if necessary.
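As a rough sketch of that save/edit path (the element names and the tiny
markup subset are just placeholders for illustration):

    import re
    from xml.etree import ElementTree as ET
    from xml.sax.saxutils import escape

    def markup_to_xml(markup):
        # Done once per save: escape the text, then turn a tiny wiki
        # subset ('''bold''') into XML elements.
        body = re.sub(r"'''(.+?)'''", r"<b>\1</b>", escape(markup))
        return "<page>" + body + "</page>"

    def xml_to_markup(xml_text):
        # Done only when someone opens the page for editing: walk the XML
        # and regenerate the original markup.
        root = ET.fromstring(xml_text)
        parts = [root.text or ""]
        for child in root:
            if child.tag == "b":
                parts.append("'''%s'''" % (child.text or ""))
            parts.append(child.tail or "")
        return "".join(parts)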

When a user requests a page, it is retrieved as XML from the database and
transformed to HTML by an XSLT program.  The important point is that most
users have a browser that can perform the XSL transformation itself.  So
they can download as much as they want and read it offline, with the XSLT
program sitting in their browser cache.  Each page downloaded costs the
server only a couple of table lookups and very little processing.  (If the
user doesn't have a modern browser, they can elect to have the
transformation done by the server, but that's another story.)
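In other words, the request path can be as simple as this (the stylesheet
URL, the <page> vocabulary and the use of lxml for the server-side fallback
are assumptions made for the sketch, not a description of any existing
code):

    from lxml import etree

    STYLESHEET_URL = "/static/page.xsl"        # hypothetical location
    XSL = etree.XSLT(etree.parse("page.xsl"))  # compiled once, reused

    def serve_page(xml_text, client_supports_xslt=True):
        if client_supports_xslt:
            # Cheap path: the server only prepends an xml-stylesheet
            # processing instruction; the browser runs the XSLT itself,
            # with page.xsl sitting in its cache.
            pi = ('<?xml-stylesheet type="text/xsl" href="%s"?>\n'
                  % STYLESHEET_URL)
            return pi + xml_text, "application/xml"
        # Fallback: apply the same stylesheet on the server for browsers
        # without XSLT support.
        html = XSL(etree.fromstring(xml_text))
        return str(html), "text/html"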

Just an idea...

Regards,
John O'Leary.
