[Foundation-l] Historical wikipedia dumps

Anthony wikimail at inbox.org
Thu Aug 28 12:12:38 UTC 2008


Client-side might be ideal but also a lot harder to make.  Something
live-mirrorish would be fairly easy, but would of course violate the "no
live mirrors" rule.  To go completely server-side would require a *lot* of
disk space, and/or some tricky db compression which would eat up lots of CPU
cycles.  Plus you'd need to find or make a full history dump.  I guess if
you've got the bandwidth, disk space, and/or CPU cycles to spare you could
relatively easily scrape up your own full history dump, though.

A MediaWiki extension probably wouldn't be too hard, but I don't know
MediaWiki well enough to be volunteering.  For performance reasons it might
require a new db table/column/index.  I don't think MediaWiki tables are
optimized for looking up the latest version of a page as of a particular date.
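
To make the lookup concrete, here is a rough sketch of the "latest version
as of a date" query, done in memory rather than against a real database.
The function name, the sample timestamps, and the revision ids are all made
up for illustration; the only assumption is that each page's history is
available as a list sorted by timestamp.

```python
from bisect import bisect_right

def revision_as_of(revisions, cutoff):
    """Return the id of the latest revision at or before `cutoff`.

    `revisions` is a list of (timestamp, rev_id) pairs sorted ascending by
    timestamp.  Timestamps are ISO 8601 UTC strings, which sort correctly
    as plain strings.  Returns None if the page did not exist yet.
    """
    timestamps = [ts for ts, _ in revisions]
    i = bisect_right(timestamps, cutoff)  # count of revisions <= cutoff
    return revisions[i - 1][1] if i else None

# Hypothetical history for one page (invented sample data).
history = [
    ("2001-09-10T08:00:00Z", 101),
    ("2001-09-11T14:30:00Z", 102),
    ("2001-09-12T09:15:00Z", 103),
]

print(revision_as_of(history, "2001-09-11T23:59:59Z"))
```

A database version would presumably need an index covering page and
timestamp together to get the same kind of binary-search behaviour.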

I might try hacking up a live-mirrorish version next time I get enough free
time.  Let's see - I'd have to find the right templates, articles, categories,
and images, presumably working from the stub dump, and then merge them all
together.  Anything else?  Historical skins would be nice but unnecessary;
historical parsing algorithms would be cool but probably overkill.  Anyone
have a tool to recursively parse templates?  I always get stuck there trying
to make a perfect parser.  On a similar note, is there a standalone parser
yet, or would I have to import it all into a database?
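
For what it's worth, the imperfect version of recursive template expansion
is short.  This is only a sketch: it handles plain {{Name}} transclusions
and nothing else (no parameters, no parser functions, no namespaces), and
the `templates` dict stands in for whatever would supply the template text
as of the chosen date.  All names here are hypothetical.

```python
import re

# Matches an innermost, parameterless transclusion such as {{Foo}}.
TEMPLATE = re.compile(r"\{\{([^{}|]+)\}\}")

def expand(text, templates, stack=()):
    """Recursively replace {{Name}} with templates[Name], expanded in turn.

    `stack` holds the templates currently being expanded, so a template
    that includes itself (directly or indirectly) is left unexpanded
    instead of recursing forever.
    """
    def substitute(match):
        name = match.group(1).strip()
        if name in stack or name not in templates:
            return match.group(0)  # leave loops and unknown templates as-is
        return expand(templates[name], templates, stack + (name,))
    return TEMPLATE.sub(substitute, text)

# Invented sample templates.
templates = {"A": "a[{{B}}]", "B": "b", "Loop": "x{{Loop}}"}

print(expand("{{A}} {{Missing}}", templates))
```

The hard part is everything this skips, which is exactly where the
"perfect parser" problem starts.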

Seems neat, though.  One thing that comes to mind is checking out various
articles on the days on and around 9/11/01.

Anthony

On Thu, Aug 28, 2008 at 12:50 AM, Ben Yates <ben.louis.yates at gmail.com> wrote:

> What would be ideal is a client-side wiki reader that could load past
> revisions at runtime.
>
> On Mon, Aug 25, 2008 at 12:26 PM, Ziko van Dijk <zvandijk at googlemail.com>
> wrote:
> > Oh, this nostalgia wp still exists, yes.
> >
> > I thought about a tool or a user surface where I simply type
> > "2003-01-01" (as an example) and Wikipedia will show me the articles
> > from that point of time. I understand that there might be problems
> > with deleted images, merged articles, right. But it would still be
> > interesting enough, certainly the older Wikipedia grows. I do not know
> > so much about technical matters, but I can not imagine that such a
> > tool would be very complicated. (?)
> >
> > Greetings
> > Ziko
>

