[Foundation-l] Re: Hosting scans of the 1911 Britannica on Wikimedia

Tim Starling t.starling at physics.unimelb.edu.au
Wed Nov 9 22:26:06 UTC 2005


I wrote:
> When Brian came on to IRC and asked us "What is the best way to upload
> 30,000 images requiring 6 GB to commons?" the reaction from Brion and me was
> a groan. The hardware requirements for commons are rapidly increasing, and
> uploading and storing such content in MediaWiki is inefficient and
> non-portable. If we had them in a separate directory on a separate domain,
> we could copy them from server to server, make tarballs, run batch
> conversion jobs -- all with a minimal amount of programming and system
> administration work. And it wouldn't require writing a bot to create 30,000
> index pages; we could just write a hundred lines of PHP to index the whole
> lot. The collection will be easier to use and more reliable, and it will be
> easy to maintain and update the index pages.
> 
> All of the navigation text, the headers and footers, could be editable in
> wiki fashion. You could let anyone change the header that will be displayed
> on 30,000 pages, with no server strain whatsoever. This is in stark contrast
> to the system requirements of templates that are used on large numbers of
> wiki pages, where one edit to the template invalidates every cached page
> that uses it.
> 
> Wikisource has suffered so far due to a lack of specialised software. This
> kind of initiative could see it become more usable generally.
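
To make the indexing idea above concrete, here is a rough sketch, in Python
rather than PHP purely for illustration. It assumes the scans sit as image
files in one flat directory and that each page is served from that same
directory. The names list_scans and render_page, and the suffix list, are
hypothetical and not part of any existing tool. The header and footer are
passed in at render time, so editing them means changing one stored string
rather than touching 30,000 pages.

```python
import html
from pathlib import Path

# Assumed set of scan file types; adjust to whatever the collection uses.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def list_scans(scan_dir):
    """Return the names of the scan files in scan_dir, sorted by name."""
    return sorted(
        p.name
        for p in Path(scan_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def render_page(names, i, header, footer):
    """Build the HTML for the i-th scan, with previous/next links.

    header and footer are plain strings supplied by the caller, for
    example the current text of a wiki-editable page.
    """
    nav = []
    if i > 0:
        nav.append('<a href="%s.html">previous</a>' % html.escape(names[i - 1]))
    if i + 1 < len(names):
        nav.append('<a href="%s.html">next</a>' % html.escape(names[i + 1]))
    name = html.escape(names[i])
    return "\n".join([
        header,
        "<p>" + " | ".join(nav) + "</p>",
        '<img src="%s" alt="%s">' % (name, name),
        footer,
    ])
```

Calling render_page(list_scans("scans"), 0, header, footer) would give the
page for the first scan; a real version would cache the directory listing
rather than rereading it on every request.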

Come to think of it, I could probably do it as a MediaWiki extension, and
embed this content in en.wikisource.org. You'd get all of the same features,
but it would also appear to be integrated with the wiki. You wouldn't be
able to edit the page images, but I don't think that's a desirable property
anyway. It would be easy for someone to download the whole collection, run a
processing script (say, automated correction of the scanning quality), and
then upload the whole new collection and incorporate it into the wiki. Easy
as in no bots, no screen scrapers, no server strain, just a tarball download
and a tarball upload.
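
The download, process, upload round trip could look roughly like the sketch
below. It is an assumption-laden illustration in Python: the function name
reprocess_tarball and the process(name, data) callback are hypothetical, and
the actual image correction step is left to the caller. It reads every
regular file from one tarball, passes its bytes through the callback, and
writes the result to a new gzip-compressed tarball under the same name.

```python
import io
import tarfile


def reprocess_tarball(src_path, dst_path, process):
    """Copy each regular file from src_path to dst_path via process().

    process(name, data) receives the member name and its bytes and
    returns the replacement bytes. Returns the number of files written.
    """
    count = 0
    with tarfile.open(src_path, "r:*") as src, \
            tarfile.open(dst_path, "w:gz") as dst:
        for member in src:
            if not member.isfile():
                continue  # skip directories, links and other special entries
            data = src.extractfile(member).read()
            new_data = process(member.name, data)
            info = tarfile.TarInfo(name=member.name)
            info.size = len(new_data)
            info.mtime = member.mtime
            dst.addfile(info, io.BytesIO(new_data))
            count += 1
    return count
```

A callback that returns its input unchanged makes this a plain copy, which
is a cheap way to check the round trip before plugging in real processing.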

-- Tim Starling



