I did a quick run-through of the upload directories to see how big the total
file set for each wiki is, with an eye towards getting bulk dumps of uploads
ready again.
A pretty pie chart and raw data are here:
http://meta.wikimedia.org/wiki/Upload_distribution%2C_June_2006
Altogether, current versions of files without thumbnails total about 372
gigabytes. Commons makes up the vast majority with over 245 GB; English
Wikipedia squeaks in with nearly another 60 GB, German Wikipedia sits just shy
of 20 GB, and they drop off rapidly from there.
Giant tarballs are a rather unwieldy way to distribute file dumps at the larger
sizes: they require 2x the disk space on our end (for staging complete and
in-progress builds), and of course every download comes straight out of our
central bandwidth.
Wegge's doing some testing with BitTorrent; it may or may not be feasible to
build torrent files that simply reference all the individual files, which would
let us use hardlinks to maintain a snapshot without eating up the full disk
space on the server. It would also spare downloaders from having to keep or
extract a huge archive file.
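To illustrate the hardlink idea, here's a rough sketch in Python (the paths and
layout are made up for the example, not our actual upload tree): walk the live
directory and hardlink every file into a dated snapshot directory, so the
snapshot costs almost no extra disk space for file data and stays stable while
a torrent is built against it.

#!/usr/bin/env python
# Sketch only: build a hardlinked snapshot of an upload tree.
# SOURCE and SNAPSHOT are hypothetical paths, not the real layout.
import os

SOURCE = "/export/upload/commons"
SNAPSHOT = "/export/snapshots/commons-2006-06"

for dirpath, dirnames, filenames in os.walk(SOURCE):
    rel = os.path.relpath(dirpath, SOURCE)
    target_dir = os.path.join(SNAPSHOT, rel)
    os.makedirs(target_dir, exist_ok=True)
    for name in filenames:
        src = os.path.join(dirpath, name)
        dst = os.path.join(target_dir, name)
        if not os.path.exists(dst):
            # Hardlink rather than copy: no extra space for file data,
            # and the snapshot keeps the old inode even if the live
            # file is later replaced or deleted.
            os.link(src, dst)

The snapshot directory could then be handed to whatever torrent-making tool we
settle on; the open question is whether one .torrent describing that many
individual files is something clients will actually cope with.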
Given the number of files (about 650k in Commons now) and their wild and crazy
filenames, this might not be totally feasible, but we can hope.
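Before committing to that, it'd probably be worth surveying the tree for names
that could cause trouble. A rough sketch along these lines (again with a
made-up path) would count files and flag names that a multi-file torrent or a
downloader's filesystem might choke on:

#!/usr/bin/env python
# Sketch only: survey an upload tree for potentially troublesome filenames.
import os

ROOT = "/export/upload/commons"   # hypothetical path
MAX_NAME = 255                    # typical filename length limit

total = 0
suspects = []
for dirpath, dirnames, filenames in os.walk(ROOT):
    for name in filenames:
        total += 1
        reasons = []
        try:
            encoded = name.encode("utf-8")
        except UnicodeEncodeError:
            encoded = name.encode("utf-8", "replace")
            reasons.append("not valid UTF-8")
        if len(encoded) > MAX_NAME:
            reasons.append("name too long")
        if any(c in name for c in '\\:*?"<>|'):
            reasons.append("characters some clients/filesystems reject")
        if reasons:
            suspects.append((os.path.join(dirpath, name), reasons))

print("%d files scanned, %d possibly problematic" % (total, len(suspects)))
for path, reasons in suspects[:20]:
    print(path, "--", "; ".join(reasons))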
-- brion vibber (brion @ pobox.com)