Hi!
There used to be a figure for the total size of all Wikipedia database
dumps at download.wikipedia.org. On March 9, 2005 I noticed 50 gigabytes
compressed (15 for just the current versions). I wonder what the actual
numbers are now.
I assume the change to XML does not make a big difference because the
data is compressed, but I don't remember if it was compressed with gzip
or bzip2. Anyway, a total number would be interesting to give an
impression of how much information we collect. Can anyone with access
to the server please run a simple shell script to get the total size
you would need if you wanted to download all Wikipedia data?
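Something like the following sketch would do; the path /data/dumps is
just a placeholder, since I don't know where the dump files actually
live on the server:

```shell
#!/bin/sh
# Minimal sketch: sum the size of all compressed dump files under one
# directory. DUMPDIR is an assumption; point it at the real dump
# location on the server.
DUMPDIR="${1:-/data/dumps}"
# du -c prints a grand total as the last line, -h makes it human-readable
du -ch "$DUMPDIR"/*.bz2 "$DUMPDIR"/*.gz 2>/dev/null | tail -n 1
```

If the dumps are spread over subdirectories, `du -sh "$DUMPDIR"` on the
top-level directory would give the same kind of answer.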
Thanks and greetings!
Jakob
P.S.: By the way, 20050713_pages_full.xml.bz2 (29.9G) seems to be the
newest full dump of the English Wikipedia, but I bet you don't need a
terabyte for all compressed full dumps - yet. I found the first plans
for RAID systems with several Terabytes here:
http://meta.wikimedia.org/wiki/Wikimedia_servers/hardware_orders/wishlist