AlphabetDP(a)aol.com wrote:
In a message dated 6/21/2004 5:02:01 PM Eastern Standard Time,
brion(a)pobox.com writes:
Note that currently we don't have diff-based storage; when you make a
change to a page the entire previous revision is stored in whole.
(Consider enabling $wgCompressOld if you have zlib support in PHP; this
will reduce old text requirements by roughly half.)
-- brion vibber (brion @ pobox.com)
<snip>
We had attempted to research the wiki's overhead requirements in making a
judgment as to whether or not to buy more disk space from our provider. During
the investigation of overhead storage requirements, we used the 'wikipedia'
statistics and charts on space. It never occurred to us that 'wikipedia' was
storing full copies of all versions of an article based on the 590MB May 22, 2004
number and considering the high number of articles the db had. We must have
been reading the wrong statistics.
You might have looked at the cur dump, which only holds the latest
revision and not the old revisions. Compressed, the SQL dump sizes
for the English Wikipedia are:
cur : 269 MB
old : 7608 MB
The sizes of all the Wikipedia databases are available at:
http://www.wikipedia.org/wikistats/EN/TablesDatabaseSize.htm
In fact the databases themselves are bigger than the compressed dumps :o)
Do the 'wikipedia' administrators remove history from their wiki in order to
preserve space? If so, how is this done? Is there some sort of 'export only
the latest version of each article, etc.' option, clear the db, and then import
the latest version back?
There is no such option. One might want to drop older entries in the
"old" tables, but you would then lose the page histories. The only things
deleted in the Wikipedia databases are new articles that are vandalism or
incorrect data. They are dropped from the "cur" table but are still in
"old" (as far as I know).
Our administrator has set the "$wgCompressRevisions = true;" since your
message (above) -- will that take care of only the revisions since the flag was
turned on or will there be compression of the previous revisions as well?
I think it will apply only to revisions made after the flag was set. I am
not sure whether there is a ./maintenance/ script to compress revisions
made before the switch.
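For a rough feel of what zlib compression does to stored text, here is a
small Python sketch. It is not MediaWiki code (MediaWiki is PHP); the
function names and the sample text are made up for illustration, and the
ratio you get on real article text will differ.

```python
import zlib


def compress_revision(text: str) -> bytes:
    """Compress one revision's text with zlib at the highest level."""
    return zlib.compress(text.encode("utf-8"), 9)


def decompress_revision(blob: bytes) -> str:
    """Recover the original text from a compressed blob."""
    return zlib.decompress(blob).decode("utf-8")


# Made-up, very repetitive sample; real wiki text compresses less well.
sample = "The quick brown fox jumps over the lazy dog. " * 50
blob = compress_revision(sample)

raw_size = len(sample.encode("utf-8"))
print(raw_size, "bytes raw,", len(blob), "bytes compressed")
```

Compression is lossless, so decompress_revision(blob) gives back the
original text exactly.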
Hopefully the new diff-based history will save a lot of space.
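To show why diff-based storage could help, here is a minimal Python sketch
that keeps only the changed lines between two revisions instead of a full
copy. It is an illustration built on Python's difflib, not the planned
MediaWiki design; make_delta, apply_delta and the sample revisions are
hypothetical names and data.

```python
import difflib


def make_delta(old_lines, new_lines):
    """Record only what is needed to rebuild new_lines from old_lines."""
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged run: remember the range, not the text itself.
            delta.append(("copy", i1, i2))
        else:
            # replace / insert / delete: store only the new lines (maybe none).
            delta.append(("insert", new_lines[j1:j2]))
    return delta


def apply_delta(old_lines, delta):
    """Rebuild the newer revision from the older one plus the delta."""
    out = []
    for op in delta:
        if op[0] == "copy":
            out.extend(old_lines[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


# Two made-up revisions of a 200-line page that differ in one line.
rev1 = [f"Line {i} of the article.\n" for i in range(200)]
rev2 = list(rev1)
rev2[100] = "An edited line.\n"

delta = make_delta(rev1, rev2)
stored = sum(len(line) for op in delta if op[0] == "insert" for line in op[1])
full = sum(len(line) for line in rev2)
print(stored, "characters of new text stored instead of", full)
```

The saving depends on how small the edits are; a revision that rewrites
most of a page gains little from this approach.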
--
Ashar Voultoiz