[Mediawiki-l] Re: Conserving Storage Space and Removing History
Ashar Voultoiz
thoane at altern.org
Fri Jun 25 16:45:55 UTC 2004
AlphabetDP at aol.com wrote:
> In a message dated 6/21/2004 5:02:01 PM Eastern Standard Time,
> brion at pobox.com writes:
>
>>Note that currently we don't have diff-based storage; when you make a
>>change to a page the entire previous revision is stored in whole.
>>(Consider enabling $wgCompressOld if you have zlib support in PHP; this
>>will reduce old text requirements by roughly half.)
>>
>>-- brion vibber (brion @ pobox.com)
>
<snip>
> We had attempted to research the wiki's overhead requirements in making a
> judgment as to whether or not to buy more disk space from our provider. During
> the investigation of overhead storage requirements, we used the 'wikipedia'
> statistics and charts on space. It never occurred to us that 'wikipedia' was
> storing full copies of all versions of an article based on the 590MB May 22, 2004
> number and considering the high number of articles the db had. We must have
> been reading the wrong statistics.
You might have looked at the cur dump, which only holds the latest
revision of each page and not the old revisions. The compressed SQL dump
sizes for the English Wikipedia are:
cur : 269 MB
old : 7608 MB
The sizes of all the Wikipedia databases are available at:
http://www.wikipedia.org/wikistats/EN/TablesDatabaseSize.htm
In fact they are bigger :o)
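To put the two dump sizes above in proportion, here is a small sketch of the arithmetic. It only uses the two figures quoted in this message; the variable names are mine.

```python
# Compressed SQL dump sizes for the English Wikipedia, as quoted above (MB).
cur_mb = 269    # latest revision of each page
old_mb = 7608   # all previous revisions

total_mb = cur_mb + old_mb
old_to_cur = old_mb / cur_mb     # how many times larger "old" is than "cur"
old_share = old_mb / total_mb    # fraction of the combined dumps taken by "old"

print(f"old is about {old_to_cur:.1f}x the size of cur")
print(f"old is about {old_share:.1%} of the two dumps combined")
```

So a disk space estimate based on the cur dump alone will miss nearly all of the storage that the history needs.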
> Do the 'wikipedia' administrators remove history from their wiki in order to
> preserve space? If so, how is this done? Is there some sort of 'export only
> the latest version of each article, etc.' option, clear the db, and then import
> the latest version back?
There is no such option. One could drop older entries from the
"old" table, but you would then lose the page histories. The only things
deleted in the Wikipedia databases are new articles that are vandalism or
incorrect data. They are dropped from the "cur" table but are still in
"old" (as far as I know).
> Our administrator has set the "$wgCompressRevisions = true;" since your
> message (above) -- will that take care of only the revisions since the flag was
> turned on or will there be compression of the previous revisions as well?
I think it will only apply to revisions made after the flag was set. I am
not sure whether there is a ./maintenance/ script to compress revisions made
before the switch.
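To illustrate the difference between the two cases, here is a minimal Python sketch, not MediaWiki's actual code. It assumes each stored row carries a marker saying whether its text is compressed; the OldRow class, the "gzip" marker value and the compress_existing() helper are all hypothetical names chosen for this example. Rows saved before the flag stay as plain text until something goes back and rewrites them.

```python
import zlib
from dataclasses import dataclass
from typing import List


@dataclass
class OldRow:
    text: bytes       # stored payload
    flags: str = ""   # "" = plain text, "gzip" = compressed (hypothetical marker)


def save_revision(text: str, compress: bool) -> OldRow:
    """Store a revision, compressing it only if the setting is enabled."""
    raw = text.encode("utf-8")
    if compress:
        return OldRow(zlib.compress(raw), "gzip")
    return OldRow(raw)


def load_revision(row: OldRow) -> str:
    """Read a revision back, whichever way it was stored."""
    raw = zlib.decompress(row.text) if row.flags == "gzip" else row.text
    return raw.decode("utf-8")


def compress_existing(rows: List[OldRow]) -> int:
    """Compress rows still stored as plain text; return how many changed."""
    changed = 0
    for row in rows:
        if row.flags != "gzip":
            row.text = zlib.compress(row.text)
            row.flags = "gzip"
            changed += 1
    return changed


# One revision saved before the setting was turned on, one saved after.
rows = [save_revision("first revision " * 50, compress=False)]
rows.append(save_revision("second revision " * 50, compress=True))
```

Calling compress_existing(rows) here would rewrite only the first row. The repetitive sample text compresses far better than real articles would, so do not read a compression ratio off this example.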
Hopefully the new diff-based history will save a lot of space.
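As a rough idea of why storing differences saves space, here is a small Python sketch using the standard difflib module. It is not the planned MediaWiki design, which is not described in this thread; make_delta() and apply_delta() are hypothetical helpers. The sketch keeps the newest text in full and records only what is needed to rebuild the previous revision from it.

```python
from difflib import SequenceMatcher


def make_delta(base: str, target: str) -> list:
    """Describe how to rebuild `target` from `base` as copy/insert steps."""
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))          # reuse base[i1:i2]
        elif j2 > j1:
            ops.append(("insert", target[j1:j2])) # text only present in target
        # pure deletions need no step: the base text is simply not copied
    return ops


def apply_delta(base: str, ops: list) -> str:
    """Rebuild the target text from `base` and the recorded steps."""
    parts = []
    for op in ops:
        if op[0] == "copy":
            parts.append(base[op[1]:op[2]])
        else:
            parts.append(op[1])
    return "".join(parts)


old_text = "The quick brown fox jumps over the lazy dog.\n" * 20
new_text = old_text + "One more sentence was added in this edit.\n"

# Keep new_text in full; store only the steps needed to get old_text back.
delta = make_delta(new_text, old_text)
stored_literal = sum(len(op[1]) for op in delta if op[0] == "insert")
```

For a small edit like this one, the delta holds almost no literal text, while full-copy storage would keep the whole previous revision again.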
--
Ashar Voultoiz