[Mediawiki-l] Re: Conserving Storage Space and Removing History

Ashar Voultoiz thoane at altern.org
Fri Jun 25 16:45:55 UTC 2004


AlphabetDP at aol.com wrote:
> In a message dated 6/21/2004 5:02:01 PM Eastern Standard Time, 
> brion at pobox.com writes:
> 
>>Note that currently we don't have diff-based storage; when you make a 
>>change to a page the entire previous revision is stored in whole. 
>>(Consider enabling $wgCompressOld if you have zlib support in PHP; this 
>>will reduce old text requirements by roughly half.)
>>
>>-- brion vibber (brion @ pobox.com)
> 
<snip>
> We had attempted to research the wiki's overhead requirements in making a 
> judgment as to whether or not to buy more disk space from our provider. During 
> the investigation of overhead storage requirements, we used the 'wikipedia' 
> statistics and charts on space. It never occurred to us that 'wikipedia' was 
> storing full copies of all versions of an article based on the 590MB May 22, 2004 
> number and considering the high number of articles the db had. We must have 
> been reading the wrong statistics.

You might have looked at the cur dump, which only holds the latest 
revision of each page and not the old revisions. The compressed sizes of 
the SQL dumps for the English Wikipedia are:

  cur : 269 MB
  old : 7608 MB

The database sizes of all Wikipedias are available at:
   http://www.wikipedia.org/wikistats/EN/TablesDatabaseSize.htm

In fact the databases are bigger than these compressed dumps :o)
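
To put the two dump sizes above side by side, here is a quick back-of-the-envelope calculation in Python. It only uses the two compressed figures quoted in this message; the variable names are made up for illustration.

```python
# Compressed SQL dump sizes for the English Wikipedia, as quoted above (MB).
cur_mb = 269    # latest revision of each page
old_mb = 7608   # all previous revisions

total_mb = cur_mb + old_mb
old_share = old_mb / total_mb      # fraction of the total taken by history
old_to_cur = old_mb / cur_mb       # how many times bigger "old" is than "cur"

print(f"total: {total_mb} MB")
print(f"old share of total: {old_share:.1%}")
print(f"old is {old_to_cur:.1f} times the size of cur")
```

So the history is about 28 times the size of the current revisions, which is why a figure based on cur alone gives a misleading picture of disk needs.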

> Do the 'wikipedia' administrators remove history from their wiki in order to 
> preserve space? If so, how is this done? Is there some sort of 'export only 
> the latest version of each article, etc.' option, clear the db, and then import 
> the latest version back?

There is no such option. One could drop older entries from the "old" 
table, but you would then lose the page histories. The only things 
deleted from the Wikipedia databases are new articles that are vandalism 
or incorrect data. They are dropped from the "cur" table but are still 
kept in "old" (as far as I know).

> Our administrator has set the "$wgCompressRevisions = true;" since your 
> message (above) -- will that take care of only the revisions since the flag was 
> turned on or will there be compression of the previous revisions as well?

I think it will only apply to revisions made after the flag was set. I 
am not sure whether there is a ./maintenance/ script to compress 
revisions made before the switch.
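
To illustrate what I mean, here is a rough sketch in Python. It is not MediaWiki code: the StoredRevision class, its "compressed" marker and the compress_existing() helper are all hypothetical. It assumes each stored row records whether its text is compressed, so that rows saved before and after the switch can coexist, and that a one-off pass would be needed to shrink the earlier rows.

```python
import zlib
from dataclasses import dataclass


@dataclass
class StoredRevision:
    data: bytes
    compressed: bool  # hypothetical per-row marker, not a real MediaWiki field


def save_revision(text, compress_enabled):
    """Store one full revision, compressed only if the flag is on."""
    raw = text.encode("utf-8")
    if compress_enabled:
        return StoredRevision(zlib.compress(raw), True)
    return StoredRevision(raw, False)


def load_revision(rev):
    """Read a revision back, whichever way it was stored."""
    raw = zlib.decompress(rev.data) if rev.compressed else rev.data
    return raw.decode("utf-8")


def compress_existing(revisions):
    """One-off pass over rows saved before the switch (hypothetical helper)."""
    return [
        rev if rev.compressed else StoredRevision(zlib.compress(rev.data), True)
        for rev in revisions
    ]


# Made-up page text; real articles are less repetitive and compress less well.
sample = "Every saved revision is stored as a full copy of the page text.\n" * 50
history = [
    save_revision(sample, compress_enabled=False),                     # before the switch
    save_revision(sample + "One more line.\n", compress_enabled=True),  # after the switch
]

size_before = sum(len(rev.data) for rev in history)
history_after = compress_existing(history)
size_after = sum(len(rev.data) for rev in history_after)
print(size_before, "bytes before,", size_after, "bytes after the one-off pass")
```

Until something like that one-off pass is run, the revisions saved before the switch keep their full size.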


Hopefully the new diff-based history will save a lot of space.
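
To show why diff-based storage should help, here is a minimal sketch in Python. It is only an illustration of the general idea, not how MediaWiki does or will do it: make_delta() and apply_delta() are hypothetical helpers built on the standard difflib module, and the page text is made up.

```python
from difflib import SequenceMatcher


def make_delta(old_lines, new_lines):
    """Describe new_lines as copies from old_lines plus inserted lines."""
    delta = []
    matcher = SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))            # reuse old_lines[i1:i2]
        elif tag in ("replace", "insert"):
            delta.append(("insert", new_lines[j1:j2]))  # store only new text
        # "delete" needs no entry: the lines are simply not copied
    return delta


def apply_delta(old_lines, delta):
    """Rebuild the newer revision from the older one and the delta."""
    out = []
    for op in delta:
        if op[0] == "copy":
            out.extend(old_lines[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


# A made-up 50-line page and an edit that changes a single line.
old_rev = [f"line {i} of the article" for i in range(50)]
new_rev = list(old_rev)
new_rev[20] = "line 20, corrected by an editor"

delta = make_delta(old_rev, new_rev)
stored_lines = sum(len(op[1]) for op in delta if op[0] == "insert")
print("lines in full copy:", len(new_rev))
print("lines stored in delta:", stored_lines)
```

The trade-off is that reading a revision then means applying deltas to some full copy, instead of fetching one row.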

-- 
Ashar Voultoiz



