[Xmldatadumps-admin-l] New enwiki-Dump

Tomasz Finc tfinc at wikimedia.org
Thu Apr 8 21:43:42 UTC 2010


Andreas Meier wrote:
> Hello,
> 
> a new enwiki dump is ready, but its size is only 178.7 GB; the previous 
> dump was 280.3 GB. Have you changed the compression, or is 
> the new dump corrupted?

The bz2 archive itself is not corrupt; it decompresses without errors.

Both the number of pages processed and the revision count are exactly 
where they should be. But the file is clearly much smaller than expected, 
and I'm pretty sure we didn't delete 1/3 of enwiki.
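
For anyone who wants to repeat the page and revision count on their own 
copy, here is a minimal sketch. It assumes the dump is a bz2-compressed 
MediaWiki XML export with <page> and <revision> elements. The function 
names and the SAMPLE data are made up for illustration, and this is not 
the script used to produce the counts mentioned above.

```python
import bz2
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix, e.g. '{ns}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def count_pages_and_revisions(stream):
    """Stream through export XML and return (pages, revisions)."""
    pages = 0
    revisions = 0
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "revision":
            revisions += 1
            elem.clear()  # drop the revision text once it has been counted
        elif name == "page":
            pages += 1
            elem.clear()  # drop the finished page
    return pages, revisions


def count_dump(path):
    """Count pages and revisions in a .xml.bz2 dump without unpacking it to disk."""
    with bz2.open(path, "rb") as stream:
        return count_pages_and_revisions(stream)


# Tiny hypothetical sample: two pages, three revisions in total.
SAMPLE = (
    b"<mediawiki>"
    b"<page><title>A</title>"
    b"<revision><id>1</id><text>x</text></revision>"
    b"<revision><id>2</id><text>y</text></revision>"
    b"</page>"
    b"<page><title>B</title>"
    b"<revision><id>3</id><text>z</text></revision>"
    b"</page>"
    b"</mediawiki>"
)

if __name__ == "__main__":
    print(count_pages_and_revisions(io.BytesIO(SAMPLE)))
```

Matching counts only show that no pages or revisions were dropped 
outright; they say nothing about whether the text inside each revision 
is intact.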

There haven't been any compression changes to account for this.

What we need is for someone to start comparing individual revisions 
between the two dumps to check for any data loss.
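
One possible way to start on that comparison is sketched below. It 
assumes both dumps are MediaWiki XML exports in which each <revision> has 
an <id> child and a <text> child. It reports revision IDs that are missing 
from the new dump and revisions whose text got shorter. All names and the 
sample data are hypothetical. The sketch keeps every old revision ID in 
memory, so for the full enwiki history it would have to be run on a subset 
of pages or adapted to work in batches.

```python
import bz2
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix, e.g. '{ns}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def revision_sizes(stream):
    """Yield (revision_id, text_length_in_characters) for each <revision>."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "revision":
            rev_id = None
            text_len = 0
            # Only direct children are inspected, so the contributor's
            # own <id> (nested one level deeper) is not picked up.
            for child in elem:
                child_name = local_name(child.tag)
                if child_name == "id":
                    rev_id = int(child.text)
                elif child_name == "text":
                    text_len = len(child.text or "")
            yield rev_id, text_len
            elem.clear()
        elif name == "page":
            elem.clear()


def compare_dumps(old_stream, new_stream):
    """Return (missing_ids, shrunk), where shrunk holds (id, old_len, new_len)."""
    old = dict(revision_sizes(old_stream))
    seen = set()
    shrunk = []
    for rev_id, new_len in revision_sizes(new_stream):
        seen.add(rev_id)
        old_len = old.get(rev_id)
        if old_len is not None and new_len < old_len:
            shrunk.append((rev_id, old_len, new_len))
    missing = sorted(set(old) - seen)
    return missing, shrunk


def open_dump(path):
    """Open a .xml.bz2 dump as a binary stream."""
    return bz2.open(path, "rb")


# Hypothetical samples: revision 2 loses its text, revision 3 disappears.
OLD_SAMPLE = (
    b"<mediawiki><page><title>A</title>"
    b"<revision><id>1</id><text>hello world</text></revision>"
    b"<revision><id>2</id><text>second revision</text></revision>"
    b"<revision><id>3</id><text>third</text></revision>"
    b"</page></mediawiki>"
)
NEW_SAMPLE = (
    b"<mediawiki><page><title>A</title>"
    b"<revision><id>1</id><text>hello world</text></revision>"
    b"<revision><id>2</id><text /></revision>"
    b"</page></mediawiki>"
)

if __name__ == "__main__":
    print(compare_dumps(io.BytesIO(OLD_SAMPLE), io.BytesIO(NEW_SAMPLE)))
```

Text length is a cheap first check. A hash of each revision's text would 
also catch changes that leave the length unchanged.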

--tomasz


