[Xmldatadumps-admin-l] New enwiki-Dump

Tomasz Finc tfinc at wikimedia.org
Thu Apr 8 22:04:03 UTC 2010


Tomasz Finc wrote:
> Andreas Meier wrote:
>> Hello,
>>
>> a new enwiki-dump is ready, but its size is only 178.7 GB; the dump
>> before had a size of 280.3 GB. Have you changed the compression, or is
>> the new dump corrupted?
> 
> The actual bz2 archive is not corrupt as it unarchives just fine.
> 
> Looking at both the number of pages processed and revision count it's 
> exactly where it should be. But clearly the file size is way smaller as 
> I'm pretty sure we didn't delete 1/3 of en wiki.
> 
> There haven't been any compression changes to account for this.
> 
> What we need is for someone to start comparing individual
> revisions to account for any data loss.
> 
> --tomasz

The only change that I can think of in the last month was the removal of
table locking for the run.

--tomasz
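One way to start the revision-by-revision comparison suggested above is the following Python sketch. It assumes both dumps are MediaWiki XML export files compressed with bz2 and uses only the standard library (`bz2`, `xml.etree.ElementTree.iterparse`). It counts pages and revisions, then uses the character length of each revision's `<text>` element as a rough proxy for its content. The names `revision_sizes` and `compare` are hypothetical and not part of any Wikimedia tool. Because it decompresses the whole stream, it also re-checks that each archive unpacks cleanly.

```python
import bz2
import sys
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the export namespace: '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def revision_sizes(path):
    """Stream one .xml.bz2 dump; return (page_count, {revision_id: text_length})."""
    pages = 0
    sizes = {}
    root = None
    in_revision = False
    rev_id = None
    with bz2.open(path, "rb") as fh:
        for event, elem in ET.iterparse(fh, events=("start", "end")):
            if root is None:
                root = elem  # <mediawiki>, kept so finished pages can be dropped
            name = local_name(elem.tag)
            if event == "start":
                if name == "revision":
                    in_revision, rev_id = True, None
                continue
            if name == "id" and in_revision and rev_id is None:
                rev_id = int(elem.text)  # first <id> inside <revision> is its own id
            elif name == "text" and in_revision:
                sizes[rev_id] = len(elem.text or "")  # empty/deleted text counts as 0
            elif name == "revision":
                in_revision = False
                elem.clear()
            elif name == "page":
                pages += 1
                root.clear()  # free memory held by completed pages
    return pages, sizes


def compare(old_path, new_path):
    """Return (revision ids missing from new dump, ids whose text got shorter)."""
    old_pages, old = revision_sizes(old_path)
    new_pages, new = revision_sizes(new_path)
    print(f"pages:     {old_pages} -> {new_pages}")
    print(f"revisions: {len(old)} -> {len(new)}")
    missing = sorted(old.keys() - new.keys())
    shrunk = sorted(r for r in old.keys() & new.keys() if new[r] < old[r])
    return missing, shrunk


if __name__ == "__main__":
    missing, shrunk = compare(sys.argv[1], sys.argv[2])
    print(f"missing from new dump: {len(missing)}")
    print(f"shorter text in new dump: {len(shrunk)}")
    for rev in shrunk[:20]:
        print("  revision", rev)
```

It would be run as `python compare_dumps.py old.xml.bz2 new.xml.bz2` (script name hypothetical). For a full enwiki history dump, the two in-memory dictionaries would be very large. In practice one would probably run it on a subset of pages or write the sizes to disk first. It is meant to locate which revisions shrank or vanished, not to serve as a production checker.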


