[Xmldatadumps-admin-l] New enwiki-Dump
Tomasz Finc
tfinc at wikimedia.org
Thu Apr 8 22:04:03 UTC 2010
Tomasz Finc wrote:
> Andreas Meier wrote:
>> Hello,
>>
>> a new enwiki-dump is ready, but its size is only 178.7 GB, while the dump
>> before had a size of 280.3 GB. Have you changed the compression, or is
>> the new dump corrupted?
>
> The actual bz2 archive is not corrupt as it unarchives just fine.
>
> Both the number of pages processed and the revision count are
> exactly where they should be. But the file is clearly far smaller, and
> I'm pretty sure we didn't delete 1/3 of en wiki.
>
> There haven't been any compression changes to account for this.
>
> What we need to happen is for someone to start comparing individual
> revisions to account for any data loss.
>
> --tomasz
>
> _______________________________________________
> Xmldatadumps-admin-l mailing list
> Xmldatadumps-admin-l at lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-admin-l
The only change that I can think of in the last month was the removal of
locking tables for the run.
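
For the revision-by-revision comparison suggested above, here is a rough
sketch in Python of how someone might start. It is not an existing dump
tool. It assumes both dumps are bz2-compressed MediaWiki export XML in
which each <revision> has a direct <id> child and a <text> child; the
function names (revision_sizes, compare, compare_dumps) are made up for
illustration. It keeps every revision id in memory, so it is only
practical on a sample of pages, not the full enwiki history.

```python
import bz2
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the XML namespace, e.g. '{ns}revision' -> 'revision'."""
    return tag.rsplit("}", 1)[-1]


def revision_sizes(stream):
    """Stream export XML and map revision id -> UTF-8 byte length of its text."""
    sizes = {}
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local(elem.tag)
        if name == "revision":
            rev_id, text = None, ""
            # Only direct children: the contributor's <id> is nested deeper.
            for child in elem:
                child_name = local(child.tag)
                if child_name == "id":
                    rev_id = int(child.text)
                elif child_name == "text":
                    text = child.text or ""
            if rev_id is not None:
                sizes[rev_id] = len(text.encode("utf-8"))
            elem.clear()  # drop the revision text once it has been measured
        elif name == "page":
            elem.clear()  # drop the rest of the finished page
    return sizes


def compare(old_sizes, new_sizes):
    """Return (ids missing from the new dump, ids whose text got shorter)."""
    missing = sorted(set(old_sizes) - set(new_sizes))
    shrunk = sorted(
        rev for rev in old_sizes.keys() & new_sizes.keys()
        if new_sizes[rev] < old_sizes[rev]
    )
    return missing, shrunk


def compare_dumps(old_path, new_path):
    """Compare two bz2-compressed dump files given by (hypothetical) paths."""
    with bz2.open(old_path, "rb") as old, bz2.open(new_path, "rb") as new:
        return compare(revision_sizes(old), revision_sizes(new))
```

Calling compare_dumps("old.xml.bz2", "new.xml.bz2") (placeholder file
names) returns two lists of revision ids. A non-empty "shrunk" list would
point at truncated or emptied revision text, which could explain a
smaller file even though the page and revision counts match.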
--tomasz