[Xmldatadumps-admin-l] Archive -20100312- [31.9 GB] many revisions between 2005-01-10T and 2005-05-14 have empty text

Dmitry Chichkov dchichkov at gmail.com
Fri May 14 03:31:19 UTC 2010


Here are instructions on how you can check that:
1) Extract the first ~700 MB from the archive, then interrupt the extraction (takes 1-10 seconds)
# 7z e enwiki-20100130-pages-meta-history.xml.7z
# Ctrl-C

2) Open the .xml file and search for revision id 9450068
You'll see that the revision text is missing.

3) Check that this revision is in fact not an empty one:
http://en.wikipedia.org/w/index.php?oldid=9450068

4) Scroll down in the .xml file and you will see more empty revisions mixed in with regular ones.
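
Steps 2 and 4 can also be done with a script instead of by eye. Below is a minimal Python sketch, not something from the dump tooling itself. It assumes the partial file follows the usual MediaWiki export layout (<page> / <revision> / <id> and <text>), and the function names find_empty_revisions and local_name are made up for this example. Because the extraction was interrupted, the file ends mid-element, so the parse error at the end is expected and ignored. Revisions flagged here still need the manual check from step 3, since some may be genuinely empty on the wiki.

```python
import sys
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def find_empty_revisions(source):
    """Yield the id of every <revision> whose <text> is empty or absent.

    `source` is a file name or a binary file object. Assumption: the
    layout is <revision><id>..</id>...<text>..</text></revision>.
    """
    try:
        for _event, elem in ET.iterparse(source, events=("end",)):
            if local_name(elem.tag) != "revision":
                continue
            rev_id = None
            text = None
            # Direct children only, so the <id> inside <contributor> is skipped.
            for child in elem:
                name = local_name(child.tag)
                if name == "id":
                    rev_id = child.text
                elif name == "text":
                    text = child.text
            if not (text or "").strip():
                yield rev_id
            elem.clear()  # release the revision body to keep memory use low
    except ET.ParseError:
        # The partially extracted file is cut off mid-element; stop quietly.
        return


if __name__ == "__main__" and len(sys.argv) > 1:
    for rev_id in find_empty_revisions(sys.argv[1]):
        print(rev_id)
```

Saved under a name of your choice (say check_empty.py), it would be run as "python check_empty.py enwiki-20100130-pages-meta-history.xml" and prints one revision id per line.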

-- Dmitry