[Xmldatadumps-admin-l] FYI: comparison between enwiki-20100130-pages-meta-history.xml.7z and enwiki-20100312-pages-meta-history.xml.7z

Dmitry Chichkov dchichkov at gmail.com
Tue May 18 06:22:07 UTC 2010


I've tried filtering and plotting revisions with empty text, using the
following criteria: the comment starts with '/*' (a section edit) AND the
edit is not an IP edit. The idea is that section edits generally do not
delete the complete article text, and registered users tend to vandalize
less. Consequently, we can get a rough picture of which revisions' text is
missing due to the backup.
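
The criteria above could be applied to a dump roughly as follows. This is
only a minimal sketch, not the script actually used for the plots: it assumes
the usual MediaWiki export layout (page > revision > comment / contributor /
text, with an <ip> element for anonymous edits), it treats an absent or
whitespace-only <text> as "empty", and the names suspect_revisions, local and
child are hypothetical. The SAMPLE document is made up for illustration.

```python
import io
import xml.etree.ElementTree as ET
from calendar import timegm
from time import strptime


def local(tag):
    """Strip the XML namespace, e.g. '{ns}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def child(elem, name):
    """Return the first direct child with the given local name, or None."""
    for c in elem:
        if local(c.tag) == name:
            return c
    return None


def suspect_revisions(stream):
    """Yield (pageid, revisionid, unixtime, pagetitle) for revisions with
    empty text, a comment starting with '/*', and a non-IP contributor."""
    path = []  # local names of the currently open elements
    title = page_id = None
    for event, elem in ET.iterparse(stream, events=("start", "end")):
        name = local(elem.tag)
        if event == "start":
            path.append(name)
            continue
        path.pop()
        parent = path[-1] if path else None
        if parent == "page" and name == "title":
            title = elem.text
        elif parent == "page" and name == "id":
            page_id = int(elem.text)
        elif name == "revision":
            text = child(elem, "text")
            comment = child(elem, "comment")
            contributor = child(elem, "contributor")
            is_empty = text is None or not (text.text or "").strip()
            is_section_edit = (comment is not None
                               and (comment.text or "").startswith("/*"))
            is_ip = (contributor is not None
                     and child(contributor, "ip") is not None)
            if is_empty and is_section_edit and not is_ip:
                ts = strptime(child(elem, "timestamp").text,
                              "%Y-%m-%dT%H:%M:%SZ")
                yield page_id, int(child(elem, "id").text), timegm(ts), title
            elem.clear()  # drop revision content to limit memory use


# Made-up sample: only revision 101 satisfies all three criteria.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.4/">
  <page>
    <title>Example</title>
    <id>12</id>
    <revision>
      <id>101</id>
      <timestamp>2010-01-01T00:00:00Z</timestamp>
      <contributor><username>Alice</username><id>7</id></contributor>
      <comment>/* History */ copyedit</comment>
      <text xml:space="preserve" />
    </revision>
    <revision>
      <id>102</id>
      <timestamp>2010-01-02T00:00:00Z</timestamp>
      <contributor><ip>192.0.2.1</ip></contributor>
      <comment>/* History */ blanked</comment>
      <text xml:space="preserve" />
    </revision>
    <revision>
      <id>103</id>
      <timestamp>2010-01-03T00:00:00Z</timestamp>
      <contributor><username>Bob</username><id>8</id></contributor>
      <comment>/* History */ expand</comment>
      <text xml:space="preserve">Some text</text>
    </revision>
  </page>
</mediawiki>"""

rows = list(suspect_revisions(io.BytesIO(SAMPLE)))
for row in rows:
    print(row)
```

For a real dump the stream would come from decompressing the .7z file rather
than from an in-memory sample; clearing each revision as soon as it has been
checked keeps memory use modest even for pages with long histories.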

Resulting plots are attached for both [enwiki-20100130 31.9 GB] and
[enwiki-20100312 15.8 GB] files.

If anybody is interested in the raw filtered data, here's a link to the
compressed .csv files:
http://76.126.237.67/tmp/missing.revisions.enwiki-20100xx.7z
The .csv files have the following format: 'pageid, revisionid, unixtime,
pagetitle'.
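
Rows in that format could be read and summarized along these lines. This is
a hedged sketch only: it assumes one record per line with no header row and
unquoted titles, so it splits on the first three commas to keep titles that
themselves contain commas intact. The names parse_line and count_by_month
are hypothetical, and the monthly grouping is just one plausible way to look
at the data, not necessarily what the attached plots show.

```python
from collections import Counter
from datetime import datetime, timezone


def parse_line(line):
    """Split 'pageid, revisionid, unixtime, pagetitle' into typed fields.

    Only the first three commas are treated as separators, because page
    titles may contain commas (assumption: titles are not quoted)."""
    pageid, revid, unixtime, title = line.rstrip("\n").split(",", 3)
    return int(pageid), int(revid), int(unixtime), title.strip()


def count_by_month(lines):
    """Count records per UTC calendar month, keyed as 'YYYY-MM'."""
    counts = Counter()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        _, _, unixtime, _ = parse_line(line)
        month = datetime.fromtimestamp(unixtime, tz=timezone.utc)
        counts[month.strftime("%Y-%m")] += 1
    return counts


# Made-up example rows in the stated format.
example = [
    "12, 101, 1262304000, Example, with comma\n",
    "13, 202, 1264982400, Other\n",
]
monthly = count_by_month(example)
for month in sorted(monthly):
    print(month, monthly[month])
```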

-- Cheers, Dmitry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Graph1.png
Type: image/png
Size: 68183 bytes
Desc: not available
Url : http://lists.wikimedia.org/pipermail/xmldatadumps-admin-l/attachments/20100517/15a9bed2/attachment-0002.png 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Graph2.png
Type: image/png
Size: 60752 bytes
Desc: not available
Url : http://lists.wikimedia.org/pipermail/xmldatadumps-admin-l/attachments/20100517/15a9bed2/attachment-0003.png 