Aaron,
If you suspected as much, then I wonder why you were surprised by the high number of
what I call false positives in the sample.
Further, false positives as I described them are indeed a very good way to think about
the accuracy problem, if you put some thought into what you define as accuracy and as a
false positive. The definition we use (and which you can find in the paper) is this: the
approach aims at determining "who is reverted by whom", i.e. we think about reverts
as antagonistic relations between editors and want to track the correct ties between
them. Indicating an antagonistic relationship between two editors where there is actually
none can clearly be deemed a "false positive" in the detection task. Equally
important: even if you are only interested in which revision was reverted, the MD5 hash
method often gets this wrong, as you can clearly see from the examples I gave.
Regarding your "error cases":
1. (reverted_id, reverting_id): The desired case. The reverted revision appears to have
had contributions discarded by the reverting revision.
Exactly.
2. (reverted_id, X): Suboptimal, but useful case. The reverted revision was indeed
reverted, but the associated reverting revision was not the one to discard the
contributions.
Sure, if you are only interested in WHO was reverted, but not by whom, it might be useful.
The thing is that the MD5 method gets even the WHO wrong in a couple of cases, though
more often in the variation (X, reverting_id).
If you want to know both sides of the revert, this is definitely not "suboptimal",
but simply a false positive for the relation between the two revisions.
3. (X, X): False positive. The reverted revision was not actually reverted.
Like #2, this is also a false positive for the relation between the revisions, while
also wrongly detecting which revision was reverted.
I had thought you were referring to case #3, when you were generally referring to case #2.
Is that right?
No, that is not right at all; I refer to both cases #2 and #3, as you can clearly see
from the examples. I'll try to make it more understandable: in my hand-made example,
look at edit #4 (deleting "Apple") and edit #5 (deleting
"Banana") --> edit #5 did not undo/revert edit #4, and edit #4 was not
reverted by anyone (the deletion of "Apple" was not undone). If you show this
pair of edits to editors, as we did, they say it is not a revert in the sense of the
Wikipedia definition. Consequently, this is a #3 false positive, as edit #4 was not
reverted. This case can also be found frequently in the examples.
Fabian
On Jul 6, 2012, at 2:54 AM, Aaron Halfaker wrote:
I suspected as much. It looks like "false positive" isn't a very good way
to think about the accuracy problem. It looks to me like there are three states of
interest. Assuming a pair of revision_ids representing the information contained in a
"revert": (reverted_id, reverting_id)
1. (reverted_id, reverting_id): The desired case. The reverted revision appears to have
had contributions discarded by the reverting revision
2. (reverted_id, X): Suboptimal, but useful case. The reverted revision was indeed
reverted, but associated reverting revision was not the one to discard the contributions.
3. (X, X): False positive. The reverted revision was not actually reverted.
I had thought you were referring to case #3, when you were generally referring to case #2.
Is that right?
-Aaron
On Mon, Jul 2, 2012 at 1:23 PM, Floeck, Fabian (AIFB)
<fabian.floeck@kit.edu<mailto:fabian.floeck@kit.edu>> wrote:
First of all, thanks a lot for your questions and remarks. (btw Mako: nice panel talk
yesterday at WPAC12)
tl;dr: scroll all the way down for examples
Questions by Mako:
1. Are you limiting this to edits that are separated by revisions with
identical hashes by only one edit? --> I'm not quite sure what you mean, but we do
not limit this to specific edits; the only exception is that both methods were tested
with a limit of going back a maximum of 20 revisions to look for reverts.
2. And are you sure your human coders aren't just relying on edit summaries? -->
They could not see the edit summaries, due to our experimental setup.
3. HASH-A => HASH-B => HASH-A: no revert? --> (Assuming you mean HASH-B is only
one revision/edit:) this is ALWAYS a revert by A targeting B, in both methods, and was
always evaluated as such by the users.
Before I give examples, let me just remind you that this is only a sample, so first of
all it is of course not statistically inferred that the 37% I mentioned necessarily
appear like this in general in the exact same way. Secondly, this number is generated
when you require that 80% of all participants agreed that an edit pair was a full
revert; i.e. there were of course cases in the sample where people disagreed, and some
cases detected by MD5 where even the majority voted for a full revert, just not over
79%. I chose this threshold to make the differences clear; I could also have selected
some other arbitrary value. That is exactly why we did not put it in the paper: the
analysis in the paper is a much better ground for making statistical inferences about
the data that is the "basic population" for this analysis.
Now, let me give you some examples for false positives generated by the MD5 hash method:
1. One self-generated example (inspired by observations) is given in the paper (almost
identically):
RevID | RevContent (after edit) | Edit    | Hash
1     | Peanut                  | +Peanut | Hash1
2     | Peanut Apple            | +Apple  | Hash2
3     | Peanut Apple Banana     | +Banana | Hash3
4     | Peanut Banana           | -Apple  | Hash4
5     | Peanut                  | -Banana | Hash1
MD5 assigns 5 as the reverting edit of 2, 3 and 4.
DIFF assigns 5 as the reverting edit of 3, and 4 as the reverting edit of 2.
False positives in this case (according to the Wikipedia definition) for MD5: 5
reverting 4 and 2 (as 4 is unrelated to what 5 is doing, and 2's contribution was
already removed, so it can no longer be undone by 5).
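To make the hand-made example concrete, here is a minimal sketch (not the code from the
paper, and the function name and the radius parameter are my own choices) of the MD5
identity-revert method as usually described: a revision whose content hash matches an
earlier revision's hash is taken to revert everything in between.

```python
import hashlib

# Hand-made example from above: revision contents after each edit.
revisions = ["Peanut", "Peanut Apple", "Peanut Apple Banana",
             "Peanut Banana", "Peanut"]

def md5_reverts(contents, radius=20):
    """Identity-revert detection: a revision whose MD5 hash equals the hash
    of a recent earlier revision reverts all revisions in between.
    Returns (reverted_index, reverting_index) pairs, 0-based."""
    seen = {}   # hash -> index of the most recent revision with that hash
    pairs = []
    for i, text in enumerate(contents):
        h = hashlib.md5(text.encode()).hexdigest()
        if h in seen and i - seen[h] <= radius:
            for j in range(seen[h] + 1, i):
                pairs.append((j, i))
        seen[h] = i
    return pairs

# 1-based revision ids, as in the table above
print([(a + 1, b + 1) for a, b in md5_reverts(revisions)])
# -> [(2, 5), (3, 5), (4, 5)]
```

Running this reproduces the point of the example: the hash method flags revisions 2, 3
and 4 as reverted by 5, although only 3 actually was.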
2. "Real-life" examples rated as false positives in the user evaluation:
When you asked me for the examples, I started digging them up from the data sample that
was used, and in fact realized that many false positives of the MD5 method are related
to self-reverts. As this is no issue for our data extraction aims (we want to have
self-reverts in the results as well) and was not considered when randomly drawing
edit pairs from the two methods' results, we didn't discuss it in the paper. If
you do not consider self-reverts to be reverts in the sense of the Wikipedia
definition, they could be filtered out by collapsing subsequent edits of one editor
before running the revert analysis with the MD5 method. I assume that would reduce the
number of false positives notably; I will certainly look into it.
If you don't collapse these edits, however (which is not regularly done before
reporting/using revert detection results), the number of false positives will be quite
high, as the edits-to-be-collapsed (and prone to being misinterpreted) appear quite
often and their span can at times be considerably large. And of course there are cases
not related to self-reverts.
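The collapsing step could be sketched as follows; this is a hypothetical helper of my
own (name and (editor, content) input format are assumptions), keeping only the final
state of each consecutive run of edits by one editor:

```python
def collapse_consecutive(revisions):
    """Collapse consecutive edits by the same editor into one revision,
    keeping only the last state of each run.
    revisions: list of (editor, content) pairs in chronological order."""
    collapsed = []
    for editor, content in revisions:
        if collapsed and collapsed[-1][0] == editor:
            collapsed[-1] = (editor, content)  # later edit supersedes the run
        else:
            collapsed.append((editor, content))
    return collapsed

# A vandalize-then-self-revert run by editor "A" collapses into one revision,
# so the hash method can no longer pair it with an unrelated later edit:
print(collapse_consecutive(
    [("A", "Peanut"), ("A", "Peanut xx"), ("A", "Peanut"), ("B", "Peanut!")]))
# -> [('A', 'Peanut'), ('B', 'Peanut!')]
```

Running the MD5 detection on the collapsed sequence instead of the raw one would then
suppress the self-revert pairs.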
I tried to select examples representative of the sample, which received very few or no
votes as full reverts (while being detected as such by MD5):
Example A
detected as reverted:
http://en.wikipedia.org/w/index.php?&diff=25866415
detected as reverting:
http://en.wikipedia.org/w/index.php?diff=25866579
The detected-as-reverting edit removes only "insomnia" from the detected-as-reverted
edit, i.e. no full revert, as some insertions from previous edits had already been
deleted by the reverted editor himself.
It would be a correct full revert if you collapsed the reverted editor's edits into one.
Example B
detected as reverted:
http://en.wikipedia.org/w/index.php?diff=196507540
detected as reverting:
http://en.wikipedia.org/w/index.php?diff=196507775
A self-revert of introduced vandalism ("kirsty u tit") happens before the second editor
reverts --> it cannot be reverted by the detected-as-reverted edit. This would also be
remedied by collapsing the first editor's edits.
Example C
Not related to a self-revert; this is an example of an incomplete vandalism repair,
which is then subsequently completed:
detected as reverted:
http://en.wikipedia.org/w/index.php?diff=162097520
detected as reverting:
http://en.wikipedia.org/w/index.php?diff=162113945
Example D
Not related to a self-revert.
A revert is carried out by TheJazzDalek targeting the edits by
74.131.204.39<http://en.wikipedia.org/wiki/Special:Contributions/74.131.…39>, but
in the same edit something else is deleted by TheJazzDalek, leading to a new, unique
revision content. As
74.131.204.39<http://en.wikipedia.org/wiki/Special:Contributions/74.131.… in
the next edit reverts this deletion by TheJazzDalek, but not TheJazzDalek's initial
revert of his
(74.131.204.39<http://en.wikipedia.org/wiki/Special:Contributions/74.131.…)
own edits, it is erroneously concluded that
74.131.204.39<http://en.wikipedia.org/wiki/Special:Contributions/74.131.…
reverts himself, which is not the case.
detected as reverted:
http://en.wikipedia.org/w/index.php?diff=292533562
detected as reverting:
http://en.wikipedia.org/w/index.php?diff=292760323
Example E
detected as reverted:
http://en.wikipedia.org/w/index.php?diff=231824943
detected as reverting:
http://en.wikipedia.org/w/index.php?diff=231960286
First, the reverting editor (Laser brain) undoes (not rolling back to / not creating a
duplicate revision) some edits by another editor, before deleting the result of
67.162.68.255<http://en.wikipedia.org/wiki/Special:Contributions/67.162.…
's edits (one of which was detected here as reverted). The "detected as
reverted" revision is partly self-reverted by
67.162.68.<http://en.wikipedia.org/wiki/Special:Contributions/67.162.68.…
. The other part, a date change in an "accessdate=", is not "undone" as
such; rather, the whole "accessdate=" part (stemming from a third editor) is
deleted.
Example F
Here, between the "reverted" and the "reverting" edit, a mixture of
self-reverts, reverts and different forms of vandalism occurs:
detected as reverted:
http://en.wikipedia.org/w/index.php?diff=131372047
detected as reverting:
http://en.wikipedia.org/w/index.php?diff=131658207
If I have failed to answer any of your questions, please excuse me and ask again.
Best,
Fabian
--
Karlsruhe Institute of Technology (KIT)
Institute of Applied Informatics and Formal Description Methods
Dipl.-Medwiss. Fabian Flöck
Research Associate
Building 11.40, Room 222
KIT-Campus South
D-76128 Karlsruhe
Phone: +49 721 608 4 6584
Skype: f.floeck_work
E-Mail: fabian.floeck@kit.edu<mailto:fabian.floeck@kit.edu>
WWW:
http://www.aifb.kit.edu/web/Fabian_Flöck
KIT – University of the State of Baden-Wuerttemberg and
National Research Center of the Helmholtz Association