I agree with Gregory that it is very useful to quantify the usefulness of
trust information on text -- otherwise, all comparisons are very subjective.
In our WikiSym 08 paper, we measure various parameters of the "trust"
coloring we compute, including:
- Recall of deletions. Only 3.4% of text is in the lower half of trust
values, yet this is 66% of the text that is deleted in the very next
revision.
- Precision of deletions. Text in the bottom half of trust values has
probability 33% of being deleted in the next revision, against a probability
of 1.9% for general text. The deletion probability rises to 62% for text
in the bottom 20% of trust values.
- We study the correlation between the trust of a word, sampled at random
in all revisions, and the future lifespan of a word (correcting for the
finite horizon effect due to the finite number of revisions in each
article), showing positive correlation.
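To make the first two measures concrete, here is a minimal sketch of how recall and precision of deletions could be computed from per-word data. It is not the code used for the paper; the function name, the input layout (one trust value and one deleted-in-next-revision flag per word), and the `threshold` parameter are all assumptions made for illustration.

```python
from typing import Dict, Sequence


def deletion_metrics(trust: Sequence[float],
                     deleted: Sequence[bool],
                     threshold: float) -> Dict[str, float]:
    """Recall and precision of deletions for low-trust text.

    trust[i]   -- trust value of word i (hypothetical input layout)
    deleted[i] -- True if word i is deleted in the very next revision
    threshold  -- words with trust below this count as "low trust"
    """
    if len(trust) != len(deleted):
        raise ValueError("trust and deleted must have the same length")

    n = len(trust)
    low = [t < threshold for t in trust]
    n_low = sum(low)
    n_deleted = sum(1 for d in deleted if d)
    n_low_deleted = sum(1 for l, d in zip(low, deleted) if l and d)

    return {
        # Fraction of all text that is flagged as low trust.
        "low_trust_fraction": n_low / n if n else 0.0,
        # Recall: share of deleted text that was flagged as low trust.
        "recall": n_low_deleted / n_deleted if n_deleted else 0.0,
        # Precision: share of low-trust text that actually gets deleted.
        "precision": n_low_deleted / n_low if n_low else 0.0,
        # Deletion probability for general text, for comparison.
        "base_rate": n_deleted / n if n else 0.0,
    }
```

With "lower half of trust values" read as a threshold at half the maximum trust, the four numbers returned correspond to the 3.4%, 66%, 33% and 1.9% figures quoted above.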
Some aspects are not captured by the above measures:
- We ensured that every kind of "tampering" (including cut-and-paste) is
reflected in the trust coloring, so it is hard to subvert the algorithm
(does "age" provide this?).
- We ensured the whole scheme is robust wrt attacks (see the various
papers if you are interested).
I fully believe that it should not be hard to improve on our system re. the
above measurements. And I fully agree that the "reputation" we compute is
essentially an internal parameter of the system, and does not really
constitute a good summary of a person's overall Wikipedia contribution; for
this and other reasons we do not display it.
Luca
A simple, objective challenge for any predictive coloring system would
be to run it through the following experimental procedure:
* Take a dump of Wikipedia up to a year old, and use this as the
underlying knowledge for the systems.
* Make several random selections of articles and include the newer
revisions, up to 6 months old, that are not in the initial dump. Call
these the test sets.
* The predictive coloring system should then take each revision in a
test set in time order and predict whether it will be reverted (within X
time?).
* The actual edits up to now should be analyzed to determine which
changes actually were reverted and when.
The final score will be the false positive and false negative rates.
So long as we assume that the existing editing practices are not too
bad we should find that the best predictive coloring system would
generally tend to minimize these rates.
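The scoring step of this procedure could look roughly like the sketch below. It assumes that each test revision has already been reduced to a pair (predicted reverted, actually reverted); the function name and that input layout are illustrative only, and the "within X time" window is left to whoever builds the pairs.

```python
from typing import Iterable, Tuple


def error_rates(results: Iterable[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Return (false_positive_rate, false_negative_rate).

    results -- one (predicted_reverted, actually_reverted) pair per
               test revision (hypothetical input layout)
    """
    false_pos = false_neg = reverted = kept = 0
    for predicted, actual in results:
        if actual:
            reverted += 1
            if not predicted:
                false_neg += 1   # a revert the system failed to predict
        else:
            kept += 1
            if predicted:
                false_pos += 1   # a predicted revert that never happened

    fpr = false_pos / kept if kept else 0.0
    fnr = false_neg / reverted if reverted else 0.0
    return fpr, fnr
```

Reporting both rates separately, rather than a single accuracy figure, matters here because reverted revisions are a small minority: a system that never predicts a revert would still be right most of the time.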
_______________________________________________
Wikipedia-l mailing list
Wikipedia-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikipedia-l