On Sat, Oct 1, 2011 at 9:58 PM, Ian Woollard <ian.woollard(a)gmail.com> wrote:
On 1 October 2011 18:15, Carcharoth <carcharothwp(a)googlemail.com> wrote:
The assumption "Presumably anything that still remains is of
sufficient quality for whatever level the article is" has so much
wrong with it that I don't know where to start.
No, if material lasts for a long period in an article it's highly likely to
be fairly good even if it gets rewritten later; and the more material and
the longer it lasts, the better.
Material lasts a long time for two reasons:
(a) It is good and lots of people have checked it and left it alone;
(b) It is bad/wrong and no-one has spotted it yet and replaced it or
rewritten it.
I don't see how you can devise a metric to distinguish these two cases,
as you would have to detect the number of people silently checking and
approving something (not just reading it). Lots of quality control is
*silent* and not detectable in the current metrics. It would be
different if there were a way for people to mark text and say "I have
this book and have checked this citation, or followed the URL and
agree with what is written here". Essentially a way to detect the
silent verification that often takes place.
It's the area under the curve that matters, not whether it *eventually*
gets rewritten.
So time_in_article * number_of_unique_characters is probably a fairly good
metric.
Not in the case of obscure articles written by one person, not linked
much from anywhere (but not triggering orphan article bots), and only
small changes made over the years. View stats might help here, but
probably not much as there are a vast, vast number of articles not
visited very much at all. Those would account for most of the
"unchanged text" you would be picking up.
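For concreteness, here is a minimal sketch of the proposed metric (time_in_article * number_of_unique_characters) and of the objection to it. The Contribution record and its field names are hypothetical, and how unique characters and survival time would be extracted from revision history is not specified in this thread; the numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Contribution:
    """A run of text one editor added to an article (hypothetical record)."""
    unique_chars: int     # characters in the run that are new to the article
    days_survived: float  # how long the run has stayed in the article


def survival_score(contribs):
    """Sum of time_in_article * number_of_unique_characters over all runs."""
    return sum(c.unique_chars * c.days_survived for c in contribs)


# Case (a): text that many people checked and left alone.
checked = [Contribution(unique_chars=500, days_survived=1000)]
# Case (b): wrong text in an obscure article that nobody has spotted yet.
unnoticed = [Contribution(unique_chars=500, days_survived=1000)]

print(survival_score(checked), survival_score(unnoticed))
```

Both cases receive the same score, because nothing in the inputs records whether anyone actually verified the text.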
And you could multiply by the article hit rate to get an even better
metric I expect.
Whereas you can get very high edit counts in many well-known ways: even
breaking an edit down into many sub-edits can multiply up edit counts, as
can just doing lots of vandalism reverts.
Yes, I never said edit count was reliable for anything or useful in
any way. I'm only saying that unique text is likely not very helpful
either. But the best way to find out is to actually try this and see
if it shows anything useful. If it does, great. If not, then try
again.
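One way to try this is sketched below, with the hit-rate weighting suggested earlier in the thread folded in. The function name, parameter names, and figures are assumptions for illustration only, not measurements from any real article.

```python
def weighted_survival_score(unique_chars, days_survived, daily_views):
    """time_in_article * number_of_unique_characters * article hit rate."""
    return unique_chars * days_survived * daily_views


# Hypothetical figures, for illustration only.
busy = weighted_survival_score(unique_chars=200, days_survived=365, daily_views=1000)
obscure = weighted_survival_score(unique_chars=2000, days_survived=1825, daily_views=2)

print(f"busy article:    {busy}")
print(f"obscure article: {obscure}")
```

With these inputs the weighting discounts long-lived text in a rarely visited article relative to shorter-lived text in a heavily read one. It still counts views rather than checks, so it cannot show that any reader verified the text.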
<snip>
Carcharoth