[WikiEN-l] A simple vandalism analysis tool

SCZenz sczenz at gmail.com
Sun Dec 25 07:35:59 UTC 2005


It seems to find no vandalism earlier than February 2004 in any of the
very high visibility articles I checked.  Was there a change in
Wikipedia custom and/or rollback functionality at that time, or is
something going wrong with your counter?

In any case, good work.

SCZenz

On 12/24/05, Tony Sidaway <f.crdfa at gmail.com> wrote:
> My vandalism analysis tool, which uses a simple but powerful
> methodology developed by Brian0918, analyses edit summaries on
> articles to spot probable vandalism reverts by recognising the summary
> patterns of standard rollbacks, and edits labelled "rvv", "rv v" or
> "rvc".  It was developed for English Wikipedia but probably has
> applications beyond that, and the methods developed here have obvious
> utility beyond the recognition and reporting of vandalism.
>
> You can visit it here:
>
> http://tools.wikimedia.de/~tony_sidaway/
>
> Please try to break it, and tell me what happened.  There is a link to
> a discussion page for that purpose.
>
> The rationale is that, while vandalism is difficult to recognise
> electronically, a pretty easy and reasonably reliable way to track
> vandalism on a popular wiki article is to examine edit summaries and
> count the proportion of them that indicate that the editors apparently
> believed themselves to be reverting vandalism.
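
The counting approach described above can be sketched roughly as follows. This is only an illustration, not the tool's actual code, which is not shown in this thread. The names REVERT_PATTERNS, is_vandalism_revert and revert_proportion are hypothetical, and the rollback wording ("Reverted edits by ... to last version by ...") is an assumption about what a standard rollback summary looks like.

```python
import re

# Hypothetical patterns; the real tool's expressions are not given in this thread.
REVERT_PATTERNS = [
    # Summaries that start with "rvv", "rv v" or "rvc" (case-insensitive).
    re.compile(r"^\s*(rvv|rv v|rvc)\b", re.IGNORECASE),
    # Assumed wording of an automatic rollback summary.
    re.compile(r"^Reverted edits by .+ to last version by ", re.IGNORECASE),
]


def is_vandalism_revert(summary):
    """Return True if the edit summary looks like a vandalism revert."""
    return any(p.search(summary) for p in REVERT_PATTERNS)


def revert_proportion(summaries):
    """Fraction of edit summaries that look like vandalism reverts."""
    if not summaries:
        return 0.0
    hits = sum(1 for s in summaries if is_vandalism_revert(s))
    return hits / len(summaries)


if __name__ == "__main__":
    sample = [
        "rvv",
        "rv v",
        "rvc",
        "fix typo",
        "Reverted edits by A (talk) to last version by B",
        "revert: see talk",
    ]
    print(revert_proportion(sample))
```

Anchoring the patterns at the start of the summary keeps ordinary words that merely contain "rvv" from matching, at the cost of missing reverts that are labelled in some other way.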
>
> A highly experimental adaptation of this script to recognise (only)
> rollbacks on the German Wikipedia is here:
>
> http://tools.wikimedia.de/~tony_sidaway/cgi-bin/vandalismus.cgi
>
> The text of the latter CGI script is currently in English, although it
> is analyzing German text.  As I know nothing about
> internationalization I have no idea whether it will always perform
> correctly if UTF-8 multibyte characters (such as o-umlaut) are
> entered.
>
> This simple test seems to suggest that it does work:
>
> http://tools.wikimedia.de/~tony_sidaway/cgi-bin/vandalismus.cgi?article=Köln
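
For anyone testing the multibyte question, here is a small sketch of how a title such as Köln travels in a query string. It assumes the client percent-encodes the title as UTF-8; a client that sends Latin-1 instead produces different bytes, which is one way a script expecting UTF-8 could go wrong. The helper article_url is hypothetical; the base URL and the "article" parameter are taken from the test link above.

```python
from urllib.parse import quote, unquote

BASE = "http://tools.wikimedia.de/~tony_sidaway/cgi-bin/vandalismus.cgi"


def article_url(title):
    """Build a query URL with the title percent-encoded as UTF-8."""
    return BASE + "?article=" + quote(title, safe="")


if __name__ == "__main__":
    # o-umlaut is two bytes in UTF-8 (C3 B6), so it becomes two escapes.
    print(article_url("Köln"))          # ...vandalismus.cgi?article=K%C3%B6ln
    print(unquote("K%C3%B6ln"))         # Köln

    # The same title encoded as Latin-1 is a single byte (F6).
    print(quote("Köln", safe="", encoding="latin-1"))  # K%F6ln
```

A script that decodes the incoming value as UTF-8 would recover the first form correctly, but not the second.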
>
> Wikipedia is an international project and I welcome any and all
> testing input on this.
>
> Presently I don't know of any edit summary patterns that
> non-administrators on the German Wikipedia use to indicate that
> they're reverting what we on English Wikipedia would recognise as
> simple vandalism--as I'm unfamiliar with their practices I'm not even
> certain that they draw the same distinctions that we do on English
> Wikipedia between intentional and overt disruptive edits (simple
> vandalism) and more subtle vandalism or trolling.
>
> Any help on this that German speakers can offer would be most welcome.
>
> Although I address the German Wikipedia prominently because its
> community is highly advanced and well organized, its content
> comparable to that of the English Wikipedia, and (not least) Deutsche
> Wikipedia hosts the tool server, I would also love to produce useful
> tools for as many languages as possible--the skills I learn can be put
> to use in tools of more general use than the current one.  The scripts
> I write can easily be internationalized.  I cannot write good German
> (whenever I try, native German speakers beg me to stop!) but I can
> write good French and reasonable Spanish.  I am particularly
> interested in Chinese, Indian languages, and Russian.