Re: [Wikitech-l] Abuse Filter extension activated on English Wikipedia

19 Mar 2009

On Thu, Mar 19, 2009 at 1:03 PM, Delirium <delirium(a)hackish.org> wrote:

...
  Brian wrote:
  This extension is very important for training machine learning
  vandalism detection bots. Recently published systems use only hundreds
  of examples of vandalism in training - not nearly enough to
  distinguish between the variety found in Wikipedia or generalize to
  new, unseen forms of vandalism. A large set of human created rules
  could be run against all previous edits in order to create a massive
  vandalism dataset.

 As a machine-learning person, this seems like a somewhat problematic
 idea--- generating training examples *from a rule set* and then learning
 on them is just a very roundabout way of reconstructing that rule set.
 What you really want is a large dataset of human-labeled examples of
 vandalism / non-vandalism that *can't* currently be distinguished
 reliably by rules, so you can throw a machine-learning algorithm at the
 problem of trying to come up with some.

Since there's already a database, this sounds like it could be done by
flagging edits as "vandalism" and then reading the existing database
information to extract the details, such as the IP address, a diff of the
change, etc. That way, humans define what counts as "vandalism", and the
machine can learn the meaning.

This may need a button or something similar, so that users can report an
edit and the database can flag it.
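
A rough sketch of the idea, in Python with an in-memory SQLite database. The
table and column names (edits, vandalism_flags, user_ip, and so on) and the
helper functions are made up for illustration and are not the real MediaWiki
schema. The label comes only from human reports, and the IP and diff are read
back from the stored edit:

```python
import difflib
import sqlite3

# Hypothetical schema, for illustration only (not the MediaWiki tables).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE edits (
        id INTEGER PRIMARY KEY,
        user_ip TEXT,
        old_text TEXT,
        new_text TEXT
    );
    CREATE TABLE vandalism_flags (
        edit_id INTEGER REFERENCES edits(id),
        reporter TEXT,
        is_vandalism INTEGER
    );
""")


def flag_edit(conn, edit_id, reporter, is_vandalism):
    """What the hypothetical 'report' button would do: store a human label."""
    conn.execute(
        "INSERT INTO vandalism_flags VALUES (?, ?, ?)",
        (edit_id, reporter, int(is_vandalism)),
    )


def training_examples(conn):
    """Join human labels with stored edit data and compute a diff per edit."""
    rows = conn.execute("""
        SELECT e.id, e.user_ip, e.old_text, e.new_text, f.is_vandalism
        FROM edits e JOIN vandalism_flags f ON f.edit_id = e.id
        ORDER BY e.id
    """).fetchall()
    examples = []
    for edit_id, ip, old, new, label in rows:
        diff = "\n".join(
            difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm="")
        )
        examples.append(
            {"edit_id": edit_id, "ip": ip, "diff": diff, "label": bool(label)}
        )
    return examples


# Two made-up edits and two human reports.
conn.executemany(
    "INSERT INTO edits VALUES (?, ?, ?, ?)",
    [
        (1, "192.0.2.1", "The sky is blue.", "The sky is blue.\nIt rains sometimes."),
        (2, "192.0.2.7", "The sky is blue.", "lol lol lol"),
    ],
)
flag_edit(conn, 1, "reviewer_a", False)
flag_edit(conn, 2, "reviewer_b", True)

examples = training_examples(conn)
```

Each entry in examples is then one human-labeled training example. An edit
that is reported more than once shows up once per report here; how to merge
conflicting reports is left open.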

-- 
End of message.
