At the moment there are automatic classification algorithms both with
and without grammars and vocabularies, and with and without
self-learning. Some of those systems are very advanced, and some of them
have been around for several years. There have been discussions about
purely statistical systems versus purely lexical systems, about whether
such systems should be fully automatic, and about whether they produce
biased results. The short version of the results is that fully
statistical systems give poor results and should include some form of
lexical analysis at some level. Guided systems become very work
intensive and should be avoided if possible.
For examples of what automatic text analysis systems can do:
http://www1.cs.columbia.edu/nlp/projects.cgi
For advanced automatic classification engines:
http://www.autonomy.com/content/home/index.en.html
http://www.cyberwatcher.com/
If lexical analysis is not handled one way or another, someone _will_
include it, and if someone runs a simulation and finds that the system
can be improved by tweaking it somehow, there _will_ be questions about
why it hasn't been done before. There is other work on vandalism at the
moment, and neglecting it will also be questionable. "Exploring the
feasibility of automatically rating online article quality" and
"Creating, destroying and restoring value in Wikipedia" are only two
such papers.
I think it is wise to accept that the proposed system is a solution to a
subproblem, and that an overall system is considerably more advanced
than this one. I don't say it does not work, I don't say it should not
be tested; I say it is only a solution to a very small subset of the
overall problem. As such it should be built in such a way as not to
block further refinements. It should not be viewed as a final solution,
and there should definitely not be any claims that alternative systems
do not work without backing such claims with hard evidence.
If someone wants to present this particular system as the ultimate one,
good luck! I don't think it is the ultimate system. Still, I do think it
can be a very good tool if used as what it is: a solution to one of
several subproblems.
John E Blad
Daniel Arnold wrote:
On Saturday, 22 December 2007 19:49:10, John Erling Blad wrote:
Whether or not you base trust metrics on misspellings is a choice, but
not correcting for misspellings will give a suboptimal solution. It is
important to note that you do this, and how it changes the system. I
have no doubt that trust metrics will incorporate this as an option in
the future, no matter whether it is part of an official system or not.
Likewise I believe they will incorporate systems for weighting
cooperation between users and the overall quality of articles. There is
no easy single solution to this; the solution is a complex, connected,
multivariate system.
An automated system (regardless of which one) should never care about
spelling:
a) Many citations are in outdated or non-standard orthography. This is
especially true for German, which changed its orthography again just a
few years ago. A system that gives an incentive to tamper with citations
is bad.
b) There are assistive systems integrated into the browser (Konqueror,
at least, has had this for many years, and Firefox now has spell
checking as well). Furthermore, there is a Toolserver + JavaScript based
solution that highlights probably misspelled words while you read an
article (currently German only, but it could be adapted for other
languages):
http://de.wikipedia.org/wiki/Wikipedia:Helferlein/Rechtschreibprüfung (see
http://de.wikipedia.org/wiki/Bild:Rp_js_beispiel.png and
http://de.wikipedia.org/wiki/MediaWiki:Gadget-Rechtschreibpruefung.js). A
more advanced external tool is
http://rupp.de/cgi-bin/WP-autoreview.pl. These
tools are optionally integrated into the Wikipedia interface via the
gadgets extension.
So if you make it obvious to editors that there is something they should
check, they will very likely change it. And if someone wrote a text with
bad orthography, someone else gains reputation for spell-checking it;
since he did a review, it is absolutely right that the text gets more
trust afterwards (I know you will come up with examples of rubbish text
that was corrected to proper spelling, but there are nonsense texts with
and without bad spelling).
Arnomane
------------------------------------------------------------------------
_______________________________________________
Wikiquality-l mailing list
Wikiquality-l(a)lists.wikimedia.org
http://lists.wikimedia.org/mailman/listinfo/wikiquality-l