Wiki Research Junkies,
I am investigating the comparative quality of articles about Cote d'Ivoire and Uganda
versus other countries. I want to answer the question: what makes a high-quality
article? Can anyone point me to any existing research on heuristics of article quality,
that is, determining an article's quality from its wikitext properties, without human rating?
I would also consider using data from the Article Feedback Tools, if there were dumps
available for each article in the English, French, and Swahili Wikipedias. This is all the
raw data I can find:
http://toolserver.org/~dartar/aft5/dumps/
The heuristic technique that I am currently using is training a naive Bayesian filter based
on:
* Per Section.
* Text length in each section
* Infoboxes in each section.
* Filled parameters in each infobox
* Images in each section
* Good Article, Featured Article?
* Then normalize on page views per population / speakers of the native language
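
To make the feature list above concrete, here is a rough Python sketch of how the per-section counts might be pulled out of raw wikitext. Everything in it is an assumption for illustration: the regular expressions are approximations (filled parameters are counted for any template in the section, not only infoboxes), the quality check assumes English-style {{good article}} / {{featured article}} templates, and the names article_features, quality_label, and views_per_speaker are hypothetical helpers, not part of any existing tool.

```python
import re

# Approximate patterns; real wikitext has many edge cases these miss.
SECTION_RE = re.compile(r"^==+\s*(.*?)\s*==+\s*$", re.MULTILINE)
IMAGE_RE = re.compile(r"\[\[(?:File|Image):", re.IGNORECASE)
INFOBOX_RE = re.compile(r"\{\{\s*Infobox", re.IGNORECASE)
# A template parameter line such as "| name = value" with a non-empty value.
FILLED_PARAM_RE = re.compile(r"^[ \t]*\|[ \t]*[^=|\n]+=[ \t]*\S", re.MULTILINE)
# Assumption: English Wikipedia style quality templates.
QUALITY_RE = re.compile(
    r"\{\{\s*(good article|featured article)\s*\}\}", re.IGNORECASE
)


def split_sections(wikitext):
    """Return (title, body) pairs; text before the first heading is 'lead'."""
    parts = SECTION_RE.split(wikitext)
    sections = [("lead", parts[0])]
    for i in range(1, len(parts), 2):
        sections.append((parts[i], parts[i + 1]))
    return sections


def section_features(body):
    """Count the per-section properties listed above."""
    return {
        "text_length": len(body),
        "infoboxes": len(INFOBOX_RE.findall(body)),
        "filled_params": len(FILLED_PARAM_RE.findall(body)),
        "images": len(IMAGE_RE.findall(body)),
    }


def article_features(wikitext):
    """Map each section title to its feature counts."""
    return {title: section_features(body)
            for title, body in split_sections(wikitext)}


def quality_label(wikitext):
    """Return 'good article', 'featured article', or None."""
    match = QUALITY_RE.search(wikitext)
    return match.group(1).lower() if match else None


def views_per_speaker(page_views, speakers):
    """Normalize page views by the number of speakers of the language."""
    return page_views / speakers if speakers else 0.0


# Made-up sample article, for illustration only.
SAMPLE = """{{Infobox country
| name = Uganda
| capital = Kampala
| motto =
}}
'''Uganda''' is a country in East Africa.
[[File:Flag of Uganda.svg|thumb]]

== History ==
Some history text.
[[Image:Map.png|thumb]]
{{good article}}
"""

if __name__ == "__main__":
    for title, feats in article_features(SAMPLE).items():
        print(title, feats)
    print(quality_label(SAMPLE))
```

The resulting counts could then be binned and fed to the naive Bayesian filter as features, with the Good Article / Featured Article flag serving as the training label.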
Can you also think of any other dimensions or heuristics to rate programmatically?
Best,
Maximilian Klein
Wikipedian in Residence, OCLC
+17074787023