[Foundation-l] Questions about most viewed articles in 2011

Bjoern Hoehrmann derhoermi at gmx.net
Thu Jan 12 11:30:54 UTC 2012


* Frédéric Schütz wrote in gmane.org.wikimedia.foundation:
>On 11.01.2012 22:04, Bjoern Hoehrmann wrote:
>> So the numbers are rather rough and won't really tell you anything you
>> did not already know (Sex > Astrobiology, no surprise there), and you
>> can't really say Steve Jobs > Justin Bieber based on this data without
>> explaining all the caveats anyway. The data is more useful if you look
>> for general trends like in http://katograph.appspot.com/ which tells
>> you things like that articles on people in Film are viewed much more
>> often than people in Sports which is at least slightly non-obvious.
>
>That's another question: what is the easiest way to know which articles
>belong to a given broad category?
>
>Categories do not work well: try to take a top category such as 
>Category:Sports and go down the category tree; you'll probably get most 
>of the Wikipedia pages (at least, you'll definitely get many pages
>that have nothing to do with sports). WikiProjects seem better, but
>there are many of them (including sub-projects, etc.), so it is not easy
>to automate.

It depends on whether the category system actually is a tree. When I made
the app above, the category system of the German Wikipedia was not a tree,
but that was easy enough to fix, and so only 10% of the articles end up
in Kategorie:Sport or one of its subcategories. Kategorie:Wissenschaft
(science) is more of a problem: about 75% of articles can be found under
it. To me, the bigger problem is coming up with a list of these "broad
categories", if you want something more granular than people, places,
events, things, concepts, other, or whatever. If you had that, you could
at least run a couple of experiments (things to consider would be, for
instance, how deeply nested an article under "sports" is, the categories
of linking and linked articles, and how many articles are in or under
some category). As an example, Angela Merkel is in `German chemists`, but
most of the linked articles are in political categories, so a rule like
"person + politics => politician" would be fairly easy to implement,
provided "politician" is a broad category in the system.
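
To make two of those experiments concrete, here is a rough Python sketch,
not the code behind the app above. It assumes you already have the
category graph and the article links in memory as plain dicts; the names
`subcats`, `links`, `broad_of`, `category_depths` and `classify`, and the
toy data, are all made up for illustration.

```python
from collections import Counter, deque


def category_depths(subcats, root):
    """Breadth-first walk from `root`; returns {category: minimum depth}.

    Categories already seen are skipped, so a cycle in a category
    graph that is not a real tree cannot cause an endless loop.
    """
    depths = {root: 0}
    queue = deque([root])
    while queue:
        cat = queue.popleft()
        for child in subcats.get(cat, ()):
            if child not in depths:
                depths[child] = depths[cat] + 1
                queue.append(child)
    return depths


def dominant_topic(article, links, broad_of):
    """Most common broad category among the articles linked from `article`."""
    counts = Counter(
        broad_of[target]
        for target in links.get(article, ())
        if target in broad_of
    )
    return counts.most_common(1)[0][0] if counts else None


def classify(article, is_person, links, broad_of):
    """Apply the rule "person + politics => politician"."""
    topic = dominant_topic(article, links, broad_of)
    if is_person and topic == "politics":
        return "politician"
    return topic


# Hypothetical toy data; a real run would load this from a dump.
subcats = {
    "Sport": ["Football", "Olympics"],
    "Football": ["Football clubs"],
    "Football clubs": ["Football"],  # deliberate cycle
}
links = {"Angela Merkel": ["CDU", "Bundestag", "Quantum chemistry"]}
broad_of = {
    "CDU": "politics",
    "Bundestag": "politics",
    "Quantum chemistry": "science",
}

depths = category_depths(subcats, "Sport")
label = classify("Angela Merkel", True, links, broad_of)
```

The depth map could be used to cut off the descent at some nesting
level, and the majority vote over linked articles is only one of several
plausible ways to decide what an article is "mostly about".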
-- 
Björn Höhrmann · mailto:bjoern at hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 


