On Fri, Apr 29, 2011 at 7:58 AM, Manish Goregaokar <manishsmail@gmail.com> wrote:

 1. Select 200 random articles.
 2. Get the top contributors for each of them.
 3. Get the edit counts for those contributors.

I think he already has the list(s) of 200 articles, and does not want random ones.
Plus, he doesn't want the editcounts; he wants the contributors' top edited articles, with the editcount per article.

My personal opinion is that this HAS to be done via PHP (though I can't comment on server load).
Use php-mysql to determine the list of top contributors for each given article, then loop over the contributors and list each one's top edited articles. It shouldn't be hard, though you might want to clarify what you mean by "top" (Top 3? More than X edits? More than X% of edits per day/week/month/since the beginning of time? More than X% of the top editor's edits?).


Thanks again for the info. Yes, this is basically correct. I am looking to collect this info based on 100 articles from the Wikipedia science series. If the data proves relatively easy to collect, I would like to collect data on all articles in the science series, which contains around 200 articles. Top contributors, for me, are those with 10 or more edits in the sampled article from the science series. For the sake of clarity, here is a short sample of the data I'm looking for.
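
To make the threshold concrete, here is a rough Python sketch of how I picture the "top contributor" filter. The function name `top_contributors`, the `MIN_EDITS` constant, and the sample history are all made up for illustration; real input would have to come from the revision history of each article.

```python
from collections import Counter

MIN_EDITS = 10  # threshold from this message: 10 or more edits to the sampled article


def top_contributors(revision_authors, min_edits=MIN_EDITS):
    """Return (user, edit_count) pairs for users with at least min_edits edits.

    revision_authors holds one user name per revision of a single article.
    Results are ranked by edit count (descending), then by user name.
    """
    counts = Counter(revision_authors)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(user, n) for user, n in ranked if n >= min_edits]


# Hypothetical history: placeholder users with invented edit counts.
sample_history = ["A"] * 12 + ["B"] * 10 + ["C"] * 3
print(top_contributors(sample_history))
```

With this made-up history, only A and B pass the cut; C, with 3 edits, is dropped.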

From the "science" article http://en.wikipedia.org/wiki/Science

Clicking "view history" and then "contributors" gives a ranked list of all contributors in order of most edits.

http://toolserver.org/~daniel/WikiSense/Contributors.php?wikilang=en&wikifam=.wikipedia.org&grouped=on&page=Science

The top three editors (let's call them A, B, and C) currently have 445, 73, and 70 edits respectively. Clicking on contributor "A" to see their user page and then "user contributions" in the toolbox shows all of their edits. For example, A has several edits to the articles "intelligent design" and "southern poverty law center", and B has edits to "rock formations" and "human evolution". I would like to count the frequency of all these edits across the top users of the sampled articles (e.g. Science), sorted by article title.
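
As a sketch of the counting I have in mind (not a finished tool), the Python below tallies edits per article title across a set of top contributors. The function name `tally_edits` and every number in the sample are invented for illustration; only the article titles come from the example above.

```python
from collections import defaultdict


def tally_edits(contributions):
    """Count edits per article title across a set of top contributors.

    contributions maps user name -> {article title: edits by that user}.
    Returns rows of (title, total_edits, editor_count), sorted by title.
    """
    totals = defaultdict(int)
    editors = defaultdict(set)
    for user, per_article in contributions.items():
        for title, edits in per_article.items():
            totals[title] += edits
            editors[title].add(user)
    return [
        (title, totals[title], len(editors[title]))
        for title in sorted(totals, key=str.casefold)
    ]


# Hypothetical data: the counts are invented, not taken from Wikipedia.
sample = {
    "A": {"Intelligent design": 40, "Southern Poverty Law Center": 15, "Human evolution": 2},
    "B": {"Rock formations": 8, "Human evolution": 21},
}
for row in tally_edits(sample):
    print(row)
```

Each row gives the article title, the total edits by the top users, and how many of the top users edited it.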

I don't know what the best way to arrange the data would be, but below is a Google Doc Spreadsheet that sort of shows what I think it would look like.

http://goo.gl/VIWd6
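
One possible layout, sketched in Python: one row per article and one column per contributor, written out as CSV so it can be opened in a spreadsheet. This is only my guess at a workable arrangement; the function name `to_csv` and the sample counts are hypothetical.

```python
import csv
import io


def to_csv(contributions):
    """Render user -> {title: edits} as CSV text.

    One row per article title, one column per user; missing pairs become 0.
    """
    users = sorted(contributions)
    titles = sorted({t for per_article in contributions.values() for t in per_article})
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["article"] + users)
    for title in titles:
        writer.writerow([title] + [contributions[u].get(title, 0) for u in users])
    return buffer.getvalue()


# Hypothetical data: invented counts for placeholder users A and B.
sample_counts = {
    "A": {"Intelligent design": 40},
    "B": {"Human evolution": 21, "Intelligent design": 1},
}
print(to_csv(sample_counts))
```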

If the Query Service seems the best approach (is this done using the php-mysql referenced above, or is it a different process?), then I will go ahead and create a task at https://jira.toolserver.org/browse/DBQ. If this is not the best or correct way to go, any guidance is appreciated.
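
In case it helps whoever picks up the request, here is a sketch of the two steps as a single SQL query, run from Python against an in-memory SQLite database so that it is self-contained. The `revision` table here is a simplified stand-in that I made up (columns `page_title` and `user_name`); the real Toolserver databases follow the MediaWiki schema, so the table and column names would need adapting, and I can't say how this would perform on the full data.

```python
import sqlite3

# Inner query: users with at least :min_edits edits to the sampled article.
# Outer query: every article those users edited, with per-article edit counts.
QUERY = """
SELECT r.user_name, r.page_title, COUNT(*) AS edits
FROM revision AS r
WHERE r.user_name IN (
    SELECT user_name
    FROM revision
    WHERE page_title = :sampled
    GROUP BY user_name
    HAVING COUNT(*) >= :min_edits
)
GROUP BY r.user_name, r.page_title
ORDER BY r.page_title, r.user_name
"""


def contributor_article_counts(conn, sampled, min_edits=10):
    """Return (user, title, edits) rows for top contributors of one article."""
    params = {"sampled": sampled, "min_edits": min_edits}
    return conn.execute(QUERY, params).fetchall()


# Hypothetical, simplified schema and data (one row per edit).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revision (page_title TEXT, user_name TEXT)")
rows = (
    [("Science", "A")] * 11
    + [("Science", "B")] * 2
    + [("Intelligent design", "A")] * 3
    + [("Human evolution", "B")] * 5
)
conn.executemany("INSERT INTO revision VALUES (?, ?)", rows)
print(contributor_article_counts(conn, "Science"))
```

In this toy data only A reaches 10 edits on "Science", so B's edits elsewhere are left out of the result.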

Thanks.

--
Jim