[Foundation-l] statistics

Robert Scott Horning robert_horning at netzero.net
Wed Apr 5 14:55:46 UTC 2006


kent emerson wrote:

>I am preparing a Masters thesis on wikipedia and am questioning whether
>wikipedia users are representative of the regular population. I would
>like to know what demographic uses wikipedia? Have you collected any
>statistics on the representative makeup of those who work on your site
>including gender, age, geographic location, income, education, ownership
>of home, marital status? If so, would you feel comfortable sharing it
>with me for the purposes of my research.
>
>I would appreciate if you can help.
>
>Kent Emerson.

There is a huge amount of raw information that can be gleaned from the 
user pages to help prepare such a demographic cross-section, but it is 
raw data, not organized into neat tables.  It is also self-reported, so 
it is somewhat suspect as well.  Some users put up special "templates" 
(especially on Wikipedia) that proclaim different skills, the most 
typical being a declaration of which languages you speak (the Babel 
templates).  This has since been expanded to computer programming 
languages, political leanings (including internal politics on Wikimedia 
projects), schools of thought, marital status, hobbies, geographic 
origin and other interests.  Some of the information can also simply be 
found right on the user page as plain text.

Doing statistical analysis on this very rich set of information on user 
pages might be a very interesting study, but it is going to take quite a 
bit of work to pull all of the information together and will be a very 
tedious process.  Compared with waiting for people to e-mail you with 
responses, this would give you a much larger set of data to work with, 
and it can be cross-referenced to articles and activity levels to give 
you extra research variables to look at.  If you are really interested 
in doing something like this (and it would be worthy of a master's 
degree), I would strongly recommend that you obtain a full dump of one 
of the Wikipedia databases and get a skilled database guru to help you 
"mark up" users according to the criteria you want to use in the study, 
and then compare that to other factors including their status as 
administrators, the articles they have edited, and their activity level.  
This isn't going to be handed to you on a silver platter, but the data 
is available if you are willing to do the work of organizing it.
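
To give a rough idea of what the "mark up" step might look like, here is 
a minimal Python sketch that tallies the languages users declare through 
Babel templates.  It assumes the user-page wikitext has already been 
pulled out of a dump into a dictionary keyed by user name, and that the 
templates look like {{Babel|en|de-2}} or {{Babel-3|en|de-2|fr-1}}.  The 
function names and the sample pages are made up for illustration, and 
real user pages will have variations this does not catch.

```python
import re
from collections import Counter

# Assumed template forms: {{Babel|en|de-2}} and {{Babel-3|en|de-2|fr-1}}.
BABEL_RE = re.compile(r"\{\{\s*Babel(?:-\d+)?\s*\|([^{}]*)\}\}", re.IGNORECASE)


def babel_codes(wikitext):
    """Return the entries (e.g. 'en', 'de-2') listed in Babel templates."""
    codes = []
    for match in BABEL_RE.finditer(wikitext):
        for part in match.group(1).split("|"):
            part = part.strip()
            if part and "=" not in part:  # skip named parameters
                codes.append(part)
    return codes


def tally_languages(user_pages):
    """Count how many users declare each language.

    user_pages is a hypothetical {user name: wikitext} mapping.
    """
    counts = Counter()
    for text in user_pages.values():
        # Drop the proficiency suffix so 'de-2' counts as 'de'.  This is
        # crude: a multi-part code such as 'zh-min-nan' collapses to 'zh'.
        langs = {code.split("-")[0].lower() for code in babel_codes(text)}
        counts.update(langs)  # each user counts once per language
    return counts


if __name__ == "__main__":
    sample_pages = {  # made-up examples, not real user pages
        "Alice": "Hello! {{Babel|en|de-2}}",
        "Bob": "{{Babel-2|en-3|fr}} I mostly edit history articles.",
        "Carol": "No templates here.",
    }
    for lang, n in sorted(tally_languages(sample_pages).items()):
        print(lang, n)
```

The same pattern could be repeated for other self-declared templates, 
with the resulting per-user tags joined against edit counts or 
administrator status from the dump.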

One other place to see what statistical data has already been developed 
for Wikimedia projects is the collection of statistical analysis pages 
developed by Erik Zachte, which can be found here:  
http://stats.wikimedia.org/

These tables are oriented more toward measuring the growth of Wikimedia 
projects than toward demographic comparisons, but there is multi-lingual 
data available there that might be useful for you to review.  
Chronological information is also available in the raw database dump, 
and is another factor to consider with this sort of study.

-- 
Robert Scott Horning





