Hi all,
I am writing up an academic paper on Wikipedia and need to include some
statistics in the background section about the encyclopedia. What I am
looking for includes, *but is not limited to*:
1. The number of articles in English and in the next 3 or 4 largest
language versions,
2. The number of unique users, say in June 2009
3. The number of editors in English and perhaps some other versions.
I have done some searching and found that the "Wikipedia statistics" page
(http://stats.wikimedia.org/EN/), for example, provides some of this
information. But I thought some of the people on the list would probably have
good advice as well.
Thank you in advance!
Regards,
--muhammad abdul-mageed,
Ph.D. student,
Computational Linguistics Program,
School of Library and Information Science,
Indiana University, Bloomington
Hi All!
I'm new to this forum.
I am looking for some references to help me use the Wikipedia XML dump.
Here's what I have to do with the XML dump:
I will set up a server on which people can browse Wikipedia articles and
also a processed version of the corresponding Wikipedia article. By
"processed version" I mean a Wikipedia article with some additional information
attached to each line, e.g.:
A line in a Wikipedia article (http://en.wikipedia.org/wiki/Chicago) goes
as:
Chicago (pronounced /ʃɨˈkɑːɡoʊ/ or /ʃɨˈkɔːɡoʊ/) is the largest city in the
U.S. state of Illinois, and with over 2.8 million people is the third
largest city in the country.
My processed version of the Wikipedia page would look like this:
Chicago (pronounced /ʃɨˈkɑːɡoʊ/ or /ʃɨˈkɔːɡoʊ/) is the largest city in the
U.S. state of Illinois, and with over 2.8 million people is the third
largest city in the country. <Some additional information about this line>
Don't worry about "Some additional information about this line". This is
some NLP (Natural Language Processing) stuff which processes the line and
generates some additional information about it.
So, if somebody wants to access the processed version of any Wikipedia
article, he can go to: http://myserver/wiki/processed_Chicago
I hope it is clear what I intend to do with the Wikipedia XML dump.
For this I need to know the following things:
1. How should I extract articles from the XML dump, extract plain text from
them, process that text, and then insert the processed text back line by line
at the same place in the XML article, along with the additional information
generated by the NLP stuff?
Throughout this process, I want the page to keep the look of the original
Wikipedia version.
2. How can I render a Wikipedia page from the XML dump so that it looks just
like the online version of Wikipedia?
3. The XML dump does not include images, so how will I render images when a
page on my server is accessed?
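For question 1, one possible starting point is sketched here in Python (the language is an assumption; nothing in the thread prescribes one). It assumes an uncompressed current-revisions dump and streams it with xml.etree.ElementTree.iterparse, so the whole file never has to sit in memory. The names iter_pages and annotate_lines are hypothetical, and the annotate callback merely stands in for the NLP step. It only pulls out raw wikitext: it does not strip markup to plain text or render HTML (question 2), and appending notes to raw wikitext lines can break templates and tables.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Callable, Iterator, Tuple, Union


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source: Union[str, BinaryIO]) -> Iterator[Tuple[str, str]]:
    """Yield (title, wikitext) for each <page> in a MediaWiki XML dump.

    Assumes a current-revisions dump; if a page holds several revisions,
    the text of the last one read is kept.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = "", ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text or ""
            elif name == "text":
                text = child.text or ""
        yield title, text
        elem.clear()  # release this page's subtree before reading the next one


def annotate_lines(text: str, annotate: Callable[[str], str]) -> str:
    """Append ' <annotate(line)>' to every non-blank line (hypothetical NLP hook)."""
    out = []
    for line in text.splitlines():
        out.append(f"{line} <{annotate(line)}>" if line.strip() else line)
    return "\n".join(out)


# Tiny stand-in for a real dump file; the export namespace version is illustrative.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <page>
    <title>Chicago</title>
    <revision>
      <text xml:space="preserve">Chicago is the largest city in Illinois.</text>
    </revision>
  </page>
</mediawiki>"""

if __name__ == "__main__":
    for title, text in iter_pages(io.BytesIO(SAMPLE)):
        # A word count is a placeholder for the real NLP output.
        print(title, "->", annotate_lines(text, lambda s: f"{len(s.split())} words"))
```

Run as a script, this prints "Chicago -> Chicago is the largest city in Illinois. <7 words>". Against a real dump you would pass the file path to iter_pages instead of the in-memory sample.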
Any references or ideas in this regard will be greatly appreciated.
Thanks,
Akhil
--
View this message in context: http://www.nabble.com/Using-english-Wikipedia-XML-dump-tp24236727p24236727.…
Sent from the English Wikipedia mailing list archive at Nabble.com.
Very appropriate to this discussion.
MR
----------
From: Eddie Tejeda <eddie(a)visudo.com>
Reply-To: Wikimedia Foundation Mailing List
<foundation-l(a)lists.wikimedia.org>
Date: Sat, 27 Jun 2009 14:57:44 -0700
To: Wikimedia Foundation Mailing List <foundation-l(a)lists.wikimedia.org>
Subject: [Foundation-l] "antisocial production"
'Forget altruism. Misanthropy and egotism are the fuel of online social
production. That's the conclusion suggested by a new study of the character
traits of the contributors to Wikipedia. A team of Israeli research
psychologists gave personality tests to 69 Wikipedians and 70
non-Wikipedians. They discovered that, as New Scientist puts
it <http://www.newscientist.com/article/dn16349-psychologist-finds-wikipedians-grumpy-and-closedminded.html>,
Wikipedians are generally "grumpy," "disagreeable," and "closed to new
ideas."'
http://www.roughtype.com/archives/2009/06/the_sour_wikipe.php
I wonder how the mailing list will react...
_______________________________________________
foundation-l mailing list
foundation-l(a)lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
Risker wrote:
> It's on the arbcom-L private mailing list, I suspect, Steve. A link won't
> be possible, sorry.
Yes, I knew that. I was simply making an obverse point about the misuse of
"private" lists for sweeping public project announcements.
In any case, I try to avoid closed-source technology wherever I can.
-Stevertigo
Bit of a cryptic heading? I thought so too when I saw it, but it's
actually quite an interesting use of AWB:
http://en.wikipedia.org/wiki/Wikipedia:FRONDS
Find/Replace ON Demand Services, in case anyone hadn't worked it out.
Carcharoth
In a message dated 6/28/2009 8:35:38 AM Pacific Daylight Time,
fredbaud(a)fairpoint.net writes:
> Please finish the job, if you can. Clearly, business, organizations, and
> towns can also suffer both embarrassment and damages from libel and
> unfounded negative information.
Hold on. You said "enterprises" *failed* to reach consensus.
A business is not an organization but an enterprise.
A town, as a corporation, is more analogous to an enterprise as well.
Will
**************
Make your summer sizzle with fast and easy recipes for the
grill. (http://food.aol.com/grilling?ncid=emlcntusfood00000005)
http://www.stuff.co.nz/technology/2516472/Wikipedia-entries-slag-off-Palmer…
Maybe I'm getting old and jaded, but when I read that the local
council altered the Wikipedia article about their city to be more
favourable, my reaction was "oh, good, that was the right thing to
do". Heh.
I also particularly like the final line of the article.
Steve
In a message dated 6/27/2009 6:37:43 PM Pacific Daylight Time,
WJhonson(a)aol.com writes:
>
> How dare you! Go away and be quiet!
>
> On a lighter note, I've never met anyone I couldn't piss off.
>
> Will "grumpy" Johnson
> ------------------
Wait a moment.
I think maybe I've confused "grumpy" with "aggressively obnoxious".
In other words, not only am I obnoxious, but I try to recruit others to my
cause by aggressively proselytizing, that is, I annoy them to the point
where they also become equally obnoxious. The ultimate plan of course is for
everybody to be hostile all the time. That would be the pure world of hatred
and animosity (and redundancy) that my dark overlord requires for his
return.
On a second note, I wonder how they selected their sample. Complacent
Wikians would, I think, be less likely to respond than newly created
activists.
Will "grumpy and dopey all in one" Johnson
In a message dated 6/27/2009 3:20:47 PM Pacific Daylight Time,
michaeldavid86(a)comcast.net writes:
> Wikipedians are generally "grumpy," "disagreeable," and "closed to new
> ideas."'
---------------------
How dare you! Go away and be quiet!
On a lighter note, I've never met anyone I couldn't piss off.
Will "grumpy" Johnson