For those of you who haven't seen it, take a look at Domas Mituzas' wiki-stats:
http://dammit.lt/wikistats/
This is real, accurate hourly snapshot data on access to Wikipedia,
captured from the Wikimedia Squid servers. Project counts show the
total accesses in a time period to the different language editions.
This is great stuff for visualization, behavioral pattern analysis,
and other purposes. If you do something with it, let us know. :-)
URL may change in the future - we'll put a redirect on the above one
if that happens.
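For anyone who wants to play with the hourly files, here is a minimal parsing sketch in Python. It assumes each line holds a project code, a page title, a request count, and a byte count, space-separated; the sample lines are made up, so check the actual dump format before relying on this:

```python
from collections import defaultdict

def sum_project_counts(lines):
    """Sum hourly request counts per project from space-separated
    dump lines (assumed format: project title count bytes)."""
    totals = defaultdict(int)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip malformed or comment lines
        project, _title, count, _bytes = parts
        totals[project] += int(count)
    return dict(totals)

# Hypothetical sample lines, not real dump data:
sample = [
    "en Main_Page 42 100000",
    "en Wiki 8 20000",
    "de Hauptseite 30 90000",
]
print(sum_project_counts(sample))  # {'en': 50, 'de': 30}
```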
--
Erik Möller
Deputy Director, Wikimedia Foundation
Support Free Knowledge: http://wikimediafoundation.org/wiki/Donate
> I posted it here:
>
> http://www.wiki-translation.com/tiki-index.php?page=DanielKinzlerThesis&bl=n
>
> You can see the English translation here:
>
> http://translate.google.com/translate?u=http%3A%2F%2Fwww.wiki-translation.com%2Ftiki-index.php%3Fpage%3DDanielKinzlerThesis%26bl%3Dn&hl=en&ie=UTF8&sl=de&tl=en
Hum... If I go to the above translation link, only the first bit is
actually translated. I guess Google gives up after a while and leaves
the rest in German.
If you can split it into separate HTML pages, it would make it easier
for people to read it with Google translate.
Alain
My diploma thesis about a system to automatically build a multilingual thesaurus
from Wikipedia, "WikiWord", is finally done. I handed it in yesterday. My
research will hopefully help to make Wikipedia more accessible for automatic
processing, especially for applications in natural language processing, machine
translation, and information retrieval. What this could mean for Wikipedia is:
better search and conceptual navigation, tools for suggesting categories, and more.
Here's the thesis (in German, I'm afraid): <http://brightbyte.de/DA/WikiWord.pdf>
Daniel Kinzler, "Automatischer Aufbau eines multilingualen Thesaurus durch
Extraktion semantischer und lexikalischer Relationen aus der Wikipedia",
Diplomarbeit an der Abteilung für Automatische Sprachverarbeitung, Institut
für Informatik, Universität Leipzig, 2008.
For the curious, http://brightbyte.de/DA/ also contains source code and data.
See <http://brightbyte.de/page/WikiWord> for more information.
Some more data is for now available at
<http://aspra27.informatik.uni-leipzig.de/~dkinzler/rdfdumps/>. This includes
full SKOS dumps for en, de, fr, nl, and no covering about six million concepts.
The thesis ended up being rather large... a 220-page thesis and 30k lines of
code. I'm planning to write a research paper in English soon, which will give an
overview of WikiWord and what it can be used for.
The thesis is licensed under the GFDL, WikiWord is GPL software. All data taken
or derived from Wikipedia is GFDL.
Enjoy,
Daniel
This looks very interesting!
Is this a thesaurus that can be used for translation of words across
languages?
Is there some way to quickly have a demo or view the data?
I browsed some files, and I see entries of the kind:
:xf5bfa ww:displayLabel "de:Feliner_Diabetes_mellitus" .
:xf5bfa ww:type wwct:OTHER .
:xf5bfa rdf:type skos:Concept .
:xf5bfa skos:inScheme <http://brightbyte.de/vocab/wikiword/dataset/*/animals:thesaurus> .
which tells me that Diabetes Mellitus of a feline is a concept... I was
interested in the animal thesaurus as a way to translate animal names across
languages... there are a lot of files, and I don't know if I am looking at
the right ones. Perhaps if you pointed us to the most interesting /
understandable datasets, it would be very useful.
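For what it's worth, the "lang:Label" convention in the displayLabel lines quoted above suggests a quick way to group labels by concept across languages. Here is a rough Python sketch; the second sample line and the exact triple layout are my own assumptions, not the official WikiWord schema:

```python
import re

# Match lines of the form:  :id ww:displayLabel "lang:Label" .
# This pattern is inferred from the quoted dump snippet, not from a spec.
LABEL_RE = re.compile(r'^(\S+)\s+ww:displayLabel\s+"(\w+):([^"]*)"')

def labels_by_concept(lines):
    """Map concept id -> {language: label} from ww:displayLabel triples."""
    concepts = {}
    for line in lines:
        m = LABEL_RE.match(line)
        if m:
            concept, lang, label = m.groups()
            concepts.setdefault(concept, {})[lang] = label
    return concepts

sample = [
    ':xf5bfa ww:displayLabel "de:Feliner_Diabetes_mellitus" .',
    ':xf5bfa ww:displayLabel "en:Feline_diabetes" .',  # hypothetical line
]
print(labels_by_concept(sample))
```

Grouping by the shared subject id would give you, per concept, the label in each language the dataset covers, which is essentially the cross-language translation table asked about above.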
I am sorry if the above remarks seem superficial; I cannot read German well
enough to read dissertations in it...
Best, Luca.
On Fri, May 30, 2008 at 2:54 AM, Daniel Kinzler <daniel(a)brightbyte.de>
wrote:
> [...]
Dear All,
we have three new techreps available:
- Robust Content-Driven
Reputation <http://www.soe.ucsc.edu/%7Eluca/papers/08/ucsc-soe-08-09.html> shows
that the content-driven reputation we proposed in a WWW 2007 paper can
be made robust to Sybil ("sock-puppet") and other coordinated attacks. In
WWW 2007, we proposed "content-driven reputation" for Wikipedia authors,
where authors gain reputation if their contributions are preserved, and lose
reputation if their contributions are quickly undone. The original
algorithms were very prone to attacks; we show here that they can be made
resistant.
- Assigning Trust to Wikipedia
Content <http://www.soe.ucsc.edu/%7Eluca/papers/08/ucsc-soe-08-07.html> proposes
computing the trust of Wikipedia text on the basis of the
reputation of the author, and the reputation of the people who revised the
text. We display text trust by coloring text background. Many of you have
seen the on-line demo for the English Wikipedia, at
http://trust.cse.ucsc.edu/ . This is an improved version of a November
2007 techrep on the same topic. In this improved techrep, we show how the
trust system can be made resistant to attacks.
- Measuring Author Contributions to the
Wikipedia <http://www.soe.ucsc.edu/%7Eluca/papers/08/ucsc-soe-08-08.html> defines
and compares various ways of measuring the contribution of
individual authors to the Wikipedia. We have our own favorite; read more to
find out :-)
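As a toy illustration of the content-driven idea in the first techrep above (authors gain reputation when their text survives later revisions and lose it when it is quickly undone), here is a sketch in Python. The weights and the word-survival measure are invented for illustration and are not the WWW 2007 algorithm:

```python
from collections import defaultdict

def update_reputation(rep, author, inserted_words, later_revision_text,
                      gain=1.0, loss=2.0):
    """Toy content-driven reputation update: reward the author for each
    inserted word still present in a later revision, penalize removed ones.
    The asymmetric weights (losses hurt more) are illustrative only."""
    later = set(later_revision_text.split())
    survived = sum(1 for w in inserted_words if w in later)
    undone = len(inserted_words) - survived
    rep[author] += gain * survived - loss * undone
    return rep

rep = defaultdict(float)
# alice's words survive into the later revision; bob's were reverted away.
update_reputation(rep, "alice", ["feline", "diabetes"],
                  "feline diabetes mellitus")
update_reputation(rep, "bob", ["spam", "link"],
                  "feline diabetes mellitus")
print(dict(rep))  # alice gains, bob loses
```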
In recent months, we have been busy working on
WikiTrust <http://trust.cse.ucsc.edu/>,
an open-source tool for assigning reputation to wiki authors and trust to
wiki content. We already have a batch (or "off-line") system, which can
compute reputation and trust based on wiki dumps, such as the Wikipedia
dumps made available by the Wikimedia Foundation. We are developing an
"on-line" system, which can assign reputation and trust in real-time, as
edits are made. One of our chief concerns in developing an on-line system
was to ensure that it was robust to attack, and we believe we have made
progress in this direction, as reported in the above techreps. We are now
proceeding with the implementation; my guess is that we will have a
prototype in a month or so.
By the way, the "batch" part of WikiTrust <http://trust.cse.ucsc.edu/> can
be easily adapted to carry out various analysis tasks. Basically, it walks
over all revisions of every page of a wiki, and it contains an efficient
text analysis engine that tells you precisely how text was changed between
versions. So, it is easy to use WikiTrust as a platform to write analysis
algorithms for wikis: you don't have to worry about the boring tasks of
reading and parsing markup language, and computing text diffs in a
reasonable way; you can concentrate on the details of the specific analysis
you want to do. It is all open source, and we welcome developers or people
interested in it.
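As a rough stand-in for the kind of change information such a diff engine produces, here is a word-level diff sketch using Python's standard difflib; this is not WikiTrust's own algorithm, just an illustration of the "which words were inserted or deleted between revisions" output an analysis would build on:

```python
import difflib

def word_diff(old, new):
    """Classify words as inserted or deleted between two revisions.
    Uses difflib.SequenceMatcher over word lists as a simple stand-in
    for a real revision-diff engine."""
    a, b = old.split(), new.split()
    sm = difflib.SequenceMatcher(a=a, b=b)
    inserted, deleted = [], []
    for op, a1, a2, b1, b2 in sm.get_opcodes():
        if op in ("replace", "delete"):
            deleted.extend(a[a1:a2])
        if op in ("replace", "insert"):
            inserted.extend(b[b1:b2])
    return inserted, deleted

ins, dele = word_diff("the quick brown fox", "the slow brown fox jumps")
print(ins, dele)  # ['slow', 'jumps'] ['quick']
```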
All the best,
Luca (with Ian, Bo, and the other wikitrusters).
Probably I should not be list owner... I will tend to be slow and might
accidentally miss something like this...
-------- Original Message --------
Subject: FW: [Call for participation] Wikipedia and AI: An Evolving Synergy
Date: Wed, 21 May 2008 18:39:04 -0700
From: Evgeniy Gabrilovich <gabr(a)yahoo-inc.com>
To: <wiki-research-l-owner(a)lists.wikimedia.org>
Hi,
Would you be so kind as to check why I'm not allowed to post a message to
wiki-research-l?
I'm trying to post a CFP for our workshop, and in the past I was able to
post similar
messages, but this time my request was declined for some reason.
Thank you in advance,
Evgeniy.
--
Evgeniy Gabrilovich, Ph.D.
Senior Research Scientist
Yahoo! Research
2821 Mission College Blvd, Santa Clara, CA 95054
Email: gabr(a)yahoo-inc.com
Phone: (office) 408-349-8155 (cell) 408-218-7284
> -----Original Message-----
> From: wiki-research-l-bounces(a)lists.wikimedia.org
> [mailto:wiki-research-l-bounces@lists.wikimedia.org] On
> Behalf Of wiki-research-l-owner(a)lists.wikimedia.org
> Sent: Wednesday, May 21, 2008 18:36
> To: Evgeniy Gabrilovich
> Subject: [Call for participation] Wikipedia and AI: An
> Evolving Synergy
>
> You are not allowed to post to this mailing list, and your message has
> been automatically rejected. If you think that your messages are
> being rejected in error, contact the mailing list owner at
> wiki-research-l-owner(a)lists.wikimedia.org.
>
>
Apologies for cross-postings.
== Call for Contributions ==
2nd Workshop on Scientific Communities of Practice (SCooP) on June 27th
2008 at Jacobs University Bremen.
http://jem-thematic.net/seminar/scoop2008
== Overview ==
Communities of Practice (CoPs) bring together people from all around the
globe around a common concern or set of problems, which they tackle by
exchanging knowledge, ideas, and expertise.
CoPs also exist in science, although scientific communities of practice
are more heterogeneous than their corporate counterparts, as members
come from a variety of backgrounds and disciplines. Yet, it is exactly
this interdisciplinarity that makes these groupings valuable for
deepening knowledge and learning.
In this context, SCooP aims at joining people from different fields,
such as mathematics, computer science, chemistry, physics, biology etc.,
who share a common interest -- Communities of Practice. The workshop
thus wants to facilitate the exchange of experiences and
implementations, and will in particular address questions such as:
* What are scientific or educational practices in educational and
scientific communities?
* Can these practices be automatically detected, collected, or modeled?
* What are implementations for CoPs?
* Which features make these tools so attractive and how do they support
(practices of) CoPs?
The workshop welcomes contributions in the following formats.
* Paper contributions (including position papers and research
proposals):
Max. 300 word abstract; paper submission
* Demonstrations and presentations of systems, prototypes, and mock-ups:
200-300 word abstract (presentation during the workshop, paper is
optional)
== Important dates ==
* _NEW_ submission deadline for abstracts: May 23rd (via email to
c.mueller(a)jacobs-university.de )
* Submission of papers: May 30th
* Notification of acceptance: June 6th
* Camera ready copies due: June 20th (approximately)
* Workshop in Bremen: June 27th
== Registration and Accommodation ==
* Registration via http://jem-thematic.net/seminar/scoop2008
* Accommodation at http://jem-thematic.net/seminar/scoop2008
== Further Links ==
* SCooP Mailing List:
http://lists.jacobs-university.de/mailman/listinfo/project-scoop
* SCooP Interest Group at http://jem-thematic.net/sig/scoop
The workshop is funded by the Joining Educational Mathematics Network
http://jem-thematic.net/
Erik & I had a good meeting last week with the MacArthur Foundation, who
reiterated their interest in funding research related to Wikipedia and
the other Wikimedia projects. Their primary interest is in developing a
better understanding of the Wikipedia audience (readers), but I believe
they are potentially interested in research into the contributor
community as well.
Our research goals & interests are laid out here
http://meta.wikimedia.org/wiki/Wikimedia_Foundation_Research_Goals
It's a pretty full list, but not an exhaustive one. We'd encourage
anyone who wants to conduct research into the Wikimedia projects to
approach MacArthur for funding, and/or talk to us.
Thanks,
Sue
-------- Original Message --------
Subject: [Wiki-research-l] Fwd: RfC: Wikimedia Foundation Research Goals
Date: Mon, 7 Apr 2008 16:03:41 -0700
From: Erik Moeller <erik(a)wikimedia.org>
Reply-To: Research into Wikimedia content and communities
<wiki-research-l(a)lists.wikimedia.org>
To: Research into Wikimedia content and communities
<wiki-research-l(a)lists.wikimedia.org>
References:
<b80736c80804071602t14d55745v4c34be87e347cb25(a)mail.gmail.com>
FYI
---------- Forwarded message ----------
From: Erik Moeller <erik(a)wikimedia.org>
Date: Apr 7, 2008 4:02 PM
Subject: RfC: Wikimedia Foundation Research Goals
To: Wikimedia Foundation Mailing List <foundation-l(a)lists.wikimedia.org>
Sue & I have drafted a set of research goals that the Wikimedia
Foundation supports. The purpose of the document is to have something
we can point researchers, universities, foundations, and other third
parties to when they ask us: So, what kind of research are you
interested in? Will you support/endorse my research proposal X? In
most cases, we will not actively pursue these goals directly -- we'll
just try to facilitate & endorse research by third parties.
These research goals need to line up with our overall organizational
goals to make sense, so we've tried to map research goals to
organizational goals.
In light of this constraint, please do feel free to make revisions, or
to suggest changes on the discussion page:
http://meta.wikimedia.org/wiki/Wikimedia_Foundation_Research_Goals
It's still a draft looked at by only two people - so we do expect it
to be incomplete. :-)
(BTW - I'm aware that some chapters are pursuing a research agenda on
their own: This is great, and these Foundation goals are in no way
meant to be prescriptive for chapters.)
--
Erik Möller
Deputy Director, Wikimedia Foundation
Support Free Knowledge: http://wikimediafoundation.org/wiki/Donate
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
--
Sue Gardner
Executive Director
Wikimedia Foundation
Your donations keep Wikipedia running! Support the Wikimedia Foundation
today: http://wikimediafoundation.org/wiki/Donate
Hello All!
Inspired by other cities, Copenhagen will also have its own version of
WikiWednesday. Wiki Wednesday is a monthly event that brings together wikiers,
wikipedians, wiki researchers, developers, bloggers, and anyone interested
in wikis, social software, and Web 2.0.
When: 14th of May 2008, at 17:30
Where: Studenterhuset, Købmagergade 52, København K
Distribute the invitation widely! Come and cheer '1st Life'!
Best regards, for the KWW organizing committee,
Rut Jesus (PhD student on wikis, center for phil of nature and science
studies, KU)