[Wikipedia-l] RFC: Principles of mass content adding on small Wikipedias

Gerard Meijssen gerard.meijssen at gmail.com
Sun Jan 29 20:40:15 UTC 2006


Hoi,
As this is an RFC, I will comment on the RFC itself and not on the other 
comments.

Danny mentioned in his response that a bot could do great work. Henna 
remarked that Wikidata could make a difference. Milos mentions that 
data may need localisation. I want to remind you of an e-mail that 
Sabine Cretella sent to the lists. Sabine is really active on the 
Neapolitan Wikipedia, a project much younger than the Swahili Wikipedia 
but already with 4336 articles. The secret of this success is, among 
other things, that Sabine uses professional tools to translate into the 
Neapolitan language. OmegaT, the software Sabine uses, is GPL software 
and is what is called a CAT or Computer Aided Translation tool. This 
allows for efficient translation and is /not/ the same as Automated 
Translation.

When we have Wikidata ready for prime time, we will be able to store 
structured data about a subject. This is not a full solution, as many 
of the words used in the presentation need to be translated, maybe even 
localised, to make sense in another language. I, for instance, always 
have to think whether 9/11 is the ninth of November or the eleventh of 
September; I know it only as the name of the event. In order to present 
data, labels have to be translated and data may have to be localised. 
The WiktionaryZ project will help with the labels, and standards like 
the CLDR define how the localisation is to be done.
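
To make the 9/11 example concrete, here is a minimal sketch of how the 
same written date yields two different dates depending on the convention 
used to read it. The pattern table and the function names are 
assumptions made for illustration only; in practice the patterns would 
come from CLDR data rather than from a hand-written dictionary.

```python
from datetime import date, datetime

# Hypothetical pattern table for illustration; real per-locale
# patterns would be taken from CLDR, not hard-coded like this.
SHORT_DATE_PATTERNS = {
    "month_first": "%m/%d/%Y",  # e.g. the usual US reading
    "day_first": "%d/%m/%Y",    # e.g. the usual European reading
}


def parse_short_date(text: str, convention: str) -> date:
    """Read a short numeric date according to the given convention."""
    pattern = SHORT_DATE_PATTERNS[convention]
    return datetime.strptime(text, pattern).date()


def format_short_date(value: date, convention: str) -> str:
    """Write a date as a zero-padded short numeric date."""
    return value.strftime(SHORT_DATE_PATTERNS[convention])


as_month_first = parse_short_date("9/11/2001", "month_first")
as_day_first = parse_short_date("9/11/2001", "day_first")

print(as_month_first.isoformat())  # 2001-09-11
print(as_day_first.isoformat())    # 2001-11-09
```

Storing the date once in an unambiguous form and formatting it per 
reader is the kind of localisation meant here; the labels around the 
data still need to be translated separately.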

We are making steady progress with WiktionaryZ; the first alpha demo 
project is at epov.org/wd-gemet/index.php/Main_Page (a read-only project 
for now). There is a proposal for a project at 
http://meta.wikimedia.org/wiki/Wiki_for_standards that intends to help 
us where the standards prove not to be good enough. As Sabine is part of 
the team behind OmegaT, it is being researched how OmegaT can read from 
and write directly to a MediaWiki project.

One other aspect that is needed in a new project is commitment. People 
who express their support for a new language project should see this as 
an indication of /their/ commitment and not as an expression of their 
opinion. When people start to work on a new project, it is important 
that, as on the Neapolitan Wikipedia, there are people who are 
knowledgeable and willing to help the newbies. I hope that the IRC 
channel #wikipedia-bootcamp can play a role in this as well.

Thanks,
    GerardM


Milos Rancic wrote:
> Maybe this should go on Meta, but I want to see comments here, first.
>
> As I see it, there are two ways of mass content adding. The first one
> includes generation of articles based on some public data (for example
> NASA, National Geospatial-Intelligence Agency, French government etc.).
> Now, this is almost the usual way of mass content adding and I think
> that a number of us have some experience with such work.
>
> The other way is adding content using English Wikipedia. English
> Wikipedia has a lot of categorized articles, a lot of templates etc.
> All these typical forms can be used for automatic content creation on
> small Wikipedias.
>
> I think that the idea of having thousands of articles with a couple of
> sentences and good categorization about a lot of fields can be very
> helpful not only to small Wikipedias, but also for spreading free
> knowledge. I think that it would be a great day for us when people
> whose native language is Mongolian are able to read about places
> in the Amazon and movies from Australia in their native language. And
> this is possible to do much faster than we think.
>
> And not only that: bots should be able to update information; bots
> should be able to do more things over time. Finally, it would be
> possible to start with knowledge transfer between Wikipedias in
> different languages: if we have the same methodology on different
> Wikipedias, we would be able to update data semi-automatically (up to
> fully automatically).
>
> However, this needs a number of people who are interested in such a
> project:
>
> (1) We would need people who know how to work with bots (pywikipediabot
> or something similar).
> (2) We would need to make software based on the bot core which would
> have to be localized, as MediaWiki is localized; this software
> should have sentences like "<movie> is a movie made in <year> in
> <country>. Genre of that movie is <genre>. Director was <director>..."
> in a number of languages.
> (3) We would need good, high-quality work on English Wikipedia. Rules
> like "this goes to the table, that goes to the template at the top,
> this goes to the template in the middle" should be more or less strict
> (but I see that people are working in such a way on en:).
>
> This is an RFC. I am looking for your comments.
>   
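
The localized sentence templates in point (2) of the quoted message 
could be sketched roughly as below. This is only an illustration under 
stated assumptions: the template table, the function name and the sample 
record are all hypothetical, the English template is adapted from the 
quoted example, and nothing here uses pywikipediabot or any other bot 
framework.

```python
# Hypothetical template table keyed by language code. Only English is
# filled in; other languages would be supplied by translators.
STUB_TEMPLATES = {
    "en": (
        "{movie} is a movie made in {year} in {country}. "
        "Genre of that movie is {genre}. Director was {director}."
    ),
}


def render_stub(lang: str, record: dict) -> str:
    """Fill the sentence template for `lang` with one structured record.

    Raises KeyError if the language has no template yet or if the
    record lacks a field that the template needs.
    """
    template = STUB_TEMPLATES[lang]
    return template.format(**record)


# Made-up sample record, not real movie data.
SAMPLE = {
    "movie": "Example Movie",
    "year": 1999,
    "country": "Australia",
    "genre": "drama",
    "director": "Example Director",
}

print(render_stub("en", SAMPLE))
```

A plain fill-in template like this ignores grammar such as gender, case 
and articles, so real templates for many languages would need more than 
simple substitution.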



