[Wikipedia-l] An idea. What do you think about this?

Tomasz Wegrzanowski taw at users.sf.net
Tue Feb 10 19:05:51 UTC 2004


On Tue, Feb 10, 2004 at 07:29:14PM +0100, talthen at wp.pl wrote:
> Hello,
> Wikipedia's database is quite large, but it is not growing very fast. That
> would change if all Wikipedians started building a common database.
> The main problem is the difference between languages, but... I have an idea! :)
> I know my idea will not be easy to realize, but it would be very useful.
> 
> The idea is to create a new language, based on the most popular languages
> from all over the world. This language would not be a human language, but a
> language for storing information.
> 
> Today we have some language translation applications, but they are not
> perfect, for two reasons:
> 1. Some languages differ too much.
> 2. Some words have many meanings, and the program doesn't know which one
> should be chosen.
> By creating a new language we would solve the first problem. (I think we do
> not have to create an entirely new language; maybe modifying Esperanto would
> be enough.) The second problem could be solved by listing all the meanings
> of each word. For example, for English we could create a file like this:
> word number    word    meaning
> ---------------------------------
> 1              mind    intellect
> 2              mind    thoughts
> 3              mind    a head
> 4              mind    to object to
> 
> Translation would work like this:
> I write a sentence: "The study of logic trains the mind". The application
> scans my sentence and asks in which meaning I used the word "mind". From
> all the meanings of "mind" I choose "intellect". Once the writer has
> explained all the meanings, the application saves the sentence in its own
> language, in a structure like this:
> 116117 6322 987672 1 312312
> where the numbers are word numbers.
> 
> Decompression would look like this:
> I ask the program to display the message in Polish. The application
> loads the file "polish.txt" and looks up the words with these numbers.
> For the fourth word it loads the word from line 1 (because the word "mind"
> with the meaning "intellect" is on line 1 in all languages, not only in
> English). It finds all the words and displays them.
> 
> I know that writing down all the meanings of words is not easy. But if every
> Wikipedian writes just a few, we would finish very fast.
> The hardest part is designing the language so that it describes which tense
> the sentence is in, what the word order should be after translating to
> language X, what it should be when displaying in Y, etc.
> But I think this is possible and would make, e.g., building the database of
> Wikipedia much easier.
> And not only this. There would be many applications for it.
> 
> I hope you understood what I mean. I know I may have made some mistakes
> (both grammatical and logical)...
> 
> So, how do you like my idea? Do you think it's worth realizing?

First, choose some small area of knowledge. It doesn't matter what it is,
but it must be non-trivial for the experiment to be meaningful.
Then, try to implement something that works with this area and just a few languages.
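
To make that concrete, here is a minimal sketch in Python of the scheme as
described above, for two languages and one sentence. It is only an
illustration under assumptions: the SENSES table, the function names, every
sense number other than 1-4, and the rough Polish glosses are all made up for
the example and are not part of the proposal.

```python
# Hypothetical sense table: sense number -> word in each language.
# Numbers 1-4 follow the "mind" table from the proposal; the others are invented.
SENSES = {
    1: {"en": "mind", "pl": "umysł"},               # intellect
    2: {"en": "mind", "pl": "myśli"},               # thoughts
    3: {"en": "mind", "pl": "głowa"},               # a head
    4: {"en": "mind", "pl": "mieć coś przeciwko"},  # to object to
    10: {"en": "the", "pl": ""},        # Polish has no articles
    11: {"en": "study", "pl": "nauka"},
    12: {"en": "of", "pl": ""},         # Polish uses the genitive case instead
    13: {"en": "logic", "pl": "logika"},
    14: {"en": "trains", "pl": "ćwiczy"},
}


def candidates(word, lang):
    """Return every sense number whose form in `lang` is `word`."""
    return sorted(n for n, forms in SENSES.items() if forms[lang] == word)


def encode(sentence, lang, choose):
    """Turn a sentence into sense numbers, asking `choose` when a word is ambiguous."""
    numbers = []
    for word in sentence.lower().split():
        options = candidates(word, lang)
        if not options:
            raise KeyError(f"no sense recorded for {word!r}")
        numbers.append(options[0] if len(options) == 1 else choose(word, options))
    return numbers


def decode(numbers, lang):
    """Look each sense number up in the target language, word by word."""
    words = [SENSES[n][lang] for n in numbers]
    return " ".join(w for w in words if w)


# Stand-in for the writer picking "intellect" when asked about "mind".
def pick_intellect(word, options):
    return 1


code = encode("The study of logic trains the mind", "en", pick_intellect)
print(code)                # [10, 11, 12, 13, 14, 10, 1]
print(decode(code, "pl"))  # nauka logika ćwiczy umysł
```

The lookup itself is the easy part. The Polish output is word-for-word and
not grammatical: the "of" has to become a genitive ending ("nauka logiki"),
which is exactly the kind of thing a table of word numbers cannot express.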

Natural Language Processing is one of the most difficult parts of Computer
Science, where a lot of really promising ideas have failed in practice.
Obviously, we'd love to use anything that would make our work easier,
but it would be very hard to get something like what you describe working.


