[Wiktionary-l] Re: [Wikitech-l] Export Dictionary?

Andrew Dunbar hippytrail at gmail.com
Tue May 17 10:29:35 UTC 2005


I spent some months working on parsing the English Wiktionary, with the
long-term goal of sharing its translations with my translator, Linguaphile.

The free-form nature of Wiktionary articles is the biggest hurdle. While there
is a format that adds some structure to the data, it has many variations, with
new experimental ones appearing all the time, and many inherent flaws which
the contributors either haven't overcome or haven't been able to agree on a
single way to overcome.
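To make the "many variations" concrete, here is a minimal Python sketch of the kind of tolerant parsing involved. It assumes translation lines look roughly like `* French: [[chat]], [[minou]]`, with the language name possibly bolded or linked and either a colon or a dash as the separator. Those variants and the name `parse_translations` are illustrative assumptions, not a reconstruction of the lost parser or a description of an agreed Wiktionary format. It should only be fed lines from a Translations section, since ordinary definition lines can look similar.

```python
import re

# Assumed shape of a translation line, e.g. "* French: [[chat]], [[minou]]".
# Tolerated (hypothetical) variants: optional list markers, a bolded or
# linked language name, and ':' or '-' as the separator.
LINE_RE = re.compile(
    r"^[*#:]*\s*"                                          # list markers / indent
    r"(?:'''|\[\[)?(?P<lang>[A-Z][\w ]*?)(?:'''|\]\])?"    # language name
    r"\s*[:\-]\s*(?P<rest>.+)$"                            # separator, then words
)
# [[word]] or [[target|word]]: capture the displayed word.
LINK_RE = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]|]+)\]\]")


def parse_translations(lines):
    """Map language name -> list of translated words.

    Lines that fit none of the assumed variants are skipped, not treated
    as errors, so new experimental formats degrade gracefully.
    """
    result = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # unrecognised variant: skip it
        words = LINK_RE.findall(m.group("rest"))
        if not words:
            # Fall back to plain comma-separated text without links.
            words = [w.strip() for w in m.group("rest").split(",") if w.strip()]
        result.setdefault(m.group("lang").strip(), []).extend(words)
    return result
```

For example, `parse_translations(["* French: [[chat]], [[minou]]", "*'''German''' - [[Katze]]"])` yields `{"French": ["chat", "minou"], "German": ["Katze"]}`. Every new variant means another branch in the patterns, which is why unstructured formats are hard to export reliably.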

I would've happily shared the code I had, but due to a grey-out (so I'm told),
almost all of my computer's components were destroyed and I haven't been
able to recover the data from the hard drive.

While a fully structured redesign would obviously help me and anyone
wanting to interchange Wiktionary data with .dict data, it would definitely
make it much harder for the general user to contribute.

I've been thinking about a compromise solution where slightly more
structure might be added, perhaps similar to HTML/CSS styles. Combined
with some kind of very flexible parsing, this might achieve a degree of
success.

I'm still very interested in this field and may try to recreate my parser but
losing that much work is very disheartening.

Andrew Dunbar (hippietrail)


On 5/17/05, Gerard Meijssen <gerard.meijssen at gmail.com> wrote:
> Brian Suda wrote:
> 
> >I have been on this list a while. When I originally joined, I was
> >interested in the possibility of exporting the Wiktionary data in
> >.dict format. Now that the newest version of OS X, 10.4, has a built-in
> >dictionary that uses dict:// URLs to look up words, I was interested to
> >see if anyone on the technical side would like to explore the
> >possibility of either exporting the Wiktionary database in .dict
> >format, or running a dictionary daemon that would access the Wiktionary
> >database server and return dict entries. It would be read-only, but it
> >would be another interesting way to access Wiktionary besides the
> >web interface.
> >
> >Does anyone on the tech list know if this is even possible? I'm not
> >asking you to do it (I can write the export); I was wondering whether
> >there is some sort of database schema available to extract the data
> >into dict format, or whether the entries are too fragmented to even
> >attempt an export.
> >
> >-brian
> >
> Hoi,
> I read your mail with interest. It made me look into what the .dict
> format is. The DICT protocol is described in RFC 2229. It allows people
> to look up information from their computer, with the information
> delivered by one of the many hosts that may hold it. As Angela
> said in her reply to your post, we are working on a new iteration of
> Wiktionary that goes by the name "Ultimate Wiktionary". This will
> have a relational database at its heart. It is intended to hold content
> in all languages, and the first challenge is to make it work in the first
> place. The second challenge is to create a user interface that is
> translated into all these languages, and the third challenge is to have an
> import and export mechanism, preferably using a standards-based XML schema.
> 
> We hope to show something at the Wikimania event. In your mail you want
> to export the Wiktionary data. The consequence is that if you choose
> to export, you will have to do so continually, as we hope to increase
> the content of Ultimate Wiktionary dramatically. As far as I understand
> the RFC, what is needed is to respond to a request with a reply in a
> set format. There is no need to have a database in a specific
> format as long as the response provided conforms to the RFC.
> 
> As Ultimate Wiktionary is being designed at the moment and as
> Wikidata is being built, this is the time to consider what is needed to
> provide .dict functionality. This functionality will be included when
> someone does the programming or when someone finds the funds to strap
> the .dict functionality on top of Ultimate Wiktionary. At this time it
> is premature to think about exporting from UW, as UW has not been built
> yet. It will, however, be possible to do so.
> 
> Some of the current Wiktionaries are highly structured and can be
> parsed into information. Others will prove an interesting challenge to
> convert to any other format: because of their lack of structure and
> consistency, they are as closed as any proprietary format would be. Even
> the name of an article is not necessarily the name of the associated word,
> as some Wiktionaries still capitalise the first character of every title.
> 
> One problem I see with exporting content from Wiktionary is the GNU FDL
> requirement to maintain the history of the contributors. For UW, I
> think it can be solved by adding the history information to the talk
> page. This is necessary for UW because of the likelihood that many
> Wiktionaries, if not all, may merge into Ultimate Wiktionary and be
> abandoned.
> 
> Thanks,
>    GerardM.
> 
> _______________________________________________
> Wiktionary-l mailing list
> Wiktionary-l at Wikipedia.org
> http://mail.wikipedia.org/mailman/listinfo/wiktionary-l
> 
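Gerard's point that only the replies must follow a set format, whatever the storage behind them, can be sketched in Python. The function below builds the reply a server might send to `DEFINE <database> <word>`, following a reading of RFC 2229. The reply begins with a 150 status line giving the count and has one 151 line per definition. Each definition's text ends with a line holding a single period, and any text line starting with a period has it doubled. A final 250 line closes the reply, and 552 is sent when nothing matches. The name `format_define_response`, the database name, and whatever Wiktionary lookup would supply `definitions` are hypothetical. Sockets, the 220 banner and other commands such as MATCH and SHOW DB are left out.

```python
CRLF = "\r\n"  # DICT, like SMTP, uses CRLF line endings


def format_define_response(word, database, db_name, definitions):
    """Build a reply to 'DEFINE <database> <word>' (sketch after RFC 2229).

    definitions: list of plain-text definitions from some hypothetical
    lookup; how they are stored does not matter to the client.
    """
    if not definitions:
        return "552 no match" + CRLF
    out = [f"150 {len(definitions)} definitions retrieved"]
    for text in definitions:
        # 151 line: word, database, and human-readable database name.
        out.append(f'151 "{word}" {database} "{db_name}"')
        for line in text.splitlines():
            # Double a leading period so the line cannot be mistaken
            # for the end-of-text marker.
            out.append("." + line if line.startswith(".") else line)
        out.append(".")  # a lone period ends this definition's text
    out.append("250 ok")
    return CRLF.join(out) + CRLF
```

A read-only daemon in front of Wiktionary would then only need to parse incoming commands and call something like this, which is why the database itself need not be in any particular format.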


-- 
http://linguaphile.sf.net


