UTF-8 conversion (was: [Wikitech-l] Multilingual interface)
by Constans, Camille (C.C.)
--
Camille Constans
Apprentice engineer, production maintenance
Apprenticeship supervisor: Jean Lhomme
>-----Original Message-----
>From: wikitech-l-bounces(a)Wikipedia.org
>[mailto:wikitech-l-bounces@Wikipedia.org] On behalf of Brion Vibber
>Sent: Tuesday, 11 May 2004 07:48
>To: Wikimedia developers
>Subject: Re: [Wikitech-l] Multilingual interface
>
>
>Nikola Smolenski wrote:
>> Yes. One can use all languages in the same encoding (all Latin-1, all Latin-2,
>> all UTF-8...) but cannot mix encodings. It is trivial to convert any language
>> to UTF-8, except for the linktrail, which is not used anyway. Wikisource,
>> Wikibooks and Wiktionary are in UTF-8 already, so I don't think it will be a
>> problem for them.
>
>All the remaining latin-1 language files will have to be upgraded to do
>that, or appropriate run-time upconversion added.
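The "run-time upconversion" option could look roughly like the minimal Python sketch below. It assumes, hypothetically, that a language file's messages are available as raw ISO-8859-1 byte strings in a dict; the name upconvert_messages is illustrative only and is not MediaWiki code. Decoding as ISO-8859-1 can never fail, because every byte value 0x00-0xFF maps to the character with the same code point.

```python
def upconvert_messages(messages: dict[str, bytes]) -> dict[str, bytes]:
    """Re-encode ISO-8859-1 interface messages as UTF-8 (hypothetical helper)."""
    # ISO-8859-1 maps each byte to the code point of the same value, so the
    # decode step never raises; the UTF-8 encode then emits 1-2 bytes per char.
    return {key: raw.decode("iso-8859-1").encode("utf-8")
            for key, raw in messages.items()}


if __name__ == "__main__":
    # Hypothetical message keys, encoded the way a latin-1 language file would be.
    latin1_messages = {"mainpage": "Accueil".encode("iso-8859-1"),
                       "summer": "été".encode("iso-8859-1")}
    print(upconvert_messages(latin1_messages))
```

Pure-ASCII messages come out byte-identical; only accented characters grow from one byte to two.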
I have an idea. When we converted frwiki, we converted the whole dump in one pass; it took around 2 hours for a 2.5 GB dump.
We could convert all the tables except old first, and run a separate script to convert old afterwards. The old table would stay in broken ISO-8859-1 for a few days. I wrote a small PHP script to convert old from ISO-8859-1 to UTF-8, because I broke old on de.wiktionary during its conversion to UTF-8 (I forgot that old is gzipped, and converting binary data is not a good idea :) ).
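The de.wiktionary mistake comes down to ordering: compressed text must be decompressed, re-encoded, then recompressed. A minimal Python sketch of that per-row step follows (a sketch of the idea, not the PHP script itself). It assumes, hypothetically, that compressed rows are marked by "gzip" in a comma-separated flags string and were written as raw deflate, as PHP's gzdeflate() produces; convert_old_row is an illustrative name.

```python
import zlib


def convert_old_row(text: bytes, flags: str) -> bytes:
    """Convert one stored revision text from ISO-8859-1 to UTF-8.

    Hypothetical: compressed rows carry "gzip" in a comma-separated flags
    string and hold raw deflate data (no zlib/gzip header).
    """
    compressed = "gzip" in {f.strip() for f in flags.split(",") if f.strip()}
    if compressed:
        # Decompress first: re-encoding the compressed bytes would corrupt them.
        text = zlib.decompress(text, -zlib.MAX_WBITS)
    converted = text.decode("iso-8859-1").encode("utf-8")
    if compressed:
        # Recompress as raw deflate so the existing "gzip" flag stays accurate.
        compressor = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
        converted = compressor.compress(converted) + compressor.flush()
    return converted
```

Because this conversion is not idempotent (running it twice double-encodes every accented character), a real script would need to record which rows it has already converted, for example by walking the table in primary-key order and saving the last processed id.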