tic(a)tictric.net wrote:
Though I'm not a developer, I'd like to make a suggestion
on this topic.
As I'm helping out with the Open Directory Project, I know they have been
struggling with converting their databases to UTF-8 for quite some time now.
And strange characters keep appearing here and there.
Maybe the MediaWiki devs could get some useful hints from them about
what unforeseen problems might arise when converting to UTF-8.
Cheers
Manfred
We have already converted the French wiki to UTF-8. Apart from some strange
characters that don't belong to Latin-1 (typically Windows code page
characters), we didn't have any problems as far as I remember. The problem
with non-Latin-1 characters can even be corrected by a bot before the conversion.
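
To illustrate the code page problem: in Latin-1 the bytes 0x80 to 0x9F are
control characters, while Windows-1252 uses most of them for things like
curly quotes and the euro sign. Below is a minimal Python sketch of how a
conversion could handle them. It is not the program we actually used, and
the function name latin1_to_utf8 is made up for the example.

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Convert nominally Latin-1 bytes to UTF-8.

    Assumption for this sketch: any byte in 0x80-0x9F was really typed
    as Windows-1252, so it is decoded with cp1252 where that code page
    defines a character.
    """
    chars = []
    for b in raw:
        byte = bytes([b])
        if 0x80 <= b <= 0x9F:
            try:
                chars.append(byte.decode("cp1252"))
            except UnicodeDecodeError:
                # Python's cp1252 codec leaves five bytes undefined
                # (0x81, 0x8D, 0x8F, 0x90, 0x9D); keep those as the
                # Latin-1 control characters they nominally are.
                chars.append(byte.decode("latin-1"))
        else:
            # Plain ASCII and ordinary Latin-1 letters.
            chars.append(byte.decode("latin-1"))
    return "".join(chars).encode("utf-8")
```

A bot could use the same 0x80 to 0x9F test just to list suspicious articles
for manual review instead of rewriting them.
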
As the French wiki is relatively small compared to the English wiki, we just
converted the dump. For bigger wikis, I guess converting all tables except
old in one go would be a good idea. Then, as proposed, set a « utf-8 »
flag on old articles and convert them one by one, starting with the most
recent ones. The program we used to convert to UTF-8 can quite easily be
adapted to convert a single article instead of a dump (and it should in
fact be faster).
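
The flag idea could look roughly like the Python sketch below. This is only
an illustration under assumptions: the OldRevision class and its field names
(rev_id, timestamp, text, is_utf8) are hypothetical and do not reflect the
real layout of the old table, and the rows live in a plain list rather than
in the database.

```python
from dataclasses import dataclass


@dataclass
class OldRevision:
    # Hypothetical stand-in for one row of the old table.
    rev_id: int
    timestamp: str          # sortable string, e.g. "20040315120000"
    text: bytes
    is_utf8: bool = False   # the proposed "utf-8" flag


def read_text(rev: OldRevision) -> str:
    """Decode a revision according to its flag, converted or not."""
    return rev.text.decode("utf-8" if rev.is_utf8 else "latin-1")


def convert_revision(rev: OldRevision) -> bool:
    """Convert one revision in place; return False if already done."""
    if rev.is_utf8:
        return False
    rev.text = rev.text.decode("latin-1").encode("utf-8")
    rev.is_utf8 = True
    return True


def convert_newest_first(revisions, batch_size: int) -> int:
    """Convert up to batch_size unflagged revisions, most recent first."""
    pending = sorted(
        (r for r in revisions if not r.is_utf8),
        key=lambda r: r.timestamp,
        reverse=True,
    )
    converted = 0
    for rev in pending[:batch_size]:
        if convert_revision(rev):
            converted += 1
    return converted
```

Because read_text checks the flag, the wiki can keep serving old articles
while the conversion runs in small batches in the background.
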
Cheers,
Med