I've checked in a small experimental feature on REL1_5 and HEAD for
explicitly specifying the utf8 charset on tables and setting the
connection so it doesn't mangle the utf8 data on the wire.
As previously discussed, this is insufficient for Wikimedia, since it will
fail when you try to insert data containing non-BMP Unicode characters in
various places. But those running their own sites who have to use a
newer MySQL might appreciate more mundane text being stored correctly.
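
To illustrate the non-BMP limitation: MySQL's utf8 charset in these versions stores at most three bytes per character, while code points above U+FFFF need four bytes in UTF-8. The following is only a rough Python sketch for checking whether a piece of text would be affected. It is not part of the checked-in code, and the function names are made up for this example.

```python
def has_non_bmp(text: str) -> bool:
    """Return True if any code point in text lies above U+FFFF (outside the BMP)."""
    return any(ord(ch) > 0xFFFF for ch in text)


def max_utf8_bytes(text: str) -> int:
    """Return the largest UTF-8 encoded size, in bytes, of any single character."""
    return max((len(ch.encode("utf-8")) for ch in text), default=0)


if __name__ == "__main__":
    samples = {
        "ascii": "plain text",
        "latin": "caf\u00e9",          # 2-byte sequence, fits in a 3-byte charset
        "cjk": "\u65e5\u672c\u8a9e",   # 3-byte sequences, still fit
        "gothic": "\U00010348",        # non-BMP, needs 4 bytes
    }
    for name, value in samples.items():
        print(name, has_non_bmp(value), max_utf8_bytes(value))
```

Any text for which has_non_bmp() returns True is the kind of data that would fail to store correctly in this mode.
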
Activating this mode on an existing site could cause interesting
problems, however; use caution when enabling it.
The server communications mode is controlled by $wgDBmysql5 (set it to true
to send 'SET NAMES utf8' on connect). If it is selected at installation, the
table definitions are pulled from maintenance/mysql5/tables.sql, which adds
'DEFAULT CHARSET=utf8' to the tables.
I've tested this briefly on MySQL 5.0.15, but I think it will work on 4.1 as well.
-- brion vibber (brion @ pobox.com)