On Tue, May 20, 2003 at 01:46:48PM -0700, Brion Vibber wrote:
On Tue, 20 May 2003, Lee Daniel Crocker wrote:
Collation rules for all languages are defined in the Unicode spec;
Well, that could be handy. :) I'll see if I can dig them up.
Hmm... This looks like a place to start:
http://www.unicode.org/unicode/reports/tr10/
I believe MySQL contains many of them, but I'm not sure how to tell
it how to use them.
MySQL's really ugly in this regard. First, no UTF-8 support at all.* The
collation order modules that it does have (for some 8-bit charsets and
some multibyte) can only be enabled on a server-wide basis, so we can't
say "this database sorts as English, this one sorts as German, this one
sorts as Polish" unless we run separate instances of MySQL.
* Allegedly 4.1 has/will have some Unicode support. It's not stable
though.
** Yes, I know PostgreSQL has Unicode support. :) I don't know if it
supports per-table or per-column selection of collation order, and there
would be much other work to get Wikipedia running on it.
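To make the "this one sorts as English, this one sorts as German" problem
concrete, here is a minimal Python sketch of why a single sort order cannot
serve every language. It assumes heavily simplified rules: German dictionary
order treating umlauts like their base letters, and Swedish placing å, ä, ö
after z. The names make_key, BASE, german_key and swedish_key are made up for
illustration; real collation follows the much more involved algorithm in TR10.

```python
# Hypothetical, simplified collation keys; not real Unicode collation tables.
BASE = "abcdefghijklmnopqrstuvwxyz"


def make_key(alphabet, folds=None):
    """Build a sort key: fold some letters, then rank by position in alphabet."""
    folds = folds or {}
    rank = {ch: i for i, ch in enumerate(alphabet)}

    def key(word):
        # Apply per-language folding (e.g. German "ä" is ordered like "a").
        folded = "".join(folds.get(ch, ch) for ch in word.lower())
        # Letters outside the alphabet sort after everything else.
        return [rank.get(ch, len(alphabet)) for ch in folded]

    return key


# Assumption: German dictionary order, umlauts sorted as their base letters.
german_key = make_key(BASE, folds={"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})
# Assumption: Swedish order, where å, ä, ö are separate letters after z.
swedish_key = make_key(BASE + "åäö")

words = ["zebra", "apple", "äpple", "orm", "öl"]
by_german = sorted(words, key=german_key)
by_swedish = sorted(words, key=swedish_key)

print(by_german)   # ['apple', 'äpple', 'öl', 'orm', 'zebra']
print(by_swedish)  # ['apple', 'orm', 'zebra', 'äpple', 'öl']
```

The same five words come out in two different orders, which is why the
collation has to be selectable per database (or per table or column) rather
than once for the whole server.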
Well, PostgreSQL allows you to set the encoding on a per-database basis.
So, you can have some databases with UTF-8, some with EUC_JP, etc. I
don't think you can have some ASCII rows and some Unicode rows, although
I could certainly be wrong. Its collation rules are based on whatever
character set the database uses.
--
Nick Reinking -- eschewing obfuscation since 1981 -- Minneapolis, MN