Hi all,
I think countries of the world would be a fine place to start. There are
already good tables in the major languages that can be translated easily.
I've not seen this resource mentioned yet in this thread, so I thought
I'd throw out a link:
http://unicode.org/cldr/data/common/main/
As you may know, CLDR (Common Locale Data Repository) is a project
hosted on unicode.org. It has XML files with lots of localization data
for various languages. (I believe there is also a process for
contributing information for languages that haven't yet been added. If
it's possible to share this info with that project, it would also be a
good way to promote those languages on the web. I don't have anything
to do with CLDR myself, by the way; I just think it's a cool project.)
Most of this data has been vetted, and data that hasn't been vetted is
tagged with draft="true".
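If it helps, here is a minimal sketch (Python standard library only) of pulling the territory names out of one of these files while leaving out the draft entries. The element path localeDisplayNames/territories/territory and the draft="true" check follow the files as described here; the sample XML, the function name territory_names, and the choice to skip alt= variants are my own assumptions. Newer CLDR releases may use other draft values, so check against the file you actually download.

```python
import xml.etree.ElementTree as ET

# Made-up excerpt shaped like a CLDR locale file (not real CLDR data).
SAMPLE_TERRITORIES = """<ldml>
  <localeDisplayNames>
    <territories>
      <territory type="KE">Kenya</territory>
      <territory type="TZ">Tanzania</territory>
      <territory type="UG" draft="true">Uganda</territory>
    </territories>
  </localeDisplayNames>
</ldml>"""


def territory_names(xml_text, include_draft=False):
    """Map territory codes to localized names from one CLDR locale file."""
    root = ET.fromstring(xml_text)
    names = {}
    for elem in root.iterfind("localeDisplayNames/territories/territory"):
        if elem.get("alt"):
            continue  # assumption: skip alternate forms (e.g. short names)
        if not include_draft and elem.get("draft") == "true":
            continue  # unvetted entry, left out by default
        names[elem.get("type")] = elem.text
    return names


print(territory_names(SAMPLE_TERRITORIES))
# {'KE': 'Kenya', 'TZ': 'Tanzania'}
```

Passing include_draft=True keeps the unvetted names too, which might still be useful as a starting point for translators.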
Some specific files to look at (you can load the files directly in
Firefox; I'm not sure about other browsers):
http://unicode.org/cldr/data/common/main/sw.xml Kiswahili (Swahili) - 79 territories
http://unicode.org/cldr/data/common/main/am.xml አማርኛ (Amharic) - 255 territories
http://unicode.org/cldr/data/common/main/ka.xml ქართული (Georgian) - 191 territories
http://unicode.org/cldr/data/common/main/ms.xml Bahasa Melayu (Malay) - 239 territories
For some languages there isn't much there currently:
http://unicode.org/cldr/data/common/main/ur.xml Urdu - 1 territory
http://unicode.org/cldr/data/common/main/az.xml Azeri - 1 territory
http://unicode.org/cldr/data/common/main/ml.xml Malayalam - 1 territory
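Counts like the ones above can be rechecked for any language by tallying territory entries per file. The sketch below assumes you have saved copies of the main/*.xml files into a local folder (the folder and the function names territory_coverage and coverage_report are hypothetical), and it splits each count into vetted and draft="true" entries.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def territory_coverage(path):
    """Return (vetted, draft) counts of territory names in one CLDR locale file."""
    root = ET.parse(path).getroot()
    vetted = draft = 0
    for elem in root.iterfind("localeDisplayNames/territories/territory"):
        if elem.get("alt"):
            continue  # count each territory code once, ignoring alternate forms
        if elem.get("draft") == "true":
            draft += 1
        else:
            vetted += 1
    return vetted, draft


def coverage_report(directory):
    """Map locale codes (file names such as sw.xml -> 'sw') to coverage counts."""
    return {p.stem: territory_coverage(p) for p in sorted(Path(directory).glob("*.xml"))}
```

For example, coverage_report("cldr_main") would return something like {"sw": (vetted, draft), "ur": (vetted, draft), ...} for whatever files are in that folder.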
And for others there's not even a file. Unfortunately, Udmurt,
Ossetian, and Chuvash are in this group, but this might change in
the future. Perhaps it would also be easier for these speakers to
translate from the Russian file?
These files also contain these other categories (again, with varying
degrees of completeness):
* language names
* currencies
* "exemplar characters" -- potentially the basis for an entry on the
writing systems of various languages
* calendar information -- months, days, stuff like that
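The other categories can be read the same way. Here is a rough sketch for the exemplar characters and the Gregorian month names; the element paths reflect my understanding of the LDML layout, and the sample XML and function names are made up for illustration. It only splits space-separated exemplar sets, so ranges like "a-z" come back unexpanded.

```python
import xml.etree.ElementTree as ET

# Made-up excerpt shaped like a CLDR locale file (not real CLDR data).
SAMPLE_LOCALE = """<ldml>
  <characters>
    <exemplarCharacters>[a b c d e f g]</exemplarCharacters>
    <exemplarCharacters type="auxiliary">[q x]</exemplarCharacters>
  </characters>
  <dates><calendars><calendar type="gregorian"><months>
    <monthContext type="format"><monthWidth type="wide">
      <month type="1">January</month>
      <month type="2">February</month>
    </monthWidth></monthContext>
  </months></calendar></calendars></dates>
</ldml>"""


def exemplar_characters(root):
    """Return the main exemplar set as a list, e.g. '[a b c]' -> ['a', 'b', 'c'].

    Assumption: the set is space-separated; UnicodeSet ranges such as 'a-z'
    are returned as-is rather than expanded.
    """
    for elem in root.iterfind("characters/exemplarCharacters"):
        if elem.get("type") is None and elem.text:  # skip auxiliary sets
            return elem.text.strip().strip("[]").split()
    return []


def month_names(root, calendar="gregorian"):
    """Return {month number: wide format name} for one calendar type."""
    path = (f"dates/calendars/calendar[@type='{calendar}']/months/"
            "monthContext[@type='format']/monthWidth[@type='wide']/month")
    return {int(m.get("type")): m.text for m in root.iterfind(path) if not m.get("alt")}


root = ET.fromstring(SAMPLE_LOCALE)
print(exemplar_characters(root))  # ['a', 'b', 'c', 'd', 'e', 'f', 'g']
print(month_names(root))          # {1: 'January', 2: 'February'}
```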
I've already started writing some Python scripts for extracting info
from the CLDR, and I'd be happy to try generating stuff with them in
whatever formats people need, etc.
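As one example of an output format (a hypothetical helper, not the scripts mentioned above), a {code: name} table could be turned into CSV like this:

```python
import csv
import io


def names_to_csv(names):
    """Render a {code: localized name} mapping as CSV text, sorted by code."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["code", "name"])
    for code in sorted(names):
        writer.writerow([code, names[code]])
    return buf.getvalue()


print(names_to_csv({"TZ": "Tanzania", "KE": "Kenya"}), end="")
# code,name
# KE,Kenya
# TZ,Tanzania
```

When saving the result to disk, open the file with encoding="utf-8" so names in scripts like Amharic or Georgian come through intact.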
Well, there's my several cents =)
Best regards,
Patrick Hall