Vincent Ramos wrote:
* improve interwiki links: any non latin-1 link (ISO-8859-1 is the
default charset) would be possible without any transformation. Many
non latin-1 links are copied and pasted in raw form by interwiki
wanderers but not checked afterwards; they are always miscoded, and
fixing them wastes time;
Most French accented characters are interchangeable between Unicode,
ISO-8859-1, and the more recent ISO-8859-15 (which was a modification
that allowed for such things as the Euro symbol). It's the other
renditions that seem to cause the problem.
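
A minimal Python sketch of this point. The helper name `encodings_of` is hypothetical and only serves the illustration. It suggests that é and ç have the same single byte in ISO-8859-1 and ISO-8859-15, which is also their Unicode code point, while their UTF-8 bytes differ; œ and the Euro symbol are missing from ISO-8859-1 altogether.

```python
# Illustrative helper (hypothetical name): hex bytes of one character
# in each charset, or None when the charset cannot represent it.
CHARSETS = ("latin-1", "iso-8859-15", "utf-8")


def encodings_of(ch):
    result = {}
    for charset in CHARSETS:
        try:
            result[charset] = ch.encode(charset).hex()
        except UnicodeEncodeError:
            result[charset] = None  # character absent from this charset
    return result


if __name__ == "__main__":
    for ch in "éçœ€":
        # Show the Unicode code point next to the per-charset bytes.
        print(f"U+{ord(ch):04X}", ch, encodings_of(ch))
```

So "interchangeable" holds at the level of character numbers, not of the bytes stored in a UTF-8 page.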
* improve orthography (and articles naming): French uses the famous
<oe> digraph that is not encoded by latin-1 (latin-9 does); every editor
must either type the HTML entity &oelig; or prefer not to encode it,
resulting in misspelled words (one of our bots, Orthogaffe, when it
was used for orthography purpose, had many "oeuvre -> œuvre"
replacements to do);
I can use Alt+0156 to create the œ ligature, but I believe that this is
in the unstable area of coding (positions 128 to 159, where Windows-1252
and ISO-8859-1 differ). It is not normally on the keyboard.
Simply using "oe" without the ligature is not a spelling error, but a
breach of typographical convention. Unfortunately most books that I
use for French language reference use the ligature but do not discuss the
problem at all. The book by Léandre Bergeron, "Dictionnaire de la
langue québécoise", uses the two-letter form in its listings, but also
without explanation. Alphabetical lists treat the ligature as if it
were two letters, so it should be treated as optional, and the
initiating author's choice should be respected. I would simply use the
two letter form and would object if it were changed. In article titles
the author's choice should also be respected, but a redirect should be
set up from the alternative.
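
As a rough sketch of how an alphabetical list can treat the ligature as two letters, here is a small Python sort key. The function name `sort_key` and the word list are made up for the example, and the key ignores accents and other collation rules.

```python
def sort_key(word):
    """Sorting key that treats the oe and ae ligatures as two plain letters."""
    return word.lower().replace("œ", "oe").replace("æ", "ae")


words = ["œuvre", "odeur", "offre", "oeil"]

# Both spellings land between "od..." and "of...", whichever form was typed.
ordered = sorted(words, key=sort_key)
print(ordered)
```

The same key could serve to find the alternative title for which a redirect is wanted, since "œuvre" and "oeuvre" map to one value.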
* end transcoding problems: many editors do not use Windows
and its codepages; others do, but with Win-1252 or Unicode as default
charset. When some text is pasted from an application not using strict
latin-1 (but Win-1252, MacRoman, etc.) to some wiki editing area,
it is badly transcoded by the wiki software, resulting in many raw
quotation marks and <oe> ligatures being replaced by question marks.
Yes, I would be glad to be rid of the annoying question marks, squares or
diamonds.
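
The following Python sketch shows one plausible way such question marks arise. It is an assumption for illustration, not a description of what the wiki software actually does: text holding Win-1252 characters is forced into strict latin-1 with replacement of whatever does not fit.

```python
# "cœur" wrapped in curly quotation marks, as a word processor might paste it.
pasted = "\u201cc\u0153ur\u201d"

# In Win-1252 the three special characters sit at 0x93, 0x9C and 0x94.
cp1252_bytes = pasted.encode("cp1252")

# Strict latin-1 has no slot for them, so a lossy conversion substitutes "?".
as_latin1 = pasted.encode("latin-1", errors="replace")

# Reading the Win-1252 bytes as latin-1 yields invisible control characters,
# which a browser may draw as squares or diamonds.
misread = cp1252_bytes.decode("latin-1")

# UTF-8 carries the same text through a round trip without loss.
round_trip = pasted.encode("utf-8").decode("utf-8")

print(cp1252_bytes, as_latin1, repr(misread), round_trip == pasted)
```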
Cons:
* any text containing non ASCII characters would increase
in size: instead of one byte for a single <c with cedilla>, it
would require two; French uses lots of non ASCII characters, as
The diamonds that appeared in your original letter did not reproduce at
all when I quoted the letter for this answer. The c-cedilla is a part of
ISO-8859-1 and I enter it with Alt+0231. The two-byte encoding should
not be a worry. Multi-byte encoding is unavoidable for all Chinese
characters.
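
To put numbers on the size question, here is a short Python sketch. The helper name `byte_sizes` and the sample words are chosen only for the example.

```python
def byte_sizes(text):
    """Return (latin-1 size, UTF-8 size) in bytes for a latin-1 compatible text."""
    return len(text.encode("latin-1")), len(text.encode("utf-8"))


# Each accented letter costs one extra byte in UTF-8; ASCII letters cost none.
cedilla = byte_sizes("ç")
word = byte_sizes("garçon")
accents = byte_sizes("été")

# A Chinese character has no latin-1 form at all and takes three bytes in UTF-8.
chinese = len("中".encode("utf-8"))

print(cedilla, word, accents, chinese)
```

For ordinary French prose the growth is limited to the accented letters, since everything in the ASCII range stays at one byte.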
Would it be possible, thus, to make utf-8 the default charset
for the French Wikipedia?
I believe that it should be the standard for all the Wikis.
Ec