From: "Brion Vibber" <brion(a)pobox.com>
On ven, 2002-02-22 at 02:33, Jan Hidders wrote:
>
> No, no, of course not. What I meant was that we check what the display
> character encoding (I said character set, but that is probably not the
> right word) is that is given with the request. [....]
(Not necessarily, just lots of transliteration tables. Or perhaps
compile in iconv support... does iconv allow partial transliteration
between HTML entities and other character sets? ie, HTML entities that
are not in the destination charset are left intact?)
Ah, I didn't know about iconv. My guess would be that it doesn't know about
entities, because the default behaviour should be that it doesn't, and I
don't see any flags to indicate that it should.
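
The "leave what doesn't fit as an entity" behaviour asked about above can
be sketched outside iconv. This is a minimal Python illustration, not the
wiki's code and not a statement about iconv's flags: it uses Python's
built-in "xmlcharrefreplace" codec error handler, and the function name
encode_with_entities is made up for the example. Entities already present
in the text are plain ASCII, so they pass through unchanged.

```python
def encode_with_entities(text, charset):
    """Encode text into charset; characters the charset cannot represent
    are written as numeric character references (&#NNN;) instead of
    raising an error or being dropped."""
    return text.encode(charset, errors="xmlcharrefreplace")


sample = "ĉevalo café"

# Latin-1 has "é" but not "ĉ" (U+0109), so only "ĉ" becomes a reference.
as_latin1 = encode_with_entities(sample, "iso-8859-1")
assert as_latin1 == b"&#265;evalo caf\xe9"

# UTF-8 can represent everything, so no references are produced.
as_utf8 = encode_with_entities(sample, "utf-8")
assert b"&#" not in as_utf8
```
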
> The nice thing about this would be that you can cut'n paste anything from
> any other Wikipedia by cutting it from the edit box.
If I understand correctly, you're suggesting that the default character
encoding should *not* be based on the language used, but on some ability
of the browser to specify a preferred encoding (for instance, the HTTP
Accept-Charset header), such that the same user would see wikipedias in
different languages come up with the same character encoding?
Yes, and the accept-charset header is indeed the meta-tag I was thinking of.
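
As a rough illustration of choosing an encoding from the Accept-Charset
header, here is a Python sketch. The function names and the UTF-8 fallback
are assumptions made for the example, not anything from the wiki code. It
handles only the basic "name;q=value" list syntax and ignores the special
cases in the HTTP specification (the "*" wildcard is simply skipped).

```python
import codecs


def parse_accept_charset(header):
    """Return (charset, q) pairs sorted by descending q.
    A missing q counts as 1.0; a malformed q counts as 0.0."""
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, params = item.partition(";")
        q = 1.0
        for param in params.split(";"):
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value.strip())
                except ValueError:
                    q = 0.0
        prefs.append((name.strip().lower(), q))
    # sorted() is stable, so entries with equal q keep their header order.
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)


def choose_charset(header, fallback="utf-8"):
    """Pick the most preferred charset that this Python build can encode to;
    use the fallback when nothing in the header is usable."""
    for name, q in parse_accept_charset(header):
        if q <= 0 or name == "*":
            continue
        try:
            codecs.lookup(name)
        except LookupError:
            continue  # unknown charset name, try the next preference
        return name
    return fallback
```

For example, choose_charset("iso-8859-1;q=0.5, utf-8") picks "utf-8", and
an empty or unusable header falls back to the default.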
I'm not convinced that the default value of that would always (or even
often) be acceptable, and most users won't know how to change it.
Can you give an example? What is at the moment the behaviour of the
Esperanto Wikipedia?
Simple, obvious-at-first-sight manual switching between UTF-8 and a
standard transliteration format is a non-negotiable requirement for the
Esperanto 'pedia, so retaining the manual override is necessary.
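
To make the transliteration idea concrete, here is a small Python sketch.
It assumes the format meant is the common Esperanto "x-system" (cx, gx, hx,
jx, sx, ux); the message above does not name the format, so that choice and
the function names are assumptions. The reverse direction is naive: any
literal "ux", "sx" and so on in the input (for example in a foreign name)
is converted as well, and all-caps forms such as "CX" are not handled.

```python
# Circumflexed and breved Esperanto letters and their x-system spellings.
X_SYSTEM = {
    "ĉ": "cx", "ĝ": "gx", "ĥ": "hx", "ĵ": "jx", "ŝ": "sx", "ŭ": "ux",
    "Ĉ": "Cx", "Ĝ": "Gx", "Ĥ": "Hx", "Ĵ": "Jx", "Ŝ": "Sx", "Ŭ": "Ux",
}


def to_x_system(text):
    """Replace each accented Esperanto letter with its ASCII x-digraph."""
    return "".join(X_SYSTEM.get(ch, ch) for ch in text)


def from_x_system(text):
    """Replace each x-digraph with the accented letter (naive, see above)."""
    for accented, digraph in X_SYSTEM.items():
        text = text.replace(digraph, accented)
    return text


assert to_x_system("ĉevalo ŝajnas") == "cxevalo sxajnas"
assert from_x_system("cxevalo sxajnas") == "ĉevalo ŝajnas"
```
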
Of course, users will still be able to choose their encoding (provided they
log in). Note that I was only talking about the editable boxes such as the
edit box and the search box. It makes sense to me to ask people to log in if
they want special behavior there. The encoding for presentation of pages is
another matter that can be decided separately.
> > cf $wikiRecodeInput(), $wikiRecodeOutput() if you want a ready place to
> > do this.
>
> It doesn't have the right arguments. But these are implementation
> details. We first should agree on the architecture.
There's no character set argument because that's a global variable. At
present $wikiCharset specifies the default encoding (that used in the
database),
Ok, then it would be the right place. But note that for indexing reasons the
meaning of the variable that you gave can no longer be correct. The encoding
in the database *has* to be different from the representation in the edit
box and there may even be another encoding used for representing the page.
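
The separation between the stored encoding and the edit-box encoding could
look roughly like the Python sketch below. It assumes UTF-8 as the single
storage encoding and numeric character references for anything the edit-box
charset cannot represent; both are assumptions for illustration, as are all
the names. One consequence of this design is that a reference typed by
hand in the edit box is also turned into the real character on save.

```python
import re

STORAGE_CHARSET = "utf-8"  # assumed canonical database encoding

_NUMERIC_REF = re.compile(r"&#(\d+);")


def _expand_ref(match):
    """Turn one decimal reference into its character; leave out-of-range
    or surrogate code points as the original reference text."""
    code = int(match.group(1))
    if code > 0x10FFFF or 0xD800 <= code <= 0xDFFF:
        return match.group(0)
    return chr(code)


def to_edit_box(stored, edit_charset):
    """Database bytes -> bytes for the edit box in the client's charset."""
    text = stored.decode(STORAGE_CHARSET)
    return text.encode(edit_charset, errors="xmlcharrefreplace")


def from_edit_box(submitted, edit_charset):
    """Submitted edit-box bytes -> canonical database bytes."""
    text = submitted.decode(edit_charset)
    text = _NUMERIC_REF.sub(_expand_ref, text)
    return text.encode(STORAGE_CHARSET)


stored = "ĉevalo".encode("utf-8")
box = to_edit_box(stored, "iso-8859-1")
assert box == b"&#265;evalo"
assert from_edit_box(box, "iso-8859-1") == stored
```

Because the stored form is always the same regardless of the charset the
text was edited in, indexing and searching only have to deal with one
representation.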
-- Jan Hidders