------------ Original message ------------
From: Brion Vibber <brion(a)wikimedia.org>
Subject: Re: [Wikitech-l] Anchors haven't id attribute
Date: 26.12.2008 06:30:00
----------------------------------------
On 12/25/08 4:32 AM, Danny B. wrote:
I have reverted both revisions in r45021 and r45022 because they caused
massive invalidity of pages.
Given that we've been outputting these as "id" attributes for the last
few years already (as output by Tidy), I have reverted your revert in
r45044 pending further discussion.
-- brion
Well, the id was added _only_ to those tags whose name was transferable to an id, i.e. the
name had to start with an ASCII letter. It was _never_ added to those that did not conform
to this rule (the regexp mentioned in my previous post). This is easily provable either by
running an older revision of MediaWiki or by testing Tidy directly:
Take this code excerpt, wrap it in a minimal XHTML document, and run it through Tidy:
<a name="X"></a><h2> <span class="mw-headline"> X </span></h2>
<a name="1X"></a><h2> <span class="mw-headline"> 1X </span></h2>
<a name=".C3.81X"></a><h2> <span class="mw-headline"> ÁX </span></h2>
<a name="-X"></a><h2> <span class="mw-headline"> -X </span></h2>
The result will be:
<a name="X" id="X"></a><h2><span class="mw-headline">X</span></h2>
<a name="1X"></a><h2><span class="mw-headline">1X</span></h2>
<a name=".C3.81X"></a><h2><span class="mw-headline">ÁX</span></h2>
<a name="-X"></a><h2><span class="mw-headline">-X</span></h2>
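The behavior shown above can be modelled roughly as follows. This is a minimal Python sketch, not Tidy's actual code: it assumes anchors of the exact simplified form <a name="..."></a> (no other attributes, double quotes only), and the function name add_ids is hypothetical. It copies name into a new id attribute only when the name matches the HTML 4 pattern /[A-Za-z][A-Za-z0-9:_.-]*/.

```python
import re

# HTML 4 ID/NAME token rule: an ASCII letter followed by any number of
# ASCII letters, digits, ':', '_', '.' or '-'.
HTML4_ID = re.compile(r"[A-Za-z][A-Za-z0-9:_.-]*")

# Simplified (assumed) anchor shape: <a name="..."></a> with nothing else.
ANCHOR = re.compile(r'<a name="([^"]*)"></a>')


def add_ids(html):
    """Hypothetical model: add id="name" only if name is a valid HTML 4 ID."""
    def repl(m):
        name = m.group(1)
        if HTML4_ID.fullmatch(name):
            return '<a name="%s" id="%s"></a>' % (name, name)
        return m.group(0)  # non-conforming names are left untouched
    return ANCHOR.sub(repl, html)


sample = "\n".join([
    '<a name="X"></a>',
    '<a name="1X"></a>',
    '<a name=".C3.81X"></a>',
    '<a name="-X"></a>',
])
print(add_ids(sample))
```

Only the first anchor gains an id; the other three come back unchanged, which matches the Tidy output above.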
Now, let me repeat how "id" is defined:
1: XHTML is a reformulation of HTML 4 as an XML 1.0 application.
2: That means it takes every single definition from HTML 4 and keeps it unless it is
overridden in XHTML.
3: Both id and name were defined in HTML 4 as /[A-Za-z][A-Za-z0-9:_.-]*/ [1] [2]
4: The name attribute has been redefined as NMTOKEN [2] [3]
5: The id attribute has never been redefined and thus keeps the definition given in point 3
above.
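To make the difference between points 3 and 4 concrete, here is a small Python sketch that checks the four anchor names from the example against both rules. The NMTOKEN check is an ASCII-only approximation (an assumption of this sketch): the real XML production also admits non-ASCII letters, combining characters and extenders. The function name classify is hypothetical.

```python
import re
from typing import Tuple

# Point 3: HTML 4 ID and NAME tokens.
HTML4_ID = re.compile(r"[A-Za-z][A-Za-z0-9:_.-]*")

# Point 4: an XML Nmtoken is one or more NameChars. ASCII-only
# approximation: letters, digits, '.', '-', '_' and ':'.
NMTOKEN_ASCII = re.compile(r"[A-Za-z0-9._:-]+")


def classify(value: str) -> Tuple[bool, bool]:
    """Return (valid HTML 4 id, valid ASCII-approximated NMTOKEN name)."""
    return (HTML4_ID.fullmatch(value) is not None,
            NMTOKEN_ASCII.fullmatch(value) is not None)


for anchor in ["X", "1X", ".C3.81X", "-X"]:
    ok_id, ok_name = classify(anchor)
    print("%-10s id: %-5s name: %s" % (anchor, ok_id, ok_name))
```

All four values are acceptable as a name under the NMTOKEN rule, but only "X" satisfies the HTML 4 id rule.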
This is how id has always been handled in XHTML since XHTML came out. I also think that
something as important as the handling of id would have been fixed in the validator over
so many years if it were not correct.
So currently, all wikis written in non-Latin scripts are totally invalid according to the
W3C validator, and large parts of wikis using other non-ASCII characters are invalid as
well. This makes it very hard to find the real validity errors in the code when every other
page is full of worthless false positives. :-(
One last thing: I think the current rendering with controversial ids has brought more
negatives (such as greatly reducing our ability to find the genuinely invalid parts of the
code) than positives. It was working correctly before, so what benefit has it actually
brought? What it has brought is this controversy.
I accept that I (and the majority of people around the world, the validator, Tidy and many
other tools) _may_ be wrong in interpreting the definition of id. But I think that unless
the authoritative tools, such as the validator or Tidy, are fixed on this issue, so that it
can be proven that we render the page correctly, we should not render it that way. As I
mentioned above, it was working correctly before, so there is no urgency to force the new
rendering, since it does not correct any mistake or malfunction.
[1] http://www.w3.org/TR/html401/types.html#type-name
[2] http://www.w3.org/TR/xhtml1/#C_8
[3] http://www.w3.org/TR/2000/WD-xml-2e-20000814#NT-Nmtoken
Kind regards
Danny B.