Jitse Niesen wrote:
On 2/10/06, Brion Vibber <brion(a)pobox.com> wrote:
We definitely care! The parser should never produce invalid HTML output;
if it does, that's a bug.
This relates to a problem that I'm having. At the moment, MediaWiki
relies on HTML Tidy to close HTML tags if necessary.
The optional use of tidy is a temporary hack that got put in rather than fix up
the parser's own HTML balancing code, which is a horrible piece of crap that I
wrote in 2002 when I didn't know what the hell I was doing. :)
If you'd like to try rewriting it so it balances end tags properly, detects
illegal nesting cases, and understands MathML, that would be super awesome.
This is part of Sanitizer::removeHTMLtags() (which has two branches, one which
runs the old fixup code and one which leaves nesting issues for tidy to poke at
later).
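
For anyone picking this up, here is a rough sketch of what "balancing end tags" could look like, using a plain stack of open tags. It is written in Python for brevity and is not the existing Sanitizer code. The names balance_tags, TAG_RE and VOID_TAGS are made up for this example, the void-tag list is illustrative rather than MediaWiki's real whitelist, and it makes no attempt at illegal-nesting detection or MathML.

```python
import re

# Hypothetical list of tags that never take an end tag (illustration only).
VOID_TAGS = {"br", "hr"}

# Naive tag matcher: assumes attribute values contain no '<' or '>'.
TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)([^<>]*)>")


def balance_tags(text):
    """Return text with unclosed tags closed and stray end tags dropped."""
    out = []    # output fragments, joined at the end
    stack = []  # names of currently open tags, innermost last
    pos = 0
    for m in TAG_RE.finditer(text):
        out.append(text[pos:m.start()])  # copy text before this tag
        pos = m.end()
        closing = m.group(1) == "/"
        name = m.group(2).lower()
        if name in VOID_TAGS:
            # Void tags are passed through; a bogus "</br>" is dropped.
            if not closing:
                out.append(m.group(0))
            continue
        if not closing:
            stack.append(name)
            out.append(m.group(0))
        elif name in stack:
            # Close everything opened inside this element first.
            while True:
                top = stack.pop()
                out.append("</%s>" % top)
                if top == name:
                    break
        # Otherwise: end tag with no matching start tag, so drop it.
    out.append(text[pos:])
    # Close whatever is still open at the end of the input.
    while stack:
        out.append("</%s>" % stack.pop())
    return "".join(out)


print(balance_tags("<b><i>text</b> more</i>"))
```

With that input the inner <i> is closed before </b>, and the trailing </i> is discarded because nothing is open at that point. A real replacement would also need per-element nesting rules, which this sketch leaves out.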
-- brion vibber (brion @ pobox.com)