On 2/11/06, Brion Vibber <brion(a)pobox.com> wrote:
> The optional use of tidy is a temporary hack that got put in rather than fix up
> the parser's own HTML balancing code, which is a horrible piece of crap that I
> wrote in 2002 when I didn't know what the hell I was doing. :)
> If you'd like to try rewriting it so it balances end tags properly, detects illegal
> nesting cases, and understands MathML, that would be super awesome.
Hmm, I think it would be easier for me to change tidy so that it
understands MathML. Why do you want to get rid of tidy?
The MathML is generated by ourselves and is not under control of the
user. Hence, we can assume that it is correct, and in principle, it
does not need to go through the sanitizer (this also holds for the
HTML generated by Parser.php, for instance in wikitables). However,
the search-and-replace design of Parser.php makes it impossible to
separate the tags generated by MediaWiki from the tags included by
the user in the wikitext, which is why everything has to be run
through the sanitizer. Is this correct?
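To make the question concrete, here is a toy sketch in Python of the situation as I understand it. It is not MediaWiki's actual code: the whitelist, the function names and the MathML output are all hypothetical stand-ins. Once the generated markup has been substituted into the same string as the user's markup, the sanitizer has no way of telling the two apart.

```python
import re

# Hypothetical whitelist; a real sanitizer's list would be much longer.
ALLOWED_TAGS = {"b", "i", "table", "tr", "td"}

# Matches an opening or closing tag and captures its name in group 2.
TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)([^<>]*)>")


def sanitize(html):
    """Toy sanitizer: escape every tag that is not on the whitelist."""
    def fix(match):
        if match.group(2).lower() in ALLOWED_TAGS:
            return match.group(0)
        return match.group(0).replace("<", "&lt;").replace(">", "&gt;")
    return TAG_RE.sub(fix, html)


def render_math(tex):
    """Stand-in for the trusted MathML generator (assumed, not real output)."""
    return "<math><mi>" + tex + "</mi></math>"


def parse(wikitext):
    """Search-and-replace pass followed by sanitizing the whole string."""
    # Generated MathML is spliced into the same string as the user's tags...
    text = re.sub(r"<math>(.*?)</math>",
                  lambda m: render_math(m.group(1)), wikitext)
    # ...so the sanitizer cannot know which tags came from us.
    return sanitize(text)


print(parse("<b>hi</b> <blink>x</blink> <math>y</math>"))
```

In this sketch the trusted `<math>` and `<mi>` tags get escaped together with the user's `<blink>`, unless the sanitizer itself is taught about MathML.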
I'm asking to make sure that I understand how
Sanitizer::removeHTMLtags() ties in with the rest of the code. I do not
want to criticize the design of the parser; it must be good because it
runs the best website in the world.
Cheers,
Jitse