For context: I've been working on replacing the html5 and jsdom modules
(which depend on the native 'contextify' module) with the pure-javascript
'domino' implementation of DOM4. This seems to be faster and cleaner, and it
fixes some bugs caused by jsdom's eccentric DOM handling. Domino is (in my
brief experience) more reliable and standards-compliant.
Here's a list of issues I came across in the process:
* There were 3 new failures in wt2html tests. (There were also some new
passes, so the number of correct tests increases on net.) They are:
1) "expansion of multi-line templates in attribute values (bug 6255 sanity
check 2)"
For reference, this test looks like:
!! test
Expansion of multi-line templates in attribute values (bug 6255 sanity check)
!! input
<div style="background:
#00FF00">-</div>
!! result
<div style="background: #00FF00">-</div>
!! end
!! test
Expansion of multi-line templates in attribute values (bug 6255 sanity check 2)
!! input
<div style="background: &#10;#00FF00">-</div>
!! result
<div style="background: &#10;#00FF00">-</div>
!! end
I'm not sure how this test ever passed in jsdom -- to an HTML parser the
inputs of these two tests are actually identical, since character-reference
decoding happens very early. But apparently the wikitext parser should defer
processing of the escaped newline somehow? On the domino branch our HTML
serialization now uses the upstream standard HTML5 serialization algorithm,
which doesn't escape newlines. (
http://www.whatwg.org/specs/web-apps/current-work/multipage/the-end.html#se…)
Note that the first test also involves whitespace normalization, which the
PHP parser does (see
https://www.mediawiki.org/wiki/Special:Code/MediaWiki/14689) but parsoid
does not do. (I've got a patch to do whitespace normalization in parsoid
if there's interest, but it causes other tests to break.)
What's the plan to handle cases like this? Is it really important to
generate the escaped newline in the output?
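To make the whitespace-normalization question concrete, here is a minimal
sketch. It assumes "normalization" means collapsing runs of tab, CR, LF and
space into a single space and dropping a leading or trailing space. The
function name is made up for illustration, and this is neither the PHP
parser's exact rule set nor the patch mentioned above.

```javascript
// Hypothetical helper, not part of Parsoid or domino.
// Collapses runs of ASCII whitespace (tab, CR, LF, space) to one space,
// then removes a single leading/trailing space left over from the collapse.
// U+00A0 (no-break space) is deliberately left untouched.
function normalizeAttributeWhitespace(value) {
  return value
    .replace(/[\t\r\n ]+/g, ' ')
    .replace(/^ | $/g, '');
}

// The multi-line style value from the first test above:
console.log(normalizeAttributeWhitespace('background:\n#00FF00'));
// prints: background: #00FF00
```

With something like this applied to attribute values, the first test's
expected output falls out directly; whether it should run by default is the
open question.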
2) "Play a bit with r67090 and bug 3158"
This is a parsoid-only test which looks like:
!! test
Play a bit with r67090 and bug 3158
!! options
disabled
!! input
<div style="width:50% !important"> </div>
<div style="width:50% !important"> </div>
<div style="width:50% !important"> </div>
<div style="border : solid;"> </div>
!! result
<div style="width:50% !important"> </div>
<div style="width:50% !important"> </div>
<div style="width:50% !important"> </div>
<div style="border : solid;"> </div>
!! end
In standard HTML serialization, U+00A0 (no-break space) is encoded uniformly
as &nbsp;, so even if you wanted to be bug-compatible with the 'border :'
style, you should be emitting a &nbsp;, not a numeric character reference,
there. The other two cases are whitespace normalization within attributes
(again). I'm guessing jsdom (incorrectly) did this by default whether you
wanted it or not; you need to explicitly add attribute normalization to the
domino case if that's desired. (But there's some other reason why the
'border :' case is failing now which needs to be chased down, unrelated to
the numeric-reference vs &nbsp; issue.)
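For reference, the attribute-value escaping step of the standard
serialization algorithm can be sketched as below. This follows my reading of
the "escaping a string" rules in attribute mode as they stood when this was
written (later spec revisions may escape more characters); the function name
is hypothetical and this is not domino's actual code.

```javascript
// Sketch of attribute-mode escaping in the HTML fragment serialization
// algorithm: only "&", U+00A0 and the double quote are replaced.
// Newlines and other whitespace pass through unchanged.
function escapeAttributeValue(value) {
  return value
    .replace(/&/g, '&amp;')       // must run first so later output is not re-escaped
    .replace(/\u00A0/g, '&nbsp;') // no-break space always becomes &nbsp;
    .replace(/"/g, '&quot;');
}

var withNewline = escapeAttributeValue('background:\n#00FF00');
var withNbsp = escapeAttributeValue('border\u00A0: solid;');

console.log(JSON.stringify(withNewline)); // prints: "background:\n#00FF00"
console.log(withNbsp);                    // prints: border&nbsp;: solid;
```

So a serializer that follows the standard cannot round-trip a numeric
reference for either the newline or the no-break space; the tests would need
to expect the standard form.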
3) "Parsoid-only: Table with broken attribute value quoting on consecutive
lines"
!! test
Parsoid-only: Table with broken attribute value quoting on consecutive lines
!! options
disabled
!! input
{|
| title="Hello world|Foo
| style="color:red|Bar
|}
!! result
<table>
<tr>
<td title="Hello world">Foo
</td><td style="color: red;">Bar
</td></tr></table>
!! end
jsdom used to insert the extraneous semicolon at the end of the 'style'
attribute. Domino does not. I believe this test case is broken and the
extraneous semicolon should be removed.
* Other observed bugs & failures:
http://parsoid.wmflabs.org/en/Pi gives:
TypeError: Cannot assign to read only property 'ksrc' of #<KV>
at AttributeExpander._returnAttributes
(/home/cananian/Projects/OLPC/Narrative/mediawiki/Parsoid/js/lib/ext.core.AttributeExpander.js:71:20)
at AttributeTransformManager.process
(/home/cananian/Projects/OLPC/Narrative/mediawiki/Parsoid/js/lib/mediawiki.TokenTransformManager.js:1017:8)
at AttributeExpander.onToken
(/home/cananian/Projects/OLPC/Narrative/mediawiki/Parsoid/js/lib/ext.core.AttributeExpander.js:46:7)
at AsyncTokenTransformManager.transformTokens
(/home/cananian/Projects/OLPC/Narrative/mediawiki/Parsoid/js/lib/mediawiki.TokenTransformManager.js:568:17)
at AsyncTokenTransformManager.onChunk
(/home/cananian/Projects/OLPC/Narrative/mediawiki/Parsoid/js/lib/mediawiki.TokenTransformManager.js:356:17)
at SyncTokenTransformManager.EventEmitter.emit (events.js:96:17)
at SyncTokenTransformManager.onChunk
(/home/cananian/Projects/OLPC/Narrative/mediawiki/Parsoid/js/lib/mediawiki.TokenTransformManager.js:904:7)
at PegTokenizer.EventEmitter.emit (events.js:96:17)
at PegTokenizer.process
(/home/cananian/Projects/OLPC/Narrative/mediawiki/Parsoid/js/lib/mediawiki.tokenizer.peg.js:88:11)
at ParserPipeline.process
(/home/cananian/Projects/OLPC/Narrative/mediawiki/Parsoid/js/lib/mediawiki.parser.js:360:21)
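The cause of this one isn't tracked down here, but this class of TypeError
is what strict-mode code throws when it writes to a non-writable property.
A small sketch of that mechanism follows; the KV below is a hypothetical
stand-in (the real Parsoid class may be defined quite differently), and
freezing is only one of several ways a property can end up read-only.

```javascript
// Hypothetical stand-in for Parsoid's KV; an assumption for illustration.
function KV(k, v) {
  this.k = k;
  this.v = v;
  this.ksrc = undefined;
}

// Returns true if the write succeeded, false if it threw a TypeError.
function canSetKsrc(kv, src) {
  'use strict'; // in sloppy mode the failed write is silently ignored
  try {
    kv.ksrc = src;
    return true;
  } catch (e) {
    if (e instanceof TypeError) { return false; }
    throw e;
  }
}

var plain = new KV('title', 'Hello world');
var frozen = Object.freeze(new KV('title', 'Hello world'));

console.log(canSetKsrc(plain, 'title'));  // prints: true
console.log(canSetKsrc(frozen, 'title')); // prints: false
```

If something along these lines is going on, the usual fix is to build a new
token attribute instead of mutating a shared one, but that is a guess until
the trace is investigated.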
http://localhost:8000/simple/Game gives:
starting parsing of Game
*********** ERROR: cs/s mismatch for node: A s: 3808; cs: 3821 ************
completed parsing of Game in 1491 ms
* [[File:]] tag parsing for images appears to be incomplete:
a) alt= and class= are not parsed
b) 'thumb' and 'right' should result in <img class="thumb tright" /> or
some such, but there doesn't appear to be an indication of either option in
the parsoid output.
* I'd like to see title and revision information in the <head>.
* Interwiki links are not converted to relative links when the "interwiki"
is actually the current wiki. (Maybe this isn't really a bug.)
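For the [[File:]] item above, this is roughly the kind of output I have in
mind. It is only a sketch: the function name, the result shape, and the
option handling are all made up for illustration, and it covers just the
'thumb', 'left'/'right', alt= and class= cases mentioned there.

```javascript
// Hypothetical sketch, not Parsoid code: turn the pipe-separated options of
// a [[File:...]] link into CSS classes and an alt text.
function parseFileOptions(options) {
  var result = { classes: [], alt: null };
  options.forEach(function (opt) {
    var m = /^(alt|class)=(.*)$/.exec(opt);
    if (m && m[1] === 'alt') {
      result.alt = m[2];
    } else if (m) {
      result.classes.push(m[2]);         // class=foo is passed through as-is
    } else if (opt === 'thumb') {
      result.classes.push('thumb');
    } else if (opt === 'left' || opt === 'right') {
      result.classes.push('t' + opt);    // 'right' becomes 'tright'
    }
    // Anything else (e.g. a caption) is ignored in this sketch.
  });
  return result;
}

var parsed = parseFileOptions(['thumb', 'right', 'alt=A cat']);
console.log(parsed.classes.join(' ')); // prints: thumb tright
console.log(parsed.alt);               // prints: A cat
```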
Let's discuss these a bit and I'll file bugzilla tickets for the bits we
can agree are actually bugs. ;)
--scott
--
( http://cscott.net/ )