On 07/13/2017 02:18 AM, Nicolas Vervelle wrote:
I think I've found some discrepancy between Linter reports. On frwiki, the
page "Discussion:Yasser Arafat" is reported in the list for self-closed-tag
[1], but when I run the text of the page through the transform API [2], I
only get errors for obsolete-tag and mixed-content and nothing for
self-closed-tag.
When I pasted the wikitext for the Discussion:Yasser_Arafat page in the
wikitext box AND entered the page title in the title box on
https://fr.wikipedia.org/api/rest_v1/#!/Transforms/post_transform_wikitext_…,
I do see the following among others:
...
{ "type": "self-closed-tag", "params": { "name": "span" },
  "dsr": [ 183063, 183134, null, null ],
  "templateInfo": { "name": "Modèle:Censuré" } },
...
However, if I don't add the page title in the title box, I can reproduce
your problem, so clearly this has something to do with a template that
depends on the page title.
I can reproduce this on the command line with the specific wikitext
substring that the Linter interface shows you. The output below shows
that the linter error depends on the page title being provided.
---
[subbu@earth parsoid] echo '{{Censuré|Tu remarqueras que je ne te retourne pas la question.<br />}}' \
    | parse.js --page Discussion:Yasser_Arafat --prefix frwiki --lint > /dev/null
[info/lint/self-closed-tag][frwiki/Discussion:Yasser_Arafat]
{"type":"self-closed-tag","params":{"name":"span"},"dsr":[0,71,null,null],"templateInfo":{"name":"Modèle:Censuré"}}
[info/lint/stripped-tag][frwiki/Discussion:Yasser_Arafat]
{"type":"stripped-tag","params":{"name":"SPAN"},"dsr":[0,71,null,null],"templateInfo":{"name":"Modèle:Censuré"}}
[subbu@earth parsoid] echo '{{Censuré|Tu remarqueras que je ne te retourne pas la question.<br />}}' \
    | parse.js --prefix frwiki --lint > /dev/null
[subbu@earth parsoid]
---
When I add the --dump tplsrc flag to Parsoid (you can get the same output
from the expandtemplates action API endpoint), I see the following:
---
<span class="censure" style="background-color:#EEF;color:#EEF;" title="Tu remarqueras que je ne te retourne pas la question.<br />"><span style="visibility:hidden">Tu remarqueras que je ne te retourne pas la question.<br /></span></span>
---
So, it looks like Parsoid's tokenizer is tripping on the "/>" that is
present in the span's title attribute and falsely assuming it marks a
self-closing tag.
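
To illustrate the kind of false positive I mean, here is a rough sketch in
Python. This is not Parsoid's actual tokenizer code; both function names are
made up for this example, and the "naive" check is only my guess at the
general shape of the mistake. It compares a check that looks for "/>"
anywhere in the start tag with one that ignores anything inside quoted
attribute values.

```python
# Hypothetical sketch, not Parsoid code: two ways to decide whether an
# HTML start tag is self-closed.

START_TAG = (
    '<span class="censure" style="background-color:#EEF;color:#EEF;" '
    'title="Tu remarqueras que je ne te retourne pas la question.<br />">'
)


def naive_is_self_closed(tag: str) -> bool:
    """Assumed buggy behaviour: any '/>' in the tag text counts."""
    return "/>" in tag


def quote_aware_is_self_closed(tag: str) -> bool:
    """Only count a '/>' that ends the tag outside quoted attribute values."""
    quote = None  # quote character we are currently inside, if any
    for i, ch in enumerate(tag):
        if quote is not None:
            if ch == quote:
                quote = None  # closing quote of the attribute value
        elif ch in ('"', "'"):
            quote = ch  # opening quote of an attribute value
        elif ch == ">":
            # The first unquoted '>' ends the start tag; it is self-closed
            # only if the character right before it is '/'.
            return i > 0 and tag[i - 1] == "/"
    return False


if __name__ == "__main__":
    print(naive_is_self_closed(START_TAG))        # flags the span
    print(quote_aware_is_self_closed(START_TAG))  # does not flag the span
    print(quote_aware_is_self_closed("<br />"))   # a real self-closed tag
```

The sketch does not handle unquoted attribute values or other edge cases;
it is only meant to show why the "/>" inside title="..." should not count.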
In conclusion:
(1) Please provide the page title when you use the API.
(2) There is a Parsoid bug in the detection of self-closing tags, where the
presence of a "/>" in an HTML attribute triggers a false positive. This
has been reported previously, so I suppose it is not as uncommon as I
thought. We'll take a look at that.
Subbu.