-----Original Message-----
From: wikitext-l-bounces(a)lists.wikimedia.org
[mailto:wikitext-l-bounces@lists.wikimedia.org] On Behalf Of
Steve Bennett
Sent: 27 November 2007 04:06
To: Wikitext-l
Subject: [Wikitext-l] Determining the behaviour of apostrophes
I've written up an account of how the current parser treats
apostrophes here:
http://www.mediawiki.org/wiki/Markup_spec/BNF/Inline_text#Determining_the_behaviour_of_apostrophes
All I've done is read the code of doAllQuotes() and translate
it from a procedural style (first replace blah, then iterate
through...) into a more declarative style (four apostrophes
end up getting rendered as X if the following is the case...).
The most interesting case is this one:
Take ''''four''' apostrophes and then throw
'''''five
unclosed apostrophes at them.
Normally, four apostrophes are treated as an apostrophe followed by bold.
But when the parser finds unbalanced bold *and* italics on
the line, it goes looking for a bold to split. The first
bold, which is now preceded by an apostrophe, is seen as a
good candidate because it seems to be a single letter
followed by a bold (as in the l'''idee''
case). So that bold gets split *again*. Meaning that the four
apostrophes end up getting rendered as two apostrophes
followed by italics.
I suspect this was not planned behaviour.
Steve
I had a go at solving the bold/italics ambiguity last week, and managed to
get my handwritten parser to pass the bold/italics tests in parserTests.txt
(excluding anything with link markup, as link parsing is still
unimplemented).
The code is still missing the search for a single letter preceding a bold
to split at. It seems none of the tests exercise that particular bit of
code.
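For reference, here is a rough Python sketch of that missing search, going
only by Steve's description above. It is not the PHP from doAllQuotes() and
not my parser's code. The function names are made up, and the fallback order
when no single-letter candidate exists (multi-letter word first, then a
bold preceded by a space) is my assumption rather than something stated in
the write-up.

```python
import re

def split_runs(line):
    """Split a line into text and apostrophe runs (runs sit at odd indexes)."""
    parts = re.split(r"(''+)", line)
    for i in range(1, len(parts), 2):
        if len(parts[i]) == 4:
            # Four apostrophes: a literal apostrophe followed by bold.
            parts[i - 1] += "'"
            parts[i] = "'''"
    return parts

def bold_to_split(parts):
    """Return the index of the bold run to split, or None if nothing to fix."""
    lengths = [len(parts[i]) for i in range(1, len(parts), 2)]
    bolds = sum(1 for n in lengths if n in (3, 5))
    italics = sum(1 for n in lengths if n in (2, 5))
    if bolds % 2 == 0 or italics % 2 == 0:
        return None  # only act when bold *and* italics are both unbalanced
    first_multi = first_space = None
    for i in range(1, len(parts), 2):
        if len(parts[i]) != 3:
            continue
        before = parts[i - 1]
        if before[-1:] == " ":
            if first_space is None:
                first_space = i
        elif before[-2:-1] == " ":
            return i  # single letter before the bold, as in l'''idee''
        elif first_multi is None:
            first_multi = i
    # Assumed fallback order: multi-letter word, then space.
    return first_multi if first_multi is not None else first_space

def split_bold(parts, i):
    """Turn the bold run at index i into a literal apostrophe plus italics."""
    parts[i - 1] += "'"
    parts[i] = "''"

parts = split_runs("Take ''''four''' apostrophes and then throw '''''five")
i = bold_to_split(parts)
if i is not None:
    split_bold(parts, i)
```

On Steve's example the first bold is preceded by "Take '", so the
apostrophe looks like a single-letter word and that bold is the one that
gets split, leaving two literal apostrophes followed by italics.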
--
For
Take ''''four''' apostrophes and then throw
'''''five unclosed apostrophes
at them.
I get
<p>Take '<b>four</b> apostrophes and then throw ' five unclosed
apostrophes
at them.</p>
The ''''' becomes <b><i> and then gets balanced to '<i/>, with the <i/>
being removed (it causes problems in IE).
--
There is one test in parserTests.txt
!!test
Mixing markup for italics and bold
!! options
!! input
'''bold''''''bold''bolditalics'''''
!! result
<p><b>bold</b><b>bold<i>bolditalics</i></b>
</p>
!! End
That fails because all 6 apostrophes are interpreted as markup.
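To illustrate the problem, a small Python sketch (again not my parser's
code; the function name is made up) of what a greedy scan for apostrophe
runs sees in that input. The "expected" list is just my reading of the
result section of the test: a closing bold immediately followed by an
opening bold.

```python
import re

def apostrophe_runs(line):
    """Return the length of each run of two or more apostrophes."""
    return [len(m.group()) for m in re.finditer(r"'{2,}", line)]

runs = apostrophe_runs("'''bold''''''bold''bolditalics'''''")
print(runs)  # [3, 6, 2, 5]

# Reading implied by the expected HTML: the run of six is ''' then '''.
expected_reading = [3, 3, 3, 2, 5]
```

A greedy scan yields a single run of six, so the tokenizer has to decide
for itself how to divide it before the bold/italics balancing can give the
expected output.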