On 11/28/07, David Gerard <dgerard(a)gmail.com> wrote:
Steve Bennett has been writing a parser grammar, and investigating how
the present parser *actually* works.
Turns out the apostrophe-italic combination only works once a para. Is
this expected?
To clarify, this behaviour (converting exactly one occurrence of three
apostrophes to apostrophe+italics if the paragraph as a whole has
mismatched italics/bold) is pretty evident from looking at the code:
    # If there is a single-letter word, use it!
    if ($firstsingleletterword > -1)
    {
        $arr [ $firstsingleletterword ] = "''";
        $arr [ $firstsingleletterword-1 ] .= "'";
    }
So, the writer of this code (Magnus?) definitely knows about this
limitation. The questions are really:
1) Does anyone really use this construct? We've heard that the French
use a curved apostrophe instead of the straight one in this situation.
It's hard to believe anyone relies on it as it's so flaky: once per
paragraph only? Eep.
2) Can it either be removed from the current parser or not implemented
in the spec/future parser?
It's particularly noxious as there is no way to parse it in any
reasonable fashion. Four apostrophes is always apostrophe+bold
(parseable), except that this rule means that if at the end of the
paragraph you encounter other unclosed italics and bold, you have to
go back to the start and convert one of these new "apostrophe+bold"
sequences into "apostrophe+apostrophe+italics" (nightmare).
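
To make the backtracking problem concrete, here is a small Python
sketch (not the parser's PHP; the function name is made up for
illustration). It only counts apostrophe runs and reports whether both
bold and italics end up unbalanced, which is the condition that forces
a second look at markers already consumed. Treating a run of five as
both bold and italics is my assumption.

```python
import re


def needs_second_pass(paragraph):
    """True if bold and italics are both unbalanced at the end of the paragraph."""
    # Lengths of every run of two or more apostrophes, in order.
    runs = [len(run) for run in re.findall(r"'{2,}", paragraph)]
    # Four apostrophes count as apostrophe + bold.
    # Assumption: five count as bold + italics; longer runs are ignored here.
    num_bold = sum(1 for n in runs if n in (3, 4, 5))
    num_italic = sum(1 for n in runs if n in (2, 5))
    return num_bold % 2 == 1 and num_italic % 2 == 1


# Same four-apostrophe prefix, different verdict depending on the tail.
print(needs_second_pass("the boys'''' toys ''here"))
print(needs_second_pass("the boys'''' toys ''here''"))
```

The two inputs share everything up to and including the four
apostrophes, so a single left-to-right pass cannot commit to an
interpretation of that run until it has seen the end of the paragraph.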
I should also point out that whenever this situation (bold and italics
both unbalanced) arises, the parser always attempts to recover by
converting a bold into italics, not only when there is a single-letter
word; that is just the candidate it splits first.
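
For anyone who wants to play with the behaviour, here is a rough Python
sketch of the recovery step as I have described it. It is not a port of
the PHP: the function names are made up, and the handling of
five-apostrophe runs, the start-of-paragraph boundary, and the fallback
to the first bold when no single-letter word exists are my assumptions.

```python
import re


def is_single_letter_word(before):
    """True if the text before a marker ends in a one-letter word."""
    if not before or before[-1] == " ":
        return False
    # Assumption: the start of the paragraph counts as a word boundary.
    return len(before) == 1 or before[-2] == " "


def rebalance(paragraph):
    """Split into tokens and apply the recovery step.

    Even indices of the result are text, odd indices are apostrophe runs.
    """
    tokens = re.split(r"('{2,})", paragraph)
    marks = range(1, len(tokens), 2)

    # Four apostrophes: a literal apostrophe followed by a bold marker.
    for i in marks:
        if len(tokens[i]) == 4:
            tokens[i - 1] += "'"
            tokens[i] = "'''"

    # Assumption: a run of five counts as both bold and italics.
    # Runs longer than five are left alone in this sketch.
    num_bold = sum(1 for i in marks if len(tokens[i]) in (3, 5))
    num_italic = sum(1 for i in marks if len(tokens[i]) in (2, 5))

    if num_bold % 2 == 1 and num_italic % 2 == 1:
        bolds = [i for i in marks if len(tokens[i]) == 3]
        preferred = [i for i in bolds if is_single_letter_word(tokens[i - 1])]
        # Assumption: with no single-letter word, fall back to the first bold.
        candidates = preferred or bolds
        if candidates:
            target = candidates[0]
            tokens[target - 1] += "'"  # the apostrophe becomes literal text
            tokens[target] = "''"      # the rest becomes an italics marker
    return tokens


print(rebalance("This is l'''amour'' in French."))
print(rebalance("l'''amour'' and l'''histoire''"))
```

In this sketch the first call turns the three apostrophes into
apostrophe+italics, while the second leaves both runs as bold markers
because the counts are even, so the recovery step never fires. With
three such constructs in one paragraph, only the first is converted.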
Steve
(not subscribed to foundation-l)