On 2/13/08, Daniel Kinzler <daniel(a)brightbyte.de> wrote:
> No. If a tag-style extension wants to support wiki text, it has to explicitly
> invoke a new parser pass on the text contained between the tags. The text MUST
> NOT be parsed/transformed before being passed to the extension, and what the
> extension returns must not be parsed either (the latter is only partially true
> for the current parser, but I would call that a bug, not a feature - see bug 8997).

So, the parse sequence for:
* <ref> '''blah'''</ref>
basically goes:
1. Parse bullet and find <ref>...</ref>
2. Pass <ref> chunk to extension.
3. Extension processes <ref> chunk, calls parser to process the bold
tags, returns something with <b>blah</b>
4. Parser continues on...
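
As a rough illustration of that sequence, here is a minimal Python sketch (not MediaWiki's actual code). The names `parse_line`, `parse_inline` and `ref_extension` are hypothetical, and the `<sup>` wrapper is just a stand-in for whatever the extension really returns. The point is that the extension gets the raw chunk, calls the parser itself, and its output is kept away from any further parsing:

```python
import re

REF_TAG = re.compile(r"<ref>(.*?)</ref>", re.DOTALL)
BOLD = re.compile(r"'''(.+?)'''")


def parse_inline(text):
    """Toy inline pass: only understands '''bold'''."""
    return BOLD.sub(r"<b>\1</b>", text)


def ref_extension(raw, parser):
    """Hypothetical <ref> handler: receives raw text and invokes the parser itself."""
    return "<sup>" + parser(raw.strip()) + "</sup>"


def parse_line(line):
    # Step 1: parse the bullet and find <ref>...</ref>.
    is_bullet = line.startswith("*")
    body = line[1:].strip() if is_bullet else line

    # Steps 2-3: hand each raw chunk to the extension and stash its output
    # behind a placeholder so the parser never sees it again.
    outputs = []

    def stash(match):
        outputs.append(ref_extension(match.group(1), parse_inline))
        return f"\x00{len(outputs) - 1}\x00"

    body = REF_TAG.sub(stash, body)

    # Step 4: the parser continues on the text outside the tags.
    body = parse_inline(body)

    # Finally, put the extension output back untouched.
    body = re.sub(r"\x00(\d+)\x00", lambda m: outputs[int(m.group(1))], body)
    return f"<li>{body}</li>" if is_bullet else body


print(parse_line("* <ref> '''blah'''</ref>"))
```

The placeholder trick is only one way to keep extension output out of later passes; a grammar-based parser could just as well treat the returned chunk as an opaque node.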

> Magic words don't have to have the form __XXX__ - they can be characterized by
> any regular expression. Consider how ISBN and RFC are treated - those are magic
> words too... Oh and please consider that the patterns are frequently localizable

No they're not. Quite specifically, they're not - the key words (ISBN,
RFC, PMID) are hardcoded into the parser code and not
internationalisable. I call them "magic links" in my grammar.
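
For illustration only, a "magic link" matcher with the keywords baked in might look roughly like the Python sketch below. The pattern is a simplification I made up for this example (it is not the regex the parser uses), and `find_magic_links` is a hypothetical helper name:

```python
import re

# Keywords are hardcoded in the pattern itself, so there is nothing to localise.
# Simplified assumption: keyword, spaces, then digits with optional hyphens or X.
MAGIC_LINK = re.compile(r"\b(ISBN|RFC|PMID) +(\d[\dXx-]*)")


def find_magic_links(text):
    """Return (keyword, identifier) pairs for every magic link found in text."""
    return [(m.group(1), m.group(2)) for m in MAGIC_LINK.finditer(text)]


print(find_magic_links("See RFC 2616 and ISBN 0-306-40615-2."))
```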

> (and are thus maintained in MediaWiki's messages files): French, for example,
> allows __AUCUNETABLE__ for __NOTOC__. The same goes for #REDIRECT btw: Dutch
> allows #DOORVERWIJZING, etc...

That's ok - I'd forgotten that the #REDIRECT word is a magic word though.
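
To sketch how localised aliases could sit alongside a fixed grammar: each magic word gets a canonical id plus a per-language alias list, and the matcher is built from that list. Everything here is an assumption for illustration. The `ALIASES` table and `build_matcher` are hypothetical, only the French and Dutch spellings come from this thread, and keeping the English form as an extra alias is my guess:

```python
import re

# Hypothetical alias table keyed by canonical id, then language code.
ALIASES = {
    "notoc": {
        "en": ["__NOTOC__"],
        "fr": ["__AUCUNETABLE__", "__NOTOC__"],
    },
    "redirect": {
        "en": ["#REDIRECT"],
        "nl": ["#DOORVERWIJZING", "#REDIRECT"],
    },
}


def build_matcher(magic_id, lang):
    """Compile one regex matching any alias of magic_id in lang (English fallback)."""
    by_lang = ALIASES[magic_id]
    words = by_lang.get(lang, by_lang["en"])
    return re.compile("|".join(re.escape(w) for w in words))


print(bool(build_matcher("notoc", "fr").search("intro __AUCUNETABLE__ text")))
```

With something like this the grammar only needs to know "a NOTOC token goes here"; which literal strings produce that token is decided by the table.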

> I'm not entirely sure if extensions are free to define magic words using *any*
> pattern, but I think this is so. MagicWord.php is entirely regex-based. Which
> would mean that either your parser will only support some types of magic words,
> or it needs a way to hook into the actual grammar.

Yes, as I discussed, there will need to be restrictions on the form of
magic words, which is not a bad thing anyway.

> Oh, and "variables" like {{PAGENAME}} are treated as magic words internally,
> though that wouldn't have to be so. I would probably use the template mechanism,
> and simply intercept the use of special names.

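
The "intercept special names" idea could look something like this minimal Python sketch. It is not how MediaWiki does it; `expand`, the `variables` table and the `templates` dict are all hypothetical, and only the simplest `{{name}}` form (no parameters) is handled:

```python
import re

# Matches {{name}} only; names containing braces or pipes are not handled.
TEMPLATE_CALL = re.compile(r"\{\{([^{}|]+)\}\}")


def expand(text, page_title, templates):
    """Expand {{name}} calls, checking special variable names before templates."""
    # Hypothetical table of intercepted names; only PAGENAME is from the thread.
    variables = {"PAGENAME": lambda: page_title}

    def replace(match):
        name = match.group(1).strip()
        if name in variables:
            return variables[name]()  # intercepted: never reaches template lookup
        # Ordinary template; unknown names are left in the text untouched.
        return templates.get(name, match.group(0))

    return TEMPLATE_CALL.sub(replace, text)


print(expand("This is {{PAGENAME}} with {{stub}}.", "Main Page", {"stub": "[stub]"}))
```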
I'm a bit unclear on the meaning and current processing of the things
involving curly braces. Can someone help me out here:
* {{template}} - totally handled by preprocessor?
* {{{1}}} - template parameter, totally handled by preprocessor?
* {{PAGENAME}} - "magic" variable? Where is it handled? Does it have to be caps?
* {{foo:blah}} - parser function? Where is it handled?
* {{defaultsort:blah}} - same question
Any others?
Currently I'm handling these:
* __TOC__ etc (magic words)
* #REDIRECT
* ISBN, PMID, RFC (magic links)
Steve