Mark Jaroski wrote:
Hi all,
I hope that this isn't too out of left field, especially coming from a
relative unknown. I do, however, think that this is a conceptual problem in
the Parser class which ought to receive some attention eventually.
You see, it seems to me that the Parser is in fact more Renderer than it is
parser, and most of the options in ParserOptions are in fact Rendering
options.
It's not a parser at all, a fact which I was aware of when I named the
class in MediaWiki 1.2 (it was OutputPage::addWikiText before that). But
"parser" was a nice short name which was at least vaguely related to its
purpose, and since people with a computer science background compulsively
refer to everything that processes text as a parser, that part of the
codebase had already been referred to as such on occasion.
I noticed this while working on a tag extension for events. My goal was to
parse some wikitext to obtain an array of PHP objects of a class Event
which corresponds to an "event" tag I've defined. Well, I can sort of
pull this off by creating a special parser, and adding a parser hook which
builds the objects for each array and sticks them into a global. Of course
to do this I need to initialize the parser with a ParserOptions object, and
of course the parser goes on to render HTML which I don't actually want,
since the goal is just to get the intermediate stage.
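As a rough illustration of the hook-and-collect idea described above, here is
a minimal sketch in Python rather than PHP. It does not use MediaWiki's
Parser or ParserOptions at all: the regular expressions, the Event fields,
the collect_events function and the sample text are all hypothetical, and a
local list stands in for the global. The point is only that the hook's
return value (the "rendered" output) is thrown away while the objects are
kept.

```python
import re
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical stand-in for the PHP Event class."""
    attrs: dict
    body: str


# Assumed syntax: <event key="value" ...>body</event>
EVENT_TAG = re.compile(r"<event\b([^>]*)>(.*?)</event>", re.DOTALL)
ATTR = re.compile(r'(\w+)="([^"]*)"')


def collect_events(wikitext):
    """Run a tag hook over the text and return the Event objects it builds."""
    collected = []

    def hook(match):
        attrs = dict(ATTR.findall(match.group(1)))
        collected.append(Event(attrs, match.group(2).strip()))
        return ""  # the replacement text is never used

    EVENT_TAG.sub(hook, wikitext)  # result discarded; only side effects matter
    return collected


sample = (
    "Intro text.\n"
    '<event date="2006-05-01" place="Geneva">Spring meeting</event>\n'
    "More text.\n"
    '<event date="2006-06-15">Summer picnic</event>'
)
events = collect_events(sample)
```

Even in this toy version the whole text is scanned and a replacement string
is built only to be dropped, which is the wasted rendering work complained
about above.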
I'm not really clear on what you're trying to do, but maybe it can be
achieved by only calling the parts of the parser that you need. There's
more to the Parser class than parse().
It occurs to me that separating the Parse stage from the Render stage could
have some other useful effects, like making it easier to add different
renderers, and making it a bit easier to start on the Parser
rationalizations that people have been talking about.
Of course any such separation will mean having to specify at least some of
the parser and renderer behaviour, but maybe not all. At any rate it
should make it easier to do so.
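To make the proposed split concrete, here is a small sketch, again in Python
and again not based on MediaWiki's code. It assumes a toy markup with only
'''bold''' text; the node classes and the parse, render_html and
render_plain functions are invented for illustration. The parse step
produces an intermediate list of nodes, and each renderer consumes that list
independently.

```python
import html
import re
from dataclasses import dataclass


@dataclass
class Text:
    value: str


@dataclass
class Bold:
    value: str


BOLD = re.compile(r"'''(.+?)'''")


def parse(wikitext):
    """Parse stage: turn markup into a flat list of nodes, no output format."""
    nodes = []
    pos = 0
    for m in BOLD.finditer(wikitext):
        if m.start() > pos:
            nodes.append(Text(wikitext[pos:m.start()]))
        nodes.append(Bold(m.group(1)))
        pos = m.end()
    if pos < len(wikitext):
        nodes.append(Text(wikitext[pos:]))
    return nodes


def render_html(nodes):
    """One possible renderer: HTML with escaping."""
    return "".join(
        f"<b>{html.escape(n.value)}</b>" if isinstance(n, Bold)
        else html.escape(n.value)
        for n in nodes
    )


def render_plain(nodes):
    """Another renderer over the same nodes: plain text, markup dropped."""
    return "".join(n.value for n in nodes)


nodes = parse("A '''bold''' claim & more")
html_out = render_html(nodes)
plain_out = render_plain(nodes)
```

A caller that only wants the intermediate stage can stop after parse(), and
a new output format is one more function over the same nodes. What this toy
leaves out (templates, extensions, caching, performance) is exactly where
the real difficulty would lie.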
So my question is this: does separating the two functions seem like a
worthwhile task to anybody else here?
It could be useful, but be assured that it's the most difficult way
possible to do whatever it is you're trying to do. There's a huge quantity
of code involved, and there are stringent performance and backwards
compatibility requirements if it's going to be used on Wikimedia. It's not
a project for the faint-hearted.
At least it's a better proposal than the one to have XML as an
intermediate format.
-- Tim Starling