On 11/9/07, Steve Sanbeg <ssanbeg(a)ask.com> wrote:
> But some constructs in MW require an FSM to tokenize, not a regex.
> Clearly, properly tokenizing bold/italics requires complex processing
> on an entire paragraph of text. Even templates and links are a little
> complex, but should be doable by maintaining states with a stack.
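To make the "states with a stack" idea concrete, here is a minimal Python sketch that only checks whether `{{ }}` and `[[ ]]` delimiters nest properly. The names `bracket_tokens` and `is_balanced` are hypothetical, and the sketch ignores everything else in wikitext (triple-brace parameters, `<nowiki>`, bold/italics), so it is an illustration of the technique, not how MediaWiki's parser actually works.

```python
def bracket_tokens(text):
    """Yield the doubled delimiters '{{', '}}', '[[', ']]' in order (hypothetical helper)."""
    i = 0
    while i < len(text) - 1:
        pair = text[i:i + 2]
        if pair in ("{{", "}}", "[[", "]]"):
            yield pair
            i += 2  # consume both characters of the delimiter
        else:
            i += 1


# Each closing delimiter maps to the opener it must match.
CLOSERS = {"}}": "{{", "]]": "[["}


def is_balanced(text):
    """Return True if template and link delimiters nest correctly."""
    stack = []
    for tok in bracket_tokens(text):
        if tok in CLOSERS:
            # A closer must match the most recently opened construct.
            if not stack or stack[-1] != CLOSERS[tok]:
                return False
            stack.pop()
        else:
            stack.append(tok)
    # Anything left open at the end is unbalanced.
    return not stack
```

For example, `is_balanced("{{cite|[[Foo]]}}")` is true, while `is_balanced("{{a [[b}} ]]")` is false because the link is closed after the template that contains it. The stack is what lets this handle arbitrary nesting depth, which a fixed set of FSM states cannot.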
FSMs accept regular languages by definition, so the set of languages an
FSM can recognize is exactly the set that can be specified by a regex. :)
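As a small illustration of that equivalence, the sketch below recognizes one regular language (strings over `a` and `b` with an even number of `a`s) twice: once with a hand-written two-state DFA and once with a regex, then checks that they agree on every string up to a given length. The language, the state names and the helper functions are all assumptions chosen for the example; Python's `re` module stands in for PHP's regex functions.

```python
import itertools
import re

# Two-state DFA: the state records whether we have seen an even or odd number of 'a's.
TRANSITIONS = {
    "even": {"a": "odd", "b": "even"},
    "odd": {"a": "even", "b": "odd"},
}
START, ACCEPTING = "even", {"even"}


def dfa_accepts(s):
    """Run the DFA over s and report whether it ends in an accepting state."""
    state = START
    for ch in s:
        state = TRANSITIONS[state][ch]
    return state in ACCEPTING


# The same language as a regex: leading b's, then 'a's taken in pairs.
EVEN_AS = re.compile(r"b*(?:ab*ab*)*")


def regex_accepts(s):
    return EVEN_AS.fullmatch(s) is not None


def agree_up_to(n):
    """Check that the DFA and the regex accept exactly the same strings of length <= n."""
    for length in range(n + 1):
        for chars in itertools.product("ab", repeat=length):
            s = "".join(chars)
            if dfa_accepts(s) != regex_accepts(s):
                return False
    return True
```

Calling `agree_up_to(10)` exhaustively compares the two recognizers on all 2047 strings of length 0 to 10.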
In fact, regexes as seen in PHP etc. are more powerful than FSMs, since
they can include backreferences and suchlike. But I presume PHP compiles
regexes down to efficient FSMs if they don't include such constructs, so
it probably doesn't make much difference in performance terms.
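A short example of why backreferences go beyond FSMs: the pattern below matches a run of `a`s, a `b`, and then exactly the same run again, i.e. the language a^n b a^n, which is not regular, so no finite automaton can recognize it. This is a sketch using Python's `re` module as a stand-in for PHP's PCRE-based `preg_*` functions, both of which support `\1`; the function name `same_run` is hypothetical.

```python
import re

# \1 must repeat exactly the text captured by group 1, so both runs of 'a' have equal length.
SAME_RUN = re.compile(r"(a+)b\1")


def same_run(s):
    """Return True if s has the form a^n b a^n with n >= 1."""
    return SAME_RUN.fullmatch(s) is not None
```

Here `same_run("aaabaaa")` succeeds and `same_run("aabaaa")` fails. Recognizing this requires remembering an unbounded count, which is exactly what a finite set of states cannot do.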
Soo Reams