> (Nick Reinking <nick(a)twoevils.org>):
>
> I'm actually in the middle of a C project to reduce the wikitext
> parser to a two-pass parser...
Just to update everybody on my progress with the C wikitext parser:
To do:
* Lists of any sort
Done:
* Ignores <math>
* Converts < > and & inside <nowiki>
* <pre> (space at beginning of line)
* <hr> (---- at beginning of line)
* Sections, subsections, and subsubsections (==, ===, and ====
respectively)
* Emphasis, strong emphasis, and very strong emphasis ('', ''', and
''''')
* {{CURRENTMONTH}}, {{CURRENTDAY}}, {{CURRENTYEAR}}, {{CURRENTTIME}}
* Basic links (http://, ftp://, gopher://, news://, etc.)
* Complex basic links ([http://... Blah Blah])
Possibly later:
* ISBN lookups
* Handle <math> conversion
Must be done by PHP:
* Handle links / link lookup
* Ignore links in <nowiki>
* ~~~ and ~~~~
* {{NUMBEROFARTICLES}}, {{CURRENTMONTHNAME}}, {{CURRENTDAYNAME}}
A couple of quick questions:
When wikitext is pulled from the database, what are the newlines?
Are they always \n? If so, I can clean up the parsing a bit and eke
out a bit more performance (not a big deal). Also, in what format is
the wikitext stored in the database? UTF-8? UTF-16?
As for performance: with everything I'm handling now, across all the
.txt data files in the test suite (x256 = 492672 lines), I'm seeing
parsing speeds of about 86600 lines/sec (in an 18KB executable).
--
Nick Reinking -- eschewing obfuscation since 1981 -- Minneapolis, MN