I've been working, off and on, on a blogging script (in Perl), and I implemented
some wikimarkup in it. Under stress testing it became apparent that I was going
about it the wrong way: the markup had to be parsed every time a page was viewed.
My first thought was naturally to use some sort of caching system, but that
would take up an awful lot of space on my little server, and the speedup
wouldn't really justify it in my case.
What I did instead was parse it when it was saved; if I wanted to go back and
edit it, the script would simply "de-parse" it for presentation to me in the
editing form. A couple of lines from my script as an example:
# <em><strong>text</strong></em> back to '''''text'''''
$body =~ s/<em><strong>(.*?)<\/strong><\/em>/'''''$1'''''/g;
# <a href="url">text</a> back to [url text]
$body =~ s/<a href="(.*?)">(.*?)<\/a>/[$1 $2]/g;
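For the record, the save-time direction is roughly the reverse of the above; the two rules below are a simplified sketch rather than the full rule set from my script:
# '''''bold italic''''' becomes nested <em><strong> tags
$body =~ s/'''''(.*?)'''''/<em><strong>$1<\/strong><\/em>/g;
# [http://example.org link text] becomes an HTML anchor
$body =~ s/\[(\S+) (.*?)\]/<a href="$1">$2<\/a>/g;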
By this point I was wondering how the Wikipedia software handles this. It turns
out it stores the wikimarkup, not the HTML, and parses it in the viewing code.
It's probably obvious where I'm going with this. Could one of these methods
(storing both a parsed and a non-parsed version, or the "de-parsing" approach I
took) be used for some performance gain on Wikipedia's web servers?