> (Nick Reinking <nick(a)twoevils.org>):
>
> I'm actually in the middle of a C project to reduce the wikitext
> parser to a two-pass parser...
Just to update everybody on my progress with the C wikitext parser:
To do:
* Lists of any sort
Done:
* Ignores <math>
* Converts < > and & inside <nowiki>
* <pre> (space at beginning of line)
* <hr> (---- at beginning of line)
* Sections, subsections, and subsubsections (==, ===, and ====
respectively)
* Emphasis, strong emphasis, and very strong emphasis ('', ''', and
''''')
* {{CURRENTMONTH}}, {{CURRENTDAY}}, {{CURRENTYEAR}}, {{CURRENTTIME}}
* Basic links (http://, ftp://, gopher://, news://, etc.)
* Complex basic links ([http://... Blah Blah])
Possibly later:
* ISBN lookups
* Handle <math> conversion
Must be done by PHP:
* Handle links / link lookup
* Ignore links in <nowiki>
* ~~~ and ~~~~
* {{NUMBEROFARTICLES}}, {{CURRENTMONTHNAME}}, {{CURRENTDAYNAME}}
A couple of quick questions:
When wikitext is pulled from the database, what are the newlines?
Are they always \n? If so, I can clean up the parsing a bit and eke
out a bit more performance (not a big deal). Also, in what format is
the wikitext stored in the database? UTF-8? UTF-16?
As for performance: with everything I'm handling now, across all the
.txt data files in the test suite (x256 = 492672 lines), I'm seeing
parsing speeds of about 86600 lines/sec (in an 18KB executable).
--
Nick Reinking -- eschewing obfuscation since 1981 -- Minneapolis, MN