As a programmer with well over a quarter century of experience, I
applaud this effort by Nick to identify and slay the "dragon" of
N-squared performance.
When a process on a file takes time proportion to the file size, we call
that N. When time is proportion to log N, we call that log N performance
(quicksort, which uses a partition scheme and then sorts the partitions,
has log N performance).
A process whose elapsed time grows in proportion to the SQUARE of the
file size is out of control! It's called an N-squared or N^2 process,
and is to be avoided like the plague. Insertion sort and Bubble sort are
examples, and have long been shunned for all but small numbers of items.
Nick, if you and Magnus can pull this off, to Brion's satisfaction, you
will gain the everlasting gratitude of the entire Wikipedia community.
More power to you!
Ed Poor
Professional Software Developer
-----Original Message-----
From: wikitech-l-bounces(a)Wikipedia.org
[mailto:wikitech-l-bounces@Wikipedia.org] On Behalf Of Nick Pisarro
Sent: Wednesday, February 25, 2004 9:17 AM
To: Wikimedia developers
Subject: [Wikitech-l] Two projects I'm exploring
There are two projects that I am exploring that might make mediawiki
more robust in handling the emmense load being placed on it.
One, is a one or two pass wiki parser. The current parser, which
performs dozens of passes, probably degrades by the square of the file
size. I have added some thoughts to
http://meta.wikipedia.org/wiki/One-pass_parser
Storing diffs in the 'old' table. This would not affect performance,
except when loading or comparing old revisions, but could drastically
reduce the size of the database, which has to benefit how manageable it
is. There already is a differencing engine in the source, though I'm not
sure how reliable it is--it may also degrade by the square of the file
difference. Here too, a sequence of diffs can be merged in one pass.
Having written such code in the past, I plan to create a write up
exploring this idea. Has this idea been discussed amongst the
developers? What are the gotcha's?
Nick Pisarro
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)Wikipedia.org
http://mail.wikipedia.org/mailman/listinfo/wikitech-l