[Wikipedia-l] Recent Changes layout

Jan.Hidders hidders at uia.ua.ac.be
Fri May 31 11:45:57 UTC 2002


On Thu, May 30, 2002 at 03:24:14PM -0700, lcrocker at nupedia.com wrote:
> Axel makes a good argument that the new code's simplified layout of the
> Recent Changes page (see
> http://www.piclab.com/newwiki/wiki.phtmltitle=Special:Recentchanges )
> really is an important loss of functionality.  Since his comment, I added
> an article history link, but I think his comment still applies in general. 
> I think it is of critical importance that this feature be optimized both
> for function and for speed.  I don't want to put the old code back because
> it's a dog, but if I can find a way to implement the needed features in a
> faster way, I'm all for it.

I'd really like to help but I'm very busy at the moment and next week
I'll be in Madison Wisc. attending a conference. So I can only make a few
quick remarks here.

First I would check if the old SQL code was really that bad. Just put it
side by side with the new SQL in your code and time it. A lot depends here
on how MySQL optimizes this query and uses the indices, but I am now at work
and cannot check with EXPLAIN how MySQL actually deals with it.

> First, I'd like to solicit an opinion from Jan or other database gurus
> about what might make the speed better.  Currently, it is driven by a
> single SELECT from the current article table, which is sorted by reverse
> timestamp (on which there's an index), and with a LIMIT.  The old (N
> changes) feature also required accessing the old versions table and an
> expensive GROUP BY, which I'd like to avoid if possible.

Grouping on an indexed column is not always expensive. Again, it depends on
how MySQL optimizes the query.

> One way I might do that is to create a new "changelog" table, updated 
> on each page change.

Also here it depends upon the details of how MySQL stores its records with
fields such as those of type text.  It could very well be that it only needs
one disk read per record to retrieve the necessary information from the
'old' table, in which case the extra table will not make a difference. (In
fact if MySQL is really smart it could do it only with the indices.) Also
here I would suggest to simply experiment a litte. It's quite easy to create
once the redundant table and then try a few queries and time them and
compare them to the original ones.

> Jan, would the increase non-atomicity of the article save process be a
> problem?  MySQL doesn't have transactions, so it's possible that either an
> aritlce might be saved while recording that fact in the changelog fails,
> or vice versa.

That chance is very small if you put the SQL statements very close to
eachother. We are already non-atomic as it is because when a page is changed
we do a delete and two inserts, and then I'm not even talking about the work
that is done for the linked and unlinked tables. As long as the updates of
cur and old go right you can always regenerate the changelog with a simple
SELECT over old.

-- Jan Hidders



More information about the Wikipedia-l mailing list