I know I'm a bit late with this, but it is pretty hard to keep in touch with
everything going on.
I'm trying to grasp what the consequences of the db schema change
will be for the wikistats scripts, which use raw database dumps.
I need to know which edits in Revision were made in namespace 0.
I also need to know which entries in Text belong to the same article,
whether they were in namespace 0, and at what time they were saved.
The new setup seems to imply that I either need to build huge tables
which exceed physical memory, with sharp performance penalties as a
result (the job already runs +/- 24 hrs), or need to sort and merge
these huge files several times before the real work starts.
If I understand correctly I would have to sort Page and Revision on
page_id=rev_page and merge into a new file, say PageRev.
Then sort PageRev file on rev_id and merge with Text.
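The two sort-and-merge passes above could be sketched roughly as follows, assuming the dumps have first been flattened into tab-separated files sorted on their join keys; the file layout, column positions, and the helper itself are hypothetical, not part of any existing tool:

```python
import csv

def merge_sorted(left_path, right_path, left_key, right_key, out_path):
    # Merge-join two tab-separated files that are already sorted on
    # their join keys (e.g. Page on page_id, Revision on rev_page).
    # Assumes the left file has unique keys, so a one-to-many join
    # (one page, many revisions) works by advancing only the right side
    # on a match.
    with open(left_path, newline="") as lf, \
         open(right_path, newline="") as rf, \
         open(out_path, "w", newline="") as of:
        left = csv.reader(lf, delimiter="\t")
        right = csv.reader(rf, delimiter="\t")
        out = csv.writer(of, delimiter="\t", lineterminator="\n")
        lrow = next(left, None)
        rrow = next(right, None)
        while lrow is not None and rrow is not None:
            lk, rk = lrow[left_key], rrow[right_key]
            if lk < rk:
                lrow = next(left, None)
            elif lk > rk:
                rrow = next(right, None)
            else:
                # Keys match: emit the joined row, keep the left row,
                # advance the right side to pick up further revisions.
                out.writerow(lrow + rrow)
                rrow = next(right, None)
```

The same helper would then be applied a second time to join the resulting PageRev file (sorted on rev_id) against Text; note that with string keys the ids need to be zero-padded, or the files sorted numerically, for the comparisons to be correct.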
All of this would not be necessary if a few small fields were
replicated across tables.
Impact on db size would be trivial, on page save time zero.
Namespace -> Revision.
Namespace, Rev_Page, Timestamp -> Text.
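With the namespace replicated onto Revision, the namespace-0 filter would collapse into a single sequential pass over one dump, with no sorting or merging at all. A minimal sketch, where the column positions and the function itself are hypothetical:

```python
def count_ns0_edits(rev_path, ns_col=3, user_col=4):
    # Count namespace-0 edits per user in one streaming pass over a
    # tab-separated Revision dump. The column positions are assumed,
    # not taken from any real dump layout.
    counts = {}
    with open(rev_path) as f:
        for line in f:
            fields = line.rstrip("\n").split("\t")
            if fields[ns_col] == "0":
                user = fields[user_col]
                counts[user] = counts.get(user, 0) + 1
    return counts
```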
---------
Unrelated: will there be a periodic (costly) query to produce something
similar to the cur dump, which is used by quite a few scripts?
Downloading all complete databases is not workable.
Erik Zachte