Magnus,

> history. I could probably demonstrate by requesting the history of GWB
> with limit=100000 a few dozen times simultaneously, but I won't for
> obvious reasons.
Instead of writing an additional feature for that, you could just
restrict the limit on page history :)
> Could you do me a favor? Run a SELECT DISTINCT on all authors of GWB
> manually and take the time. This should represent the worst-case
> scenario. I'm curious about how long that takes.
Worst-case scenario is that the revision data would have to be loaded
into memory, instead of staying in its safe haven on our slow I/O :)
That would mean we'd need twice as much memory on all DB servers.
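For reference, the query being asked about would look roughly like this
(a sketch only: I'm assuming GWB means the George W. Bush article and
the usual page/revision tables; exact column names vary by schema
version):

  SELECT DISTINCT rev_user_text
  FROM revision
  JOIN page ON page_id = rev_page
  WHERE page_namespace = 0
    AND page_title = 'George_W._Bush';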
> There are ways to limit this. AFAIK, it is not (strictly) necessary to
> list IPs in the author list, so one could add "WHERE rev_user_id>0" to
> the query. That should take care of the anon vandals, which in the
> case of GWB should be quite a percentage.
Not anymore. On the other hand, that would still require scanning some
index anyway :)
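For illustration, the filtered version would just be the same sketch
query plus that condition (again only a sketch; depending on the schema
version the column may be rev_user rather than rev_user_id):

  SELECT DISTINCT rev_user_text
  FROM revision
  JOIN page ON page_id = rev_page
  WHERE page_namespace = 0
    AND page_title = 'George_W._Bush'
    AND rev_user_id > 0;  -- drop anonymous (IP) authors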
> It might also be possible to put an absolute limit into the query. But
> that is a question for the legal guys.
Would not help.
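(For concreteness, the "absolute limit" would presumably just be a
LIMIT clause tacked onto the same sketch query, e.g.:

  SELECT DISTINCT rev_user_text
  FROM revision
  JOIN page ON page_id = rev_page
  WHERE page_namespace = 0
    AND page_title = 'George_W._Bush'
  LIMIT 5000;  -- cap the size of the returned author list

with some number the legal guys would have to bless.)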
> Brion, please note that this is not about my latest cute little
> script. AFAIK, there is currently *no* way to publish the GWB article
> as demanded by the GFDL short of downloading the *entire* en.wikipedia
> including all old revisions, installing MediaWiki, importing the whole
> thing and then running the query locally. Not exactly what I'd call
> user-friendly...
Maybe we should write author XML streams during the backup process, but
we should not solve this issue by running it all on our main DB servers.
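A rough sketch of what such a pass could compute, one author list per
page in a single offline query that the dump script would then wrap
into XML (assumes MySQL's GROUP_CONCAT, and group_concat_max_len would
need raising for pages like GWB; real dump code would stream the rows
rather than concatenate them):

  SELECT rev_page,
         GROUP_CONCAT(DISTINCT rev_user_text SEPARATOR '|') AS authors
  FROM revision
  GROUP BY rev_page;

The point being one offline pass during backup instead of ad-hoc
DISTINCT queries against the live servers.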
Domas