Domas Mituzas wrote:
Magnus,
history. I could probably demonstrate by requesting the history of GWB
with limit=100000 a few dozen times simultaneously, but I won't, for
obvious reasons.
Instead of writing an additional feature for that, you could just
restrict the limit on page history :)
I'm not sure I follow the joke here.
Yes, obviously we have to limit the limit (sorry ;-) on history.
My feature costs much less in terms of database access time and,
especially, rendering.
Could you do me a favor? Run a SELECT DISTINCT on all authors of GWB
manually and take the time. This should represent the worst-case
scenario. I'm curious about how long that takes.
The worst-case scenario is that the revision data would have to be
loaded into memory, instead of sitting in its safe haven on our slow
I/O :) That would mean we'd need twice as much memory on all DB servers.
I don't follow this either, sorry. Until we have a concrete number on
how long the GWB page (or a similar one) takes for that query,
speculation is futile IMHO.
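For concreteness, here is the kind of query I have in mind, sketched against a toy SQLite stand-in for the revision table (column names follow the MediaWiki schema as I remember it; the real table has far more columns, and timings on our MySQL boxes will obviously differ):

```python
import sqlite3

# Toy stand-in for MediaWiki's revision table (heavily simplified).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE revision (
    rev_page INTEGER,      -- page the revision belongs to
    rev_user INTEGER,      -- 0 for anonymous (IP) edits
    rev_user_text TEXT     -- username, or IP address for anons
)""")
conn.executemany("INSERT INTO revision VALUES (?, ?, ?)", [
    (1, 5, "Magnus"),
    (1, 5, "Magnus"),        # second edit by the same user
    (1, 0, "127.0.0.1"),     # anonymous edit
    (1, 7, "Domas"),
    (2, 9, "SomeoneElse"),   # edit on a different page
])

# One row per distinct contributor to page 1: the GFDL author list.
distinct_authors = sorted(row[0] for row in conn.execute(
    "SELECT DISTINCT rev_user_text FROM revision WHERE rev_page = 1"
))
print(distinct_authors)  # ['127.0.0.1', 'Domas', 'Magnus']
```

The point being: the result set is tiny compared to rendering the full history, even if the scan itself is not free.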
There are ways to limit this. AFAIK, it is not (strictly) necessary to
list IPs in the author list, so one could add "WHERE rev_user_id>0" to
the query. That should take care of the anon vandals, which in the case
of GWB should be quite a percentage.
Not anymore. On the other hand, that'd require scanning some index
anyway :)
There'd still be plenty of anons from before protection.
But, if it isn't cheaper anyway, OK.
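To illustrate the anon filter (if I remember the schema correctly, the column is actually rev_user rather than rev_user_id, with 0 marking anonymous edits), again on a toy SQLite stand-in:

```python
import sqlite3

# Simplified revision table: rev_user is 0 for anonymous (IP) edits,
# so the extra predicate "rev_user > 0" keeps registered users only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revision "
             "(rev_page INTEGER, rev_user INTEGER, rev_user_text TEXT)")
conn.executemany("INSERT INTO revision VALUES (?, ?, ?)", [
    (1, 5, "Magnus"),
    (1, 0, "127.0.0.1"),   # anon
    (1, 0, "10.0.0.2"),    # anon
    (1, 7, "Domas"),
])

registered = sorted(row[0] for row in conn.execute(
    "SELECT DISTINCT rev_user_text FROM revision "
    "WHERE rev_page = 1 AND rev_user > 0"
))
print(registered)  # ['Domas', 'Magnus']
```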
It might also be possible to put an absolute limit into the query. But
that is a question for the legal guys.
Would not help.
OK.
Brion, please note that this is not about my latest cute little script.
AFAIK, there is currently *no* way to publish the GWB article as
demanded by the GFDL short of downloading the *entire* en.wikipedia
including all old revisions, installing MediaWiki, importing the whole
thing, and then running the query locally. Not exactly what I'd call
user-friendly...
Maybe we should write author XML streams during the backup process, but
we should not solve this issue by running it all on our main DB servers.
Well, not on our "main" (=master) servers; it doesn't require writes.
Or do you, with "main", mean the slave servers? If so, what other
database servers besides "main" do we have?
If there would be a database/apache server group dedicated to a (future)
API in general, that would be great indeed !
And the author information will be outdated as soon as someone edits the
article after the backup.
Or do you mean inside the backup XML stream, for the "current version"
dump? That would be an improvement, but no solution for the average Joe
who "wants these three pages as PDF".
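If we did go the backup-stream route, pulling per-page author sets out of the dump XML could be done in a single streaming pass; a rough sketch (element names follow the export format, with the export namespace omitted for brevity):

```python
import io
import xml.etree.ElementTree as ET

# Tiny stand-in for a MediaWiki XML dump (real dumps carry the
# http://www.mediawiki.org/xml/export-0.x/ namespace on every element).
dump = io.StringIO("""\
<mediawiki>
  <page>
    <title>George W. Bush</title>
    <revision><contributor><username>Magnus</username></contributor></revision>
    <revision><contributor><ip>127.0.0.1</ip></contributor></revision>
    <revision><contributor><username>Magnus</username></contributor></revision>
  </page>
</mediawiki>
""")

# Stream the dump and collect one author set per page; iterparse keeps
# memory flat because each <contributor> is cleared after use.
authors = {}
title = None
for event, elem in ET.iterparse(dump, events=("end",)):
    if elem.tag == "title":
        title = elem.text
        authors[title] = set()
    elif elem.tag == "contributor":
        name = elem.findtext("username") or elem.findtext("ip")
        authors[title].add(name)
        elem.clear()

print(authors)  # one author set per page title
```

That would give us the author lists "for free" during the dump run, at the cost of being only as fresh as the last backup.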
Magnus