Domas Mituzas wrote:
Magnus,
history. I could probably demonstrate by requesting the history of GWB
with limit=100000 a few dozen times simultaneously, but I won't, for
obvious reasons.
Instead of writing an additional feature for that, you could just
restrict the limit on page history :)
I'm not sure I follow the joke here.
Yes, obviously we have to limit the limit (sorry ;-) on history.
My feature costs much less in terms of database access time and,
especially, rendering.
Could you do me a favor? Run a SELECT DISTINCT on all authors of GWB
manually and take the time. This should represent the worst-case
scenario. I'm curious about how long that takes.
The worst-case scenario is that the revision data would have to be
loaded into memory, instead of sitting in its safe haven on our slow
I/O :) That would mean we'd need twice as much memory on all DB servers.
I don't follow this either, sorry. Until we have a concrete number on
how long the GWB page (or a similar one) takes for that query,
speculation is futile IMHO.
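For concreteness, here is the kind of query I have in mind, sketched against a toy SQLite stand-in for the revision table (column names follow the MediaWiki schema as I remember it; the real table has far more columns, and timings on our MySQL boxes will obviously differ):

```python
import sqlite3

# Toy stand-in for MediaWiki's revision table (heavily simplified).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE revision (
    rev_page INTEGER,      -- page the revision belongs to
    rev_user INTEGER,      -- 0 for anonymous (IP) edits
    rev_user_text TEXT     -- username, or IP address for anons
)""")
conn.executemany("INSERT INTO revision VALUES (?, ?, ?)", [
    (1, 5, "Magnus"),
    (1, 5, "Magnus"),        # second edit by the same user
    (1, 0, "127.0.0.1"),     # anonymous edit
    (1, 7, "Domas"),
    (2, 9, "SomeoneElse"),   # edit on a different page
])

# One row per distinct contributor to page 1: the GFDL author list.
distinct_authors = sorted(row[0] for row in conn.execute(
    "SELECT DISTINCT rev_user_text FROM revision WHERE rev_page = 1"
))
print(distinct_authors)  # ['127.0.0.1', 'Domas', 'Magnus']
```

The point being: the result set is tiny compared to rendering the full history, even if the scan itself is not free.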
There are ways to limit this. AFAIK, it is not (strictly) necessary to
list IPs in the author list, so one could add "WHERE rev_user_id>0" to
the query. That should take care of the anon vandals, which in the case
of GWB should be quite a percentage.
Not anymore. On the other hand, that'd require scanning some index
anyway :)
There'd still be plenty of anons from before protection.
But, if it isn't cheaper anyway, OK.
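To illustrate the anon filter (if I remember the schema correctly, the column is actually rev_user rather than rev_user_id, with 0 marking anonymous edits), again on a toy SQLite stand-in:

```python
import sqlite3

# Simplified revision table: rev_user is 0 for anonymous (IP) edits,
# so the extra predicate "rev_user > 0" keeps registered users only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revision "
             "(rev_page INTEGER, rev_user INTEGER, rev_user_text TEXT)")
conn.executemany("INSERT INTO revision VALUES (?, ?, ?)", [
    (1, 5, "Magnus"),
    (1, 0, "127.0.0.1"),   # anon
    (1, 0, "10.0.0.2"),    # anon
    (1, 7, "Domas"),
])

registered = sorted(row[0] for row in conn.execute(
    "SELECT DISTINCT rev_user_text FROM revision "
    "WHERE rev_page = 1 AND rev_user > 0"
))
print(registered)  # ['Domas', 'Magnus']
```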
It might also be possible to put an absolute limit into the query. But
that is a question for the legal guys.
Would not help.
OK.
Brion, please note that this is not about my latest cute little script.
AFAIK, there is currently *no* way to publish the GWB article as
demanded by the GFDL short of downloading the *entire* en.wikipedia
including all old revisions, installing MediaWiki, importing the whole
thing, and then running the query locally. Not exactly what I'd call
user-friendly...
Maybe we should write author XML streams during the backup process, but
we should not solve this issue by running it all on our main DB servers.
Well, not on our "main" (=master) servers; it doesn't require writes.
Or do you, with "main", mean the slave servers? If so, what other
database servers besides "main" do we have?
If there would be a database/apache server group dedicated to a (future)
API in general, that would be great indeed !
And the author information will be outdated as soon as someone edits the
article after the backup.
Or do you mean inside the backup XML stream, for the "current version"
dump? That would be an improvement, but no solution for the average Joe
who "wants these three pages as PDF".
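If we did go the backup-stream route, pulling per-page author sets out of the dump XML could be done in a single streaming pass; a rough sketch (element names follow the export format, with the export namespace omitted for brevity):

```python
import io
import xml.etree.ElementTree as ET

# Tiny stand-in for a MediaWiki XML dump (real dumps carry the
# http://www.mediawiki.org/xml/export-0.x/ namespace on every element).
dump = io.StringIO("""\
<mediawiki>
  <page>
    <title>George W. Bush</title>
    <revision><contributor><username>Magnus</username></contributor></revision>
    <revision><contributor><ip>127.0.0.1</ip></contributor></revision>
    <revision><contributor><username>Magnus</username></contributor></revision>
  </page>
</mediawiki>
""")

# Stream the dump and collect one author set per page; iterparse keeps
# memory flat because each <contributor> is cleared after use.
authors = {}
title = None
for event, elem in ET.iterparse(dump, events=("end",)):
    if elem.tag == "title":
        title = elem.text
        authors[title] = set()
    elif elem.tag == "contributor":
        name = elem.findtext("username") or elem.findtext("ip")
        authors[title].add(name)
        elem.clear()

print(authors)  # one author set per page title
```

That would give us the author lists "for free" during the dump run, at the cost of being only as fresh as the last backup.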
Magnus