Magnus,

> history. I could probably demonstrate by requesting the history of GWB
> with limit=100000 a few dozen times simultaneously, but I won't for
> obvious reasons.
Instead of writing an additional feature for that, you could just
restrict the limit on page history :)
> Could you do me a favor? Run a SELECT DISTINCT on all authors of GWB
> manually and take the time. This should represent the worst-case
> scenario. I'm curious about how long that takes.
Worst-case scenario is that the revision data would have to be loaded
into memory, instead of staying in its safe haven on our slow I/O :)
That would mean we'd need twice as much memory on all DB servers.
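For reference, the query being asked about would look roughly like this
(a sketch only: I'm assuming GWB means the George W. Bush article and
the usual page/revision tables; exact column names vary by schema
version):

  SELECT DISTINCT rev_user_text
  FROM revision
  JOIN page ON page_id = rev_page
  WHERE page_namespace = 0
    AND page_title = 'George_W._Bush';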
> There are ways to limit this. AFAIK, it is not (strictly) necessary to
> list IPs in the author list, so one could add "WHERE rev_user_id>0" to
> the query. That should take care of the anon vandals, which in the
> case of GWB should be quite a percentage.
Not anymore. On the other hand, that would still require scanning some
index anyway :)
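For illustration, the filtered version would just be the same sketch
query plus that condition (again only a sketch; depending on the schema
version the column may be rev_user rather than rev_user_id):

  SELECT DISTINCT rev_user_text
  FROM revision
  JOIN page ON page_id = rev_page
  WHERE page_namespace = 0
    AND page_title = 'George_W._Bush'
    AND rev_user_id > 0;  -- drop anonymous (IP) authors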
> It might also be possible to put an absolute limit into the query. But
> that is a question for the legal guys.
Would not help.
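(For concreteness, the "absolute limit" would presumably just be a
LIMIT clause tacked onto the same sketch query, e.g.:

  SELECT DISTINCT rev_user_text
  FROM revision
  JOIN page ON page_id = rev_page
  WHERE page_namespace = 0
    AND page_title = 'George_W._Bush'
  LIMIT 5000;  -- cap the size of the returned author list

with some number the legal guys would have to bless.)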
> Brion, please note that this is not about my latest cute little
> script. AFAIK, there is currently *no* way to publish the GWB article
> as demanded by the GFDL short of downloading the *entire* en.wikipedia
> including all old revisions, installing MediaWiki, importing the whole
> thing and then running the query locally. Not exactly what I'd call
> user-friendly...
Maybe we should write author XML streams during the backup process, but
we should not solve this issue by running it all on our main DB servers.
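A rough sketch of what such a pass could compute, one author list per
page in a single offline query that the dump script would then wrap
into XML (assumes MySQL's GROUP_CONCAT, and group_concat_max_len would
need raising for pages like GWB; real dump code would stream the rows
rather than concatenate them):

  SELECT rev_page,
         GROUP_CONCAT(DISTINCT rev_user_text SEPARATOR '|') AS authors
  FROM revision
  GROUP BY rev_page;

The point being one offline pass during backup instead of ad-hoc
DISTINCT queries against the live servers.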
Domas