On Friday 10 December 2004 01:27, Jamie Bliss wrote:
First of all, check Bugzilla for this issue.
I'm doing that now.
On Thu, 9 Dec 2004 19:43:08 +0100, Baeckeroot alain
<al2.baeckeroot(a)laposte.net> wrote:
> Hello
>
> I'm still trying to rebuild my French Wikipedia for local use, and I have
> problems while trying to rebuild the "brokenlinks" and "categorylinks"
> tables, using the PHP script in the maintenance directory of
> mediawiki-1.3.8:
.../...
Yes, there seems to be a LARGE memory leak. It appears that the script
forgets to call wfFreeResult when it's done with a query result. (Though
that is from a very short scan of the files.)
I cannot say exactly what to do (I am not that familiar with the
code), but my recommendation would be to place a call at about line 12.
(At the very least; obviously much more is needed.)
I'll try adding a call to wfFreeResult.
I have tried unset( $wgLinkCache ); inside or outside the loop, and it
doesn't work the way I hoped ;)
(I was a Fortran programmer, and I wrote my first PHP this weekend... I find
it difficult to understand the maintenance scripts. I'll look at the main
script later ;)
It's also possible that the request involves a huge number of pages
that would take 64M. (And with Wikipedia, that is possible.)
Yes, if you launch the script, it starts at one and goes to the end,
something like 100 000 articles :) and there is a perpetually growing cache.
As a workaround, I wrote a small bash script which splits the task into
chunks and launches one or more "refreshLinks.php $start $end" runs,
possibly in parallel queues.
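The workaround described above could look roughly like the sketch below.
This is not the actual script: chunk_ranges is a hypothetical helper name,
the chunk size and parallelism are just example values, and it assumes an
xargs that supports -P (GNU and BSD do; it is not in POSIX). It runs as a
dry run that only prints the commands.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MediaWiki): print one "start end" pair
# per chunk, covering the id range [first, last] in steps of `size`.
chunk_ranges() {
    local first=$1 last=$2 size=$3
    local start=$first end
    while [ "$start" -le "$last" ]; do
        end=$(( start + size - 1 ))
        # Clamp the final chunk so it never runs past the last id.
        if [ "$end" -gt "$last" ]; then
            end=$last
        fi
        echo "$start $end"
        start=$(( end + 1 ))
    done
}

# Dry run: print the commands instead of executing them.
# Remove the "echo" to launch them for real; -n 2 passes one
# "start end" pair per invocation, -P 10 caps the parallel jobs at 10.
chunk_ranges 1 5000 250 | xargs -n 2 -P 10 echo php refreshLinks.php
```

Because each chunk is a separate PHP process, whatever memory the cache
accumulated is released when that process exits, which is the point of the
workaround.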
I "benchmarked" that for cur_id = 1 to 5000 and it ALWAYS takes 30 min
(on a single-CPU machine):
- 40 scripts in parallel, with chunks of 25 (CPU load grows to 16-17)
- 1 script with one big block of 5000 (CPU load 2)
- 10 scripts in parallel, with chunks of 250 (CPU load goes to 5)
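For a rough idea of what a full run would cost, the figures above can be
extrapolated. This is only a back-of-the-envelope sketch: it assumes the
time scales linearly with the number of ids, which may well not hold, and it
takes the "something like 100 000 articles" figure at face value.

```shell
#!/usr/bin/env bash
# Figures taken from the benchmark above.
ids_benchmarked=5000
minutes_benchmarked=30
# Approximate total number of articles (rough figure, not measured).
total_ids=100000

# Linear extrapolation (an assumption, not a measurement).
estimated_minutes=$(( total_ids * minutes_benchmarked / ids_benchmarked ))
echo "about $estimated_minutes minutes ($(( estimated_minutes / 60 )) hours)"
```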
I would say the cache can be flushed after each article, as it gives
no speed improvement for consecutive runs.
(It's a complicated problem, I think: from the first 1% of articles, you
already have links to 60% of the database.)
The script calls a "dumbUpdate"; I got a 30% speed improvement just by
suppressing the refresh of the timestamps. (I'll do them afterwards ;)
-- Jamie
-------------------------------------------------------------------
http://endeavour.zapto.org/astro73/
Thank you to JosephM for inviting me to Gmail!
Thanks,
Alain