Hello again,
While trying to retrieve the images for the Hebrew Wikipedia ZIM file I'm building, I ran Emmanuel's script mirrorMediawikiPages.pl. My command line was this:
./mirrorMediawikiPages.pl --sourceHost=he.wikipedia.org --destinationHost=localhost --useIncompletePagesAsInput --sourcePath=w
After running for more than 20 hours, and while still in the stage of populating the @pages array with incomplete pages, it aborted with an "out of memory" error. The machine has 4 GB of physical memory, and the last time I checked -- several hours before the abort -- the script was consuming 3.6 GB.
Is there a way to do this in several large chunks, without specifying each individual page? How do you do it?
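For example, I was imagining something along these lines: dump the list of incomplete page titles to a file, split it into chunks with coreutils, and run one mirror pass per chunk, so the script never has to hold the whole set in memory. This is only a rough sketch -- I don't know whether mirrorMediawikiPages.pl actually accepts a page-list file, and the --pathFile option below is purely hypothetical:

```shell
# Sketch: process the page list in chunks instead of all at once.
# ASSUMPTION: mirrorMediawikiPages.pl can read an explicit list of
# page titles from a file; the --pathFile flag here is hypothetical,
# standing in for whatever option (if any) the script really offers.

# Example page list, one title per line; in practice this would be
# the full dump of incomplete pages.
printf '%s\n' Page1 Page2 Page3 Page4 Page5 > pages.txt

# Split into chunks of 2 titles each: chunk_aa, chunk_ab, chunk_ac, ...
split -l 2 pages.txt chunk_

# One mirror pass per chunk (echoed here rather than executed).
for f in chunk_*; do
  echo ./mirrorMediawikiPages.pl --sourceHost=he.wikipedia.org \
       --destinationHost=localhost --sourcePath=w --pathFile="$f"
done
```

If the script only takes titles on the command line, the same loop could feed it each chunk's contents instead of the filename.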
Thanks in advance,
Asaf Bartov