[Foundation-l] WikiX Image Synchronization Utility Posted

James Hare messedrocker at gmail.com
Sun Jul 2 16:03:23 UTC 2006


Also, the technical folk may not be fans of software that downloads
content from Wikipedia directly rather than via the dumps. While fetching
only the missing images is not as bad as pulling all of Wikipedia outside
the dumps, a lot has happened in those eight months.

On 7/2/06, Jeffrey V. Merkey <jmerkey at wolfmountaingroup.com> wrote:
>
>
> Since the image files on Wikimedia.org are in a constant state of flux
> and the last 80GB archive of images is 8 months old (made in November
> of 2005), I wrote a program in C that scans an XML dump of the English
> Wikipedia, then constructs and outputs a bash script that uses curl in
> a non-obtrusive manner to download any missing images from Wikimedia
> Commons and Wikipedia. The program runs in the background and is
> low-impact.
>
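For anyone who wants the gist without reading the attachment, the core of
the approach is: walk the dump text, pull out the [[Image:...]] links, and
emit one curl line per file. A minimal sketch of that loop (not the actual
wikix.c -- the parsing is simplified, duplicates are not filtered, and the
URL host below is a placeholder; building the real upload.wikimedia.org
path needs the MD5 step sketched further down):

#include <stdio.h>
#include <string.h>

int main(void)
{
    char line[65536];

    while (fgets(line, sizeof line, stdin)) {
        char *p = line;

        /* a single dump line can contain several image links */
        while ((p = strstr(p, "[[Image:")) != NULL) {
            char name[1024];
            int i = 0;

            p += strlen("[[Image:");
            /* the file name ends at '|' (options/caption) or "]]" */
            while (*p && *p != '|' && *p != ']' && *p != '\n'
                   && i < (int)sizeof(name) - 1) {
                name[i++] = (*p == ' ') ? '_' : *p;  /* wiki names use '_' */
                p++;
            }
            name[i] = '\0';

            if (i > 0)
                printf("curl -s -O 'http://host.example/path/%s'\n", name);
        }
    }
    return 0;
}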
>
> Invoke it from your /MediaWiki/images/ directory as:
>
> /wikix/wikix < enwiki-<date>-pages-articles.xml > image_sh &
> ./image_sh >& image.log &
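(In other words: the first command feeds the dump to wikix on stdin and
captures the generated script as image_sh; the second runs that script
with stdout and stderr both redirected to image.log, again in the
background. If wikix does not mark the script executable, chmod +x
image_sh first or run it as bash image_sh.)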
>
> The program will invoke curl (if installed) after it outputs a full
> download script, which will resync any remote wiki with the
> master images on Wikipedia and Wikimedia Commons.
>
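The part that is easy to miss is how the generated script knows where each
file lives: MediaWiki stores uploads in a two-level directory derived from
the MD5 of the underscored file name (e.g. a/ab/Foo.jpg), which is
presumably why the makefile below links against OpenSSL. Here is a sketch
of that derivation using OpenSSL's MD5; whether wikix builds its URLs
exactly this way is an assumption on my part, not something stated in the
mail.

/* Sketch only -- derive MediaWiki's hashed upload path for a file name.
 * Compile with:  gcc md5path.c -o md5path -lcrypto
 * (MD5() lives in libcrypto; it still works, though later OpenSSL
 * releases mark it deprecated.)
 */
#include <stdio.h>
#include <string.h>
#include <openssl/md5.h>

/* Print e.g. "x/xy/Some_image.jpg", where x and xy are the first hex
 * digits of md5("Some_image.jpg").  Spaces must already be underscores. */
static void print_hashed_path(const char *name)
{
    unsigned char digest[MD5_DIGEST_LENGTH];
    char hex[2 * MD5_DIGEST_LENGTH + 1];
    int i;

    MD5((const unsigned char *)name, strlen(name), digest);
    for (i = 0; i < MD5_DIGEST_LENGTH; i++)
        sprintf(hex + 2 * i, "%02x", digest[i]);

    /* Prepend http://upload.wikimedia.org/wikipedia/commons/ for Commons
     * files, or .../wikipedia/en/ for local English Wikipedia uploads. */
    printf("%c/%c%c/%s\n", hex[0], hex[0], hex[1], name);
}

int main(int argc, char **argv)
{
    int i;

    for (i = 1; i < argc; i++)
        print_hashed_path(argv[i]);
    return 0;
}

Since a given file may live either on Commons or as a local en.wikipedia
upload, a script like this would try the Commons URL first and fall back
to the local one, which matches the "Commons and Wikipedia" wording above.
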
> Enjoy.
>
> The source code and makefiles are attached as text (since the program
> is small), and the tar.gz is also available from
>
> ftp.wikigadugi.org/wiki/images
>
> Jeff V. Merkey
>
>
> all:  wikix
>
> wikix: wikix.c
>         gcc -g wikix.c -o wikix -lssl
>
> clean:
>         rm -f wikix
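A note if you copy the makefile out of the mail body rather than the
tarball: make needs the command lines under each target to start with a
real tab character, which mail clients tend to flatten into spaces, and if
the link step complains about unresolved OpenSSL symbols, adding -lcrypto
after -lssl usually fixes it.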
>
>
>
> _______________________________________________
> foundation-l mailing list
> foundation-l at wikimedia.org
> http://mail.wikipedia.org/mailman/listinfo/foundation-l
>
>
>


