[Foundation-l] WikiX Image Synchronization Utility Posted

Jeffrey V. Merkey jmerkey at wolfmountaingroup.com
Sun Jul 2 19:15:13 UTC 2006


James Hare wrote:

>Also, the technical folk may not be a fan of having software that actually
>downloads Wikipedia not via the dumps. While it's not as bad as downloading
>all of Wikipedia not through the dumps, a lot has happened in eight months.
>  
>
I thought about this before posting it, but full disclosure is always the
best policy. I asked Jim how to keep the images synced, and he told me he
was not aware of any tool his technical folks provided that would do this.
So I wrote one that gets the images in a LOW IMPACT manner (the script is
SLOW and takes a long time to sync), so it should have no more impact on
the Wikimedia servers than a standard browser reading articles. The better
solution is to keep the image archive updated, but I can see why this is
not feasible since folks are always posting inappropriate images.
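
For anyone who wants to see the general idea without reading the C source,
here is a rough sketch in Python. It is not the wikix code. It assumes image
links appear in the dump text as [[Image:Name.jpg|...]], that files live
under MediaWiki's hashed upload layout (first one and two hex digits of the
MD5 of the file name), and that a fixed pause between requests is enough to
stay low impact. The base URL, the 5 second delay, and all function names
are illustrative assumptions, and URL escaping of unusual file names is
left out.

```python
import hashlib
import re
import shlex

# Assumption: image links look like [[Image:Name.jpg|...]] in the dump's wikitext.
IMAGE_LINK = re.compile(r"\[\[Image:([^|\]]+)", re.IGNORECASE)

# Assumption: illustrative base URL; the original post does not state one.
BASE_URL = "http://upload.wikimedia.org/wikipedia/commons"


def normalize(name):
    """Trim, turn spaces into underscores, and capitalize the first letter."""
    name = name.strip().replace(" ", "_")
    return name[:1].upper() + name[1:]


def hashed_path(name):
    """Return the x/xy/Name path used by MediaWiki's hashed upload directories."""
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return f"{digest[0]}/{digest[:2]}/{name}"


def emit_script(wikitext, delay_seconds=5):
    """Build a bash script that fetches each image once, pausing between requests."""
    lines = ["#!/bin/bash"]
    seen = set()
    for match in IMAGE_LINK.finditer(wikitext):
        name = normalize(match.group(1))
        if not name or name in seen:
            continue
        seen.add(name)
        url = f"{BASE_URL}/{hashed_path(name)}"
        target = shlex.quote(name)
        # Skip files already present locally; sleep after each download
        # so the request rate stays low.
        lines.append(
            f"[ -f {target} ] || {{ curl -s -o {target} {shlex.quote(url)}; "
            f"sleep {delay_seconds}; }}"
        )
    return "\n".join(lines) + "\n"
```

Feeding the text of a dump through emit_script() and saving the result gives
a script that can be run in the background from the images directory. The
sleep only happens after an actual download, so re-running the script on an
already synced directory finishes quickly.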

This lets me READ the image descriptions to determine which images are OK
to use based on their licensing, then selectively download just those
images Wikimedia has approved.
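
As a rough illustration of that filtering step, the sketch below pulls
template names such as {{GFDL}} out of an image description and checks them
against an allow-list. The tag names in ALLOWED and BLOCKED are hypothetical
examples, not an official list of approved licenses, and a real check would
need to follow whatever tagging the description pages actually use.

```python
import re

# Matches {{Name}} or {{Name|params}} and captures the template name.
TEMPLATE = re.compile(r"\{\{\s*([^|}]+?)\s*(?:\|[^}]*)?\}\}")

# Hypothetical allow-list and block-list, for illustration only.
ALLOWED = {"gfdl", "pd-self", "cc-by-sa-2.5"}
BLOCKED = {"fairuse"}


def license_tags(description):
    """Return the lowercased template names found in a description page."""
    return [m.group(1).strip().lower() for m in TEMPLATE.finditer(description)]


def is_usable(description):
    """Accept only if an allowed tag is present and no blocked tag appears."""
    tags = set(license_tags(description))
    return bool(tags & ALLOWED) and not (tags & BLOCKED)
```

An image whose description carries no recognized tag is rejected, which
errs on the side of not downloading.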

Jeff

>On 7/2/06, Jeffrey V. Merkey <jmerkey at wolfmountaingroup.com> wrote:
>  
>
>>Since the image files on Wikimedia.org are in a constant state of flux
>>and the last 80GB archive of images is 8 months old (made in November
>>2005), I wrote a program in C that scans an XML dump of the English
>>Wikipedia, then constructs and outputs a bash script that uses CURL in
>>a non-obtrusive manner to download any missing images from Wikimedia
>>Commons and Wikipedia.  The program runs in the background and is low
>>impact.
>>
>>
>>invoke from your /MediaWiki/images/ directory as:
>>
>>/wikix/wikix < enwiki-<date>-pages-articles.xml > image_sh &
>>./image_sh >& image.log &
>>
>>The program will invoke CURL (if loaded) after it outputs a full
>>download script, which will resync any remote wiki with the
>>master images on Wikipedia and Wikimedia Commons.
>>
>>Enjoy.
>>
>>The source code  and makefiles are attached as text (since the program
>>is small) and the tar gz is also available from
>>
>>ftp.wikigadugi.org/wiki/images
>>
>>Jeff V. Merkey
>>
>>all:  wikix
>>
>>wikix: wikix.c
>>        gcc -g wikix.c -o wikix -lssl
>>
>>clean:
>>        rm -f wikix
>>
>>
>>
>>_______________________________________________
>>foundation-l mailing list
>>foundation-l at wikimedia.org
>>http://mail.wikipedia.org/mailman/listinfo/foundation-l