On Tue, 24 Apr 2007 17:22:57 -0600, Jeff V. Merkey wrote:
Since I have open sourced the wikix program, anyone wanting the images
can now download them directly from Wikipedia. I am in the process of
restructuring the WikiGadugi site, so anyone wanting the bittorrent
downloads needs to finish up this week; I will discontinue them shortly,
since folks now have the ability to download the images directly. The
wikix program is not very intensive on the main Wikimedia servers. It is
set up to behave like several workstations, and it really does not take
that long to get the images.
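For those curious how direct download is possible at all: Commons stores
each image under a path derived from the MD5 hash of its filename, so a
client can compute image URLs straight from a dump. A minimal Python
sketch of that scheme (wikix itself is written in C; this is just an
illustration, and it skips URL-escaping of special characters):

    import hashlib

    def commons_image_url(filename):
        # Commons shards image storage by the MD5 hex digest of the
        # filename (spaces replaced with underscores): the first hex
        # character names the top-level directory, the first two the
        # subdirectory. Simplified: real filenames also need URL-escaping.
        name = filename.replace(" ", "_")
        digest = hashlib.md5(name.encode("utf-8")).hexdigest()
        return ("http://upload.wikimedia.org/wikipedia/commons/"
                + digest[0] + "/" + digest[:2] + "/" + name)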
I was under the impression that bulk downloads needed to be throttled, and
that it would take a lot longer than that to download everything. Does
this just grab the images as fast as it can get them? Is that allowed?
It's faster to get them from Wikipedia. The bittorrent downloads take
about 1 1/2 weeks to download the archive. Using wikix directly only
takes 1 1/2 days given the current size of the image set for Commons.
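Put another way, those figures work out to roughly a sevenfold speedup:

    torrent_days = 7 * 1.5  # about 1 1/2 weeks via bittorrent
    wikix_days = 1.5        # about 1 1/2 days with wikix
    print(f"speedup: ~{torrent_days / wikix_days:.0f}x")  # speedup: ~7x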
Getting them from Wikipedia is faster due to the squid caching, both
locally and internet-wide. My analysis of the data sets from Wikipedia
indicates that 60% of the images are cached either locally on squid or
at other remote cache servers.
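You can spot-check this yourself, since squid reports hits and misses in
an X-Cache response header. A rough sketch, assuming squid's standard
"HIT from <server>" / "MISS from <server>" header format:

    import urllib.request

    def cache_status(url):
        # HEAD request: we only want the headers, not the image bytes.
        # Squid's X-Cache header reports "HIT from <server>" or
        # "MISS from <server>" for each cache that handled the request.
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req) as resp:
            return resp.headers.get("X-Cache", "unknown")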
Since they are cached in a distributed manner, the program will only
access Wikipedia intermittently. Copyvio is a bigger issue than
performance. My image mirroring with wikix has had almost no noticeable
impact on Wikipedia. The program behaves like 16 workstations, so
Wikipedia seems able to handle it with little additional overhead. Given
the number of squid servers Brion has active, I think the impact is
minimal in comparison to the massive amount of access the site gets
daily.
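For anyone wanting to reproduce the behavior, the idea is just a fixed
pool of workers pulling image URLs, so total concurrency never exceeds
the pool size. A rough Python sketch (wikix itself is C; the names here
are illustrative):

    from concurrent.futures import ThreadPoolExecutor
    import urllib.request

    N_WORKERS = 16  # mirrors the "16 workstations" figure above

    def fetch(url):
        # Each worker downloads one image at a time, saving it under
        # its basename; concurrency is capped at N_WORKERS overall.
        urllib.request.urlretrieve(url, url.rsplit("/", 1)[-1])

    def mirror(urls):
        with ThreadPoolExecutor(max_workers=N_WORKERS) as pool:
            for url in urls:
                pool.submit(fetch, url)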
Jeff