On Sat, Jul 18, 2009 at 6:20 AM, David Gerard <dgerard(a)gmail.com> wrote:
2009/7/18 Alexandre Dulaunoy <a(a)foo.be>:
>> I was wondering if it would be possible to allow web robots to access
>> http://upload.wikimedia.org/wikipedia/commons/ to gather and mirror
>> the media files. As this is pure HTTP, the mirroring could benefit from
>> HTTP's object-caching mechanisms (instead of having a large dump
>> containing all the media files, which is more difficult to cache/update).
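The caching benefit described above comes from HTTP conditional requests: a mirror stores the `ETag`/`Last-Modified` validators it received and revalidates each object instead of re-downloading it. A minimal sketch (the function names here are illustrative, not from the thread):

```python
import urllib.request
import urllib.error

def conditional_headers(etag=None, last_modified=None):
    """Build the validator headers for an HTTP revalidation request."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers

def mirror_fetch(url, etag=None, last_modified=None):
    """Return (status, body); status 304 means the cached copy is current."""
    req = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified))
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as e:
        if e.code == 304:
            return 304, None  # not modified: keep the mirrored copy
        raise
```

A mirror run thus only transfers files that changed since the last pass; unchanged objects cost one small 304 response each.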
> I see lots of files on upload.wikimedia.org in Google Image Search
> already. Is that actually forbidden by our robots.txt?
> It'd actually be better if Google properly indexed text pages whose
> names end in .jpg or whatever ... but they're aware we'd like that, so
> it's up to them.
Which is why my personal wiki is patched to translate the ".jpg" into
"_jpg", etc. for all references to image description pages.
-Robert Rohde
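Robert's actual MediaWiki patch isn't shown in the thread; the mapping he describes could be sketched like this (the extensions beyond ".jpg" are my assumption, covering his "etc."):

```python
import re

# Hypothetical sketch of the described rewrite: image description pages
# whose titles end in an image extension get the dot replaced with an
# underscore, so crawlers treat them as text pages, not image URLs.
EXT_PATTERN = re.compile(r"\.(jpg|jpeg|png|gif|svg)$", re.IGNORECASE)

def rewrite_title(title):
    """Turn 'Example.jpg' into 'Example_jpg' in page references."""
    return EXT_PATTERN.sub(lambda m: "_" + m.group(1), title)
```

Titles without an image extension pass through unchanged, so only the description-page references are affected.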