[Labs-l] Google bot

Federico Leva (Nemo) nemowiki at gmail.com
Sat Oct 25 08:36:55 UTC 2014


As Nuria, Billinghurst and others said, the tools are expected to be 
discoverable. It's easy enough not to throw the baby out with the 
bathwater*.
* Dynamic pages generally have some URL parameters, usually indicated by 
?. In the general robots.txt, disallow Googlebot and friends** from 
crawling those, with appropriate wildcards, as per 
https://support.google.com/webmasters/answer/6062596
* If it's not enough, also disallow URL patterns with several / (i.e. 
deeply nested paths)
* If it's not enough, reduce the global crawl rate (apparently not 
possible per-folder) https://support.google.com/webmasters/answer/48620
* If it's not enough, at the very least the main page for each tool 
should be crawled, disallowing at most //tools.wmflabs.org/*/*
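
A minimal robots.txt sketch of the steps above, assuming Google-style 
wildcard matching (* matches any run of characters). The bingbot line 
and the path depth in step 2 are illustrative assumptions, not something 
settled in this thread, and the crawl rate step is left out.

```
# Named crawlers only, so ia_archiver is not affected by these rules.
User-agent: Googlebot
# Assumption: bingbot stands in for the "friends".
User-agent: bingbot

# Step 1: skip dynamic pages, i.e. any URL with a query string.
Disallow: /*?

# Step 2, only if step 1 is not enough: skip deeply nested paths.
# The depth (three slashes) is an arbitrary example.
# Disallow: /*/*/*/

# Last resort: skip everything below each tool's top-level page.
# With prefix matching this probably also covers /toolname/ with a
# trailing slash, so check it in a robots.txt tester first.
# Disallow: /*/*
```

Rules are best enabled one at a time, in the order listed, and the 
crawl load checked after each change.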

Nemo

(*) Even Toolserver managed, with far fewer resources.
(**) But not ia_archiver if at all possible, please.


