"Brion Vibber" wrote:
I didn't say it was being cached, that its content could be
word-searched, or that it had been spidered through to other pages. I
said it was *indexed*. Now, maybe Google uses some word other than
"indexed" to mean "contained in a database of links which are shown to
users when they search for words contained in the link". I'll buy that.
Maybe the word they use is "florble". In that case, the page is being
florbled despite our best efforts to stop it from being florbled.
Is there any way we can tell Google not to florble pages that are
explicitly excluded by our robots.txt file, so that people will stop
complaining to *us* about Google's overzealous florbling?
As I understand it:
The problem is that GoogleBot works in two steps.
The first step is collecting URLs and adding them to
Google's database, without checking them or
retrieving the pages. This step uses neither
robots.txt nor the meta-noindex of the linked pages.
The meta-nofollow of the page containing the links
is probably respected.
The second step (which can occur some weeks later)
is taking URLs from the database and retrieving
the pages. URLs that are excluded by the respective
robots.txt or by a meta-noindex are deleted
from the database.
(At the same time, step one is applied to the links
on the retrieved page.)
Between those two steps, the URL stays in the
database, and whenever the URL itself contains the
search words, it is shown as a search result.
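The two-step behaviour described above can be modelled with a toy crawler. This is a minimal sketch under stated assumptions: it is not how GoogleBot is actually implemented, the robots.txt content (disallowing /w/) and the "Example" URLs are hypothetical, and the class and method names are made up for illustration. Only the robots.txt evaluation uses a real library, Python's standard urllib.robotparser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the real file is not quoted in this thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /w/
"""

# Hypothetical URLs in the style of the ones discussed here.
EDIT_URL = "http://pl.wikipedia.org/w/wiki.phtml?title=Example&action=edit"
ARTICLE_URL = "http://pl.wikipedia.org/wiki/Example"


class ToyCrawler:
    """Toy model of the two-step crawl described above (an assumption,
    not Google's real implementation)."""

    def __init__(self, robots_txt: str) -> None:
        self.robots = RobotFileParser()
        self.robots.parse(robots_txt.splitlines())
        self.known_urls: set[str] = set()

    def discover(self, links: list[str]) -> None:
        # Step 1: record URLs without fetching them or consulting robots.txt.
        self.known_urls.update(links)

    def fetch_pending(self) -> None:
        # Step 2 (possibly weeks later): consult robots.txt; URLs that are
        # disallowed are dropped from the database instead of being fetched.
        for url in list(self.known_urls):
            if not self.robots.can_fetch("Googlebot", url):
                self.known_urls.discard(url)

    def search(self, word: str) -> list[str]:
        # Between the steps, a URL matches if the word occurs in the URL itself.
        return sorted(u for u in self.known_urls if word.lower() in u.lower())


if __name__ == "__main__":
    crawler = ToyCrawler(ROBOTS_TXT)
    crawler.discover([EDIT_URL, ARTICLE_URL])
    print(crawler.search("example"))  # both URLs: step 2 has not run yet
    crawler.fetch_pending()
    print(crawler.search("example"))  # only ARTICLE_URL survives
```

The point of the model is the window between discover() and fetch_pending(): during it, the disallowed edit URL is still searchable purely by the words in its query string.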
Hypothetically we could jimmy the page to not produce edit links if the
user agent is Googlebot, but that would be very annoying for several
reasons:
1) The Google-cached page would be missing those links.
2) This would screw with page caching. Google hits a lot of pages, and
we'd have to either not cache any of its hits or be very careful in
coding around it.
What about changing the edit URLs so that they don't
contain anything people would search for?
For example:
http://pl.wikipedia.org/w/wiki.phtml?title=W.i.b.r.a.t.o.r&action=edit
or
http://pl.wikipedia.org/w/wiki.phtml?articlenum=12345678&action=edit
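The two URL forms proposed above could be generated roughly as follows. This is a minimal sketch: the helper names are hypothetical, "Example" stands in for a real title, and "articlenum" is the parameter proposed in this message, not one the wiki software is known to support. The server side would have to reverse the dotting before looking up the title; taking every second character does that, even for titles that already contain dots.

```python
from urllib.parse import urlencode

BASE = "http://pl.wikipedia.org/w/wiki.phtml"


def dotted_edit_url(title: str) -> str:
    # Put a dot between every pair of characters so the title no longer
    # appears as a searchable word inside the URL.
    return BASE + "?" + urlencode({"title": ".".join(title), "action": "edit"})


def undot(dotted: str) -> str:
    # Reverse of ".".join(): keep every second character.
    return dotted[::2]


def numeric_edit_url(article_id: int) -> str:
    # Identify the article by number only (hypothetical "articlenum" parameter).
    return BASE + "?" + urlencode({"articlenum": article_id, "action": "edit"})


if __name__ == "__main__":
    print(dotted_edit_url("Example"))
    print(numeric_edit_url(12345678))
```

The numeric form hides the title completely but needs an ID lookup on the server; the dotted form keeps the title recoverable from the URL alone.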
Paul