On Sun, 2005-11-20 at 18:39 -0800, Brion Vibber wrote:
Rob Lanphier wrote:
On Fri, 2005-11-18 at 19:09 +0000, Timwi wrote:
Speaking of which - this reminds me of an idea I had a while ago and I
was wondering if anyone would be interested to hear this. Currently many
Wikipedia pages in Google search results are redirects (for example,
Google for "nonogram" and look at the seventh search result). I was
wondering if there is a <link> element one could use to say that another
URL is the "real" page? Then the page returned for a redirect's URL
would tell search engines the URL of the page it's redirecting to.
I'm not aware of any <link> syntax, but one way to do it would be for
MediaWiki to issue an HTTP 301 status (permanent redirect) to the new
page, rather than returning 200 and giving the content. That probably
introduces an unacceptably large performance penalty, though (extra
round trip per request).
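A minimal sketch of the two behaviours being compared, assuming a hypothetical redirect table and a hypothetical /wiki/ URL prefix (neither is taken from MediaWiki's code). It is only meant to show where the extra round trip comes from:

```python
# Hypothetical redirect table: redirect title -> target title.
REDIRECTS = {"Old_title": "New_title"}


def respond(title, use_http_redirect):
    """Return (status, headers, body) for a request for `title`."""
    target = REDIRECTS.get(title)
    if target is None:
        # Not a redirect: serve the page itself.
        return 200, {}, "content of " + title
    if use_http_redirect:
        # 301: the client has to make a second request for the target.
        return 301, {"Location": "/wiki/" + target}, ""
    # Behaviour described above: 200 with the target's content,
    # served at the redirect's own URL.
    return 200, {}, "content of " + target + " (redirected from " + title + ")"
```

With use_http_redirect=False the search engine sees the redirect's URL as a page in its own right, which is the problem Timwi describes.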
It's not a performance issue at all, and round-trips for 301s are often
cheap compared to rendering.
...except for the fact that you are adding a round-trip in addition to
subsequent rendering. I'll take your word for it that it's not a big
deal in the larger scheme of things, but relative to a single header or
tag, it seems pretty expensive (492 bytes inbound + 778 bytes outbound
in the test I just ran with Firefox <=> standard config Apache).
It just makes it a lot harder to deal with such pages: if you
HTTP-redirect straight to the target page you're missing the link back
to the redirect page. (And that is *crucial* for editing work and
vandalism cleanup. It is non-negotiable.)
If you redirect to an alternate URL which includes the linkback address,
then a) it's an uglier URL and b) you don't get the alleged benefits of
going to the single target URL in the first place.
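For illustration, the "alternate URL which includes the linkback address" option might look roughly like this. The query parameter name redirected_from and the /wiki/ prefix are assumptions made up for this sketch:

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def linkback_location(redirect_title, target_title):
    """Build a Location value that carries the redirect's title along."""
    query = urlencode({"redirected_from": redirect_title})
    return "/wiki/" + target_title + "?" + query


def redirected_from(url):
    """Recover the redirect's title from such a URL, or None if absent."""
    values = parse_qs(urlsplit(url).query).get("redirected_from")
    return values[0] if values else None
```

Note that the resulting URL differs for every redirect pointing at the same target, which is point (b) above: there is no single target URL any more.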
We've actually discussed this many times before; please search the list
archives if you wish to comment further. :)
I looked through the archives, and found the old "301's are evil"
discussion from July 2003, which looks more like a misunderstanding than
a productive conversation.
I'd like to point out that there's a third way, which is to set a
cookie, rather than put the original request info in the URL. I'll
admit that's probably got other problems, but I'm throwing that out
there as a solution.
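A rough sketch of the cookie idea, using Python's standard http.cookies module. The cookie name redirected_from, the 30-second lifetime, and the /wiki/ prefix are all assumptions for illustration; this is not a worked-out design:

```python
from http.cookies import SimpleCookie


def redirect_with_cookie(redirect_title, target_title):
    """Return (status, headers) for a 301 that remembers where it came from."""
    cookie = SimpleCookie()
    cookie["redirected_from"] = redirect_title
    cookie["redirected_from"]["path"] = "/"
    # Short-lived, so it is unlikely to leak onto later page views.
    cookie["redirected_from"]["max-age"] = 30
    headers = [
        ("Location", "/wiki/" + target_title),
        ("Set-Cookie", cookie["redirected_from"].OutputString()),
    ]
    return 301, headers


def read_linkback(cookie_header):
    """Extract the redirect's title from a request's Cookie header, if any."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("redirected_from")
    return morsel.value if morsel is not None else None
```

The target URL stays clean, but the linkback now depends on client state rather than on the URL.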
The "Content-Location" HTTP header is a potential longshot. I don't
think Google documents their use/non-use of this header, but it's one of
those "can't hurt" kind of things.
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.14
The spec is sufficiently vague and mysterious that I'd recommend against
using it for any purpose.
Typical use is in content negotiation, allowing the server to advertise
the direct URL to the content that was ultimately served as a result of
the negotiation.
Since the destination page would not return the same HTML as the
redirect page, it would likely be incorrect and might cause problems if
anything does use it.
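To make the content-negotiation use concrete, here is a simplified sketch. The variant URLs are hypothetical, and q-values in Accept-Language are deliberately ignored to keep it short:

```python
# Hypothetical language variants of one negotiated resource.
VARIANTS = {"en": "/doc.en.html", "fr": "/doc.fr.html"}


def negotiate(accept_language):
    """Pick a variant and advertise its direct URL via Content-Location."""
    chosen = VARIANTS["en"]  # fallback when nothing matches
    for part in accept_language.split(","):
        # Drop any ";q=..." parameter; preference weights are ignored here.
        lang = part.split(";")[0].strip().lower()
        if lang in VARIANTS:
            chosen = VARIANTS[lang]
            break
    return 200, {"Content-Location": chosen, "Vary": "Accept-Language"}
```

In this use, fetching the Content-Location URL directly returns the same entity that was just served, which is not the case for a wiki redirect page and its target.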
I suppose you're right. More importantly, there's little reason to
believe that it'd actually solve the problem at hand. Now that I think
about it, the search engines probably shouldn't infer that the content
location header contains the better URL to use to access the content in
question. Since they shouldn't, that means it's a bad thing to count
on.
Rob