Re: [Wikitech-l] What is required to "fix search"?

19 Apr 2006

On April 14, Evan Martin wrote:

...
  To answer your specific proposal:
 1) http://en.wikipedia.org/wiki/Special:Recentchanges has a meta tag:
 <meta name="robots" content="noindex,follow" />
 which indicates it's explicitly disallowed from being crawled.

As far as I understand the robots meta tag, "noindex,follow" tells 
robots that they are welcome to fetch the page, that they can find 
links to other pages here (= follow), but they should never show 
this page among the search hits (= noindex).
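
A minimal sketch of that reading in Python, using only the standard 
library.  The names RobotsMetaParser and robots_policy are made up for 
illustration, and treating "none" as shorthand for "noindex,nofollow" 
is my assumption about the convention; no real crawler is claimed to 
work exactly like this.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects <meta name="robots"> directives and all <a href> links."""

    def __init__(self):
        super().__init__()
        self.directives = set()
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            # "noindex,follow" -> {"noindex", "follow"}
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])


def robots_policy(html):
    """Return (may_index, links_to_follow) for a page already fetched."""
    parser = RobotsMetaParser()
    parser.feed(html)
    blocked_all = "none" in parser.directives
    may_index = not blocked_all and "noindex" not in parser.directives
    may_follow = not blocked_all and "nofollow" not in parser.directives
    # Links are harvested only when following is allowed.
    return may_index, (parser.links if may_follow else [])
```

Under this reading, a page tagged "noindex,follow" yields no search hit 
of its own but still hands its links to the crawler, while a page tagged 
"index,nofollow" is the reverse.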

Words such as crawl and index are somewhat fuzzy here.  Does 
"index" mean fetch or does it mean store in an index, to be 
returned to users as a search hit?  I found no clear answer. Of 
course, the crawler/robot/spider is already fetching the page when 
it sees the meta tag.  And it must fetch the page again to see if 
the meta tag has changed.

The Pipermail software that is used for the wikitech-l archive 
sets "noindex,follow" for the overview sorted by date, e.g. 
http://mail.wikimedia.org/pipermail/wikitech-l/2006-April/date.html 
but for the individual posting, it sets "index,nofollow", e.g. 
http://mail.wikimedia.org/pipermail/wikitech-l/2006-April/034969.html

I believe that "noindex,follow" is used for many "sitemap" pages, 
and this is my idea of how search robots should use RecentChanges.

Indeed, the front page of any newspaper website is also similar to 
a sitemap.  Its content changes so often that it becomes useless 
to index it under any specific word found there.  If people search 
for "hurricane katrina", they don't want the front page of the 
Washington Post, which will have changed by the time they arrive.  
But they might be interested in the news article about this topic, 
and the front page was the way to harvest the link to that 
article.

The main difference, then, between the newspaper and Wikipedia is 
that the newspaper uses its RecentChanges as its front page, and 
that Wikipedia isn't covered by Google News.

-- 
  Lars Aronsson (lars(a)aronsson.se)
  Aronsson Datateknik - http://aronsson.se
