[Wikipedia-l] Re: Robots and special pages

Daniel Mayer maveric149 at yahoo.com
Sat May 18 23:32:52 UTC 2002


On Saturday 18 May 2002 12:01 pm, you wrote:
> Message: 6
> Date: Fri, 17 May 2002 15:37:47 -0700
> From: lcrocker at nupedia.com
> To: wikipedia-l at nupedia.com
> Subject: [Wikipedia-l] Robots and special pages
> Reply-To: wikipedia-l at nupedia.com
>
> A discussion just came up on the tech list that deserves input from
> the list at large: how do we want to restrict access (if at all) to
> robots on wikipedia special pages and edit pages and such?

My two cents (well, maybe a bit more),

On talk pages: OPEN to bots

It's A-OK for bots to index talk pages -- these pages often have interesting 
discussion that should be on search engines. Of course, if this becomes a 
performance issue, we could prevent bots from indexing them. 

On wikipedia pages: OPEN to bots

I STRONGLY feel that wikipedia pages should be open to bots -- remember, we 
are also trying to expand our community here, and people do search for these 
things on the net.  

On user pages: OPEN to bots

I also don't see anything wrong with letting bots crawl all over user pages 
-- I occasionally browse the personal home pages of other people who have 
interests similar to mine. This project isn't just about the articles; it is 
also about community building. 

On log, history, print and special pages: CLOSED to bots (closed at least for 
indexing -- I'm not sure about allowing the 'follow links' function. Would 
closing this make bots do their thing faster or slower? Is this at all 
important for us to consider? If a bot can index our site quickly, will it do 
it more often?)  
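
For what it's worth, indexing and link-following can be controlled 
separately. A robots.txt Disallow keeps a crawler from fetching a page at 
all, while the standard robots meta tag splits the two -- something like the 
following (just a sketch of the mechanism, not a claim about what our pages 
currently emit) in the HTML head of a history or special page:

    <!-- keep the page out of search indexes, but let crawlers follow its links -->
    <meta name="robots" content="noindex,follow">

    <!-- keep it out of the index AND tell crawlers not to follow its links -->
    <meta name="robots" content="noindex,nofollow">

Whether we want "follow" or "nofollow" on these pages is exactly the open 
question above.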

I think that the wikipedia pages are FAR better at naturally explaining what 
the project is about than the log, history and special pages are -- those 
pages are far too technical and change too quickly to be useful for any 
search performed on a search engine. There is also limited utility in having 
direct links to the Printable version of articles -- these don't have any 
active wiki links in them, which obscures the fact that the page is from a 
wiki. 

Having history pages in the search results of external search engines is 
potentially dangerous, since somebody could easily click into an older 
version and save it -- thus reverting the article and unwittingly "earning" 
the label of VANDAL (even if they did make a stab at improving the version 
they read). Another reason to disallow bot access to histories is that there 
is often copyrighted material in the history of a page that has since been 
removed from the current version of the article (it would be nice for an 
admin to be able to delete just an older version of an article, BTW).      

On Edit links: CLOSED to bots (for indexing and probably for following links)

The edit links REALLY should NOT be indexed by any bot: when somebody 
searches for something on a search engine, gets a link to our site, and 
clicks on it, do we want them to be greeted with an edit window? They want 
information -- not an edit window. No wonder we have so many pages with 
"Describe the new page here" as their only content. 

I've been tracking this for a while, and almost every one of these pages is 
created by an IP that never returns to edit again. Many (if not most) of 
these "mysteriously" created pages probably come from someone clicking 
through from a search engine, becoming puzzled by the edit window, and 
hitting the save button in frustration. Heck, I think I may have created a 
few of these myself in my pre-wiki days. 

This has become a bit of a maintenance issue for the admins -- we can't 
delete these pages fast enough, let alone create stubs for them. If left 
unchecked, this could reduce the average quality of wikipedia articles and 
make people doubt whether an "active" wiki link really has an article (or 
even a stub) behind it. 
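
On the robots.txt side, keeping crawlers off these URLs could look roughly 
like the sketch below -- the paths are only placeholders standing in for 
whatever our edit, history and special-page URLs actually look like, not our 
real layout:

    # Sketch only: placeholder paths, not the actual wikipedia URL scheme
    User-agent: *
    Disallow: /edit/
    Disallow: /history/
    Disallow: /special/

Note that a Disallow blocks fetching outright, so it covers both indexing and 
link-following for those URLs.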

There could, of course, be a purely technical fix for this: have the software 
not recognize newly created blank or "Describe the new page here" pages as 
real pages (a Good Idea, BTW). But then we would still have frustrated people 
who were looking for actual info and who may avoid clicking through to our 
site in the future because of a previous "edit window experience".  

Conclusion:

We should try to put our best foot forward when allowing bots to index the 
site and only allow indexing of pages that contain information potentially 
useful to the person searching. 

Edit windows and outdated lists are NOT useful to somebody clicking through 
for the first time (Recent Changes might be the only exception: even though 
any index of it will be outdated, it is centrally important to the project 
and fairly self-explanatory). Links to older versions of articles and to 
history pages also set up would-be contributors to be labeled as "vandals" 
when they try to edit an older version -- thus turning them away forever. 

Let visitors explore a real article first and discover the difference between 
an edit window and an actual article -- then they can decide about becoming a 
contributor, a visitor or even a developer, for that matter.        

maveric149


