2008/1/28, Anders Wegge Jakobsen <wegge(a)wegge.dk>:
Consider a random article about the 1190s, like
[[da:1190'erne]]. Follow the interwiki links from that one. Follow the
interwiki links from the newly found articles. Wonder how you ended up
back at dawiki, but this time at the year 1190.
There is something rotten, not just in the state of Denmark, but in a
lot of the Wikipedia articles about years and decades. I've made a
simplified interwiki graph: <http://wegge.dk/interwiki-1190.png>. Be
warned that it's a 13522 x 309 PNG image. The graph clearly shows that
a lot of wikipedias have a path from their respective decade article,
via [[he:1190]], back to the same wikipedia's year article. This is not
the only decade showing the problem, so it's not easy to fix. Given
the large number of wikipedias involved, it's too large a task to
perform by hand, and I'm also afraid that before one such loop has
been fixed, one or more iw bots will start spreading the problem
again.
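The loop-hunting described above amounts to a breadth-first walk over the
interwiki graph, flagging any language reached through two different
titles. A minimal sketch (the `graph` dict stands in for actually
fetching pages, and `find_cross_language_conflicts` is a made-up name,
not part of pywikipedia):

```python
from collections import deque

def find_cross_language_conflicts(graph, start):
    """BFS over interwiki links from `start`; report any language
    that is reached via two or more different titles (a loop).

    `graph` maps (lang, title) -> list of linked (lang, title) pairs.
    """
    seen_by_lang = {}           # lang -> set of titles reached
    visited = {start}
    queue = deque([start])
    while queue:
        lang, title = queue.popleft()
        seen_by_lang.setdefault(lang, set()).add(title)
        for nxt in graph.get((lang, title), []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(nxt)
    return {lang: titles for lang, titles in seen_by_lang.items()
            if len(titles) > 1}

# Toy graph reproducing the da -> he -> da loop described above:
graph = {
    ("da", "1190'erne"): [("he", "1190")],
    ("he", "1190"):      [("da", "1190")],
}
print(find_cross_language_conflicts(graph, ("da", "1190'erne")))
```

On the toy graph this reports that dawiki is reached both as the decade
article and as the year article, which is exactly the symptom in the
picture.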
So does anyone have an idea about how to solve this mess by bot?
Just run

  python interwiki.py 1190'erne -ignore:he:1190 -force

with a bot that is registered on all languages that have a page on the
decade (or, where it is not, do the remaining ones by hand). It will
remove the incorrect link and get things working correctly again.
As for your fear that "before one such loop has been fixed, one or
more iw bots will start spreading the problem again" - this will not
happen unless their operators are doing a really bad job. A bot
working on the 1190s will find that there are languages for which it
gets two links - one for the 1190s and one for 1190. In such a case, it
will not make a decision on its own, or in fact make any changes at
all. Instead,
* if the bot is running autonomously, it will skip the page
* if the bot is running interactively, it will ask the operator which
pages to include and which to exclude
The only bot that will re-create the mess is an interactive bot whose
operator makes the wrong choice about what to include. Bots do have a
risk of copying mistakes, but once any loop to a different page in the
same language has been found, the bots will stop and just ignore the
pages involved.
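The skip-or-ask behaviour described above can be sketched roughly as
follows; `resolve_links` and `NeedsOperator` are illustrative names, not
the actual interwiki.py code:

```python
class NeedsOperator(Exception):
    """Raised when an interactive run must ask the operator to choose."""

def resolve_links(candidates, autonomous=True):
    """Group candidate interwiki links by language.

    If any language yields two different pages, refuse to decide:
    an autonomous bot skips the page (returns None), an interactive
    bot raises NeedsOperator so the operator can pick.

    `candidates` is a list of (lang, title) pairs.
    """
    by_lang = {}
    for lang, title in candidates:
        by_lang.setdefault(lang, set()).add(title)
    conflicts = {l: t for l, t in by_lang.items() if len(t) > 1}
    if conflicts:
        if autonomous:
            return None                  # skip the page entirely
        raise NeedsOperator(conflicts)   # ask which pages to keep
    return {l: t.pop() for l, t in by_lang.items()}
```

With one link per language the mapping comes back cleanly; as soon as a
language shows up twice (the 1190s/1190 case), nothing is written.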
--
Andre Engels, andreengels(a)gmail.com
ICQ: 6260644 -- Skype: a_engels