Has anybody implemented a link-rot checker for external links in
MediaWiki? Here is a theory of how it could work:
When a new page is saved and the parser detects a "http:" pattern,
the external link is stored in a separate database table, with a
pointer to the wiki page where it was harvested. At regular
intervals, all external links are tried (HTTP GET) by a background
process and the success or failure rate is recorded. If a link
becomes unavailable (HTTP error) during three consecutive fetch
attempts, it gets listed on a special page of possibly broken
external links. Broken links from the same website could be
grouped together. Maybe the whole site is broken, has moved or
has been internally reorganized.
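The scheme above could be sketched roughly as follows. This is only an illustration, not existing MediaWiki code: the class name, the threshold constant, and the injectable fetch function are all made up here (injection just makes the checker testable without network access).

```python
import urllib.request
import urllib.error
from collections import defaultdict
from urllib.parse import urlparse

FAILURE_THRESHOLD = 3  # consecutive failed fetches before a link is reported


class LinkRotChecker:
    """Tracks external links harvested from wiki pages and their fetch failures."""

    def __init__(self, fetch=None):
        # fetch(url) -> True on success; injectable so the checker runs offline in tests
        self.fetch = fetch or self._http_fetch
        self.links = defaultdict(set)      # url -> pages it was harvested from
        self.failures = defaultdict(int)   # url -> consecutive failure count

    @staticmethod
    def _http_fetch(url):
        """One HTTP fetch; any error or non-2xx/3xx status counts as a failure."""
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                return 200 <= resp.status < 400
        except (urllib.error.URLError, OSError):
            return False

    def harvest(self, page, url):
        """Called when the parser sees an external link while saving `page`."""
        self.links[url].add(page)

    def run_pass(self):
        """Background pass: try every known link once, update failure counters."""
        for url in self.links:
            if self.fetch(url):
                self.failures[url] = 0
            else:
                self.failures[url] += 1

    def broken_links_by_site(self):
        """Links that failed three passes in a row, grouped by host,
        since the whole site may be down, moved, or reorganized."""
        groups = defaultdict(list)
        for url, count in self.failures.items():
            if count >= FAILURE_THRESHOLD:
                groups[urlparse(url).netloc].append((url, sorted(self.links[url])))
        return dict(groups)
```

A special page would then render the output of `broken_links_by_site()`, one section per host.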
--
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik - http://aronsson.se
I have added an option to Special:Export that will add a list of all
contributors of a page to the XML output. The list is distinct (each
contributor mentioned only once). I also expanded the XML converter to
use this. (Brion et al.: please put the new Special:Export on the live
sites soon)
For my local test site, where the new Special:Export is running, it will
now add a list of all contributors to the output (IPs are omitted). It
will even add people who worked on a template that is used in the
document ;-)
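The deduplication step could look something like this sketch. The function name and the (username, is_anonymous) representation of the revision history are invented for illustration; they are not taken from the actual Special:Export code.

```python
def distinct_contributors(revisions):
    """Return each named contributor once, in order of first edit.

    `revisions` is a list of (username, is_anonymous) pairs, a stand-in
    for the revision history (including template revisions) that the
    real code would consult. Anonymous IP edits are omitted, as in the
    new Special:Export option.
    """
    seen = set()
    result = []
    for name, is_anon in revisions:
        if is_anon or name in seen:
            continue
        seen.add(name)
        result.append(name)
    return result
```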
Next stop: OpenOffice .ODT output...
Magnus
An automated run of parserTests.php showed the following failures:
Running test BUG 1887, part 2: A <math> with a thumbnail- math enabled... FAILED!
Passed 299 of 300 tests (99.67%) FAILED!
At svwiki, there is a user account that has made no edits. However, the user
account sends wikimail - nasty ones. I have so far received three of them,
and this is in no way unusual. IP-check is being performed to see if the
account is a sockpuppet of another registered user.
Is it possible to block a user's ability to send wikimail? A block doesn't
do that; blocked users can still send wikimail.
/habj
[repost, sorry if it ends duplicated]
Hi
it seems to me that there are some inconsistencies between at least the
page and revision tables, in the 20060303 enwiki dump.
The first problematic page would be page_id 12, Anarchism (sorry for the
raw mysql formatting):
          page_id: 12
   page_namespace: 0
       page_title: Anarchism
page_restrictions: (empty)
     page_counter: 5252
 page_is_redirect: 0
      page_is_new: 0
      page_random: 0.786172332974311
     page_touched: 20060303031540
      page_latest: 41982999
         page_len: 67537
which indicates a revision # 41982999.
But there is no line with rev_id=41982999 in the revision table.
(these can be verified grepping for 41982999 directly in
enwiki-20060303-pages-articles.xml.bz2 and in
enwiki-20060303-page.sql.gz)
Now:
- am I missing something here?
- it might be that the revision changed between the dumps of those 2
tables (the page was edited in the meantime)
- it ends in empty pages (i.e. with the usual stub text) for ~ 5% of
the pages (that seems huge, but I don't see where the problem lies)
- is it a temporary problem (I don't recall getting so many empty
articles with earlier dumps)?
- is there a simple way to fix it? (if no better idea emerges, I will
try to fix the page_latest column in the page table by doing a lookup on
rev_page in the revision table - is that right?)
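That proposed repair - pointing page_latest at the newest revision that actually exists for the page - can be illustrated on a toy copy of the two tables. This sketch uses SQLite for a self-contained demonstration; the table definitions are simplified stand-ins for the real MediaWiki schema, and the second revision id is invented.

```python
import sqlite3

# Toy versions of the `page` and `revision` tables, reproducing the symptom:
# page_latest references a rev_id that is absent from the revision table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT,
                   page_latest INTEGER);
CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER);
INSERT INTO page VALUES (12, 'Anarchism', 41982999);  -- 41982999 is missing
INSERT INTO revision VALUES (41500000, 12), (41700000, 12);
""")

# The repair: for every page whose page_latest has no matching revision,
# look up the highest rev_id with rev_page = page_id and use that instead.
conn.execute("""
UPDATE page
SET page_latest = (SELECT MAX(rev_id) FROM revision
                   WHERE rev_page = page.page_id)
WHERE page_latest NOT IN (SELECT rev_id FROM revision)
""")

latest = conn.execute(
    "SELECT page_latest FROM page WHERE page_id = 12").fetchone()[0]
# latest is now 41700000, the newest revision present in the dump
```

Note this silently drops the page back to an older revision text; whether that is acceptable depends on why the revision went missing between the two table dumps.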
Thanks
--
Colin