On Fri, 15 Aug 2003, Erik Zachte wrote:
> I am not familiar with the PHP code, so this may be already done:
> In the English Wikipedia 9.5% of links (362K out of 3800K total) consist
> of references to years between 0 and 2000.
> Therefore two lines of code might save quite a bit of link checking.
I really doubt this kind of special casing would save much. Since these
pages all exist, we've already loaded the fact of their existence by
grabbing the current page's outgoing links from the 'links' table, and
it's just a lookup in an associative array.
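
To make the comparison concrete, here is a rough sketch in Python (not the
actual PHP code; the function names and sample titles are made up for
illustration). It assumes the existing outgoing links for a page have
already been fetched from the 'links' table into a set, so each check is
a single hash lookup, and it shows the proposed year special case next
to it:

```python
import re

# Assumption: these titles stand in for the rows already fetched from the
# 'links' table for the page being rendered (links whose targets exist).
def load_existing_links(rows):
    """Build the in-memory lookup structure once per page."""
    return set(rows)

YEAR_RE = re.compile(r"[0-9]{1,4}")

def is_year_link(title):
    """The proposed special case: titles that are years from 0 to 2000."""
    return YEAR_RE.fullmatch(title) is not None and 0 <= int(title) <= 2000

def link_exists(title, existing):
    """The existing check: one hash lookup in the preloaded set."""
    return title in existing

existing = load_existing_links(["1969", "Apollo 11", "United States"])
results = {t: link_exists(t, existing) for t in ["1969", "Apollo 11", "Moon hoax"]}
```

Under these assumptions the year test would only replace one set lookup
with one pattern match, and it would not remove any database query,
since the set is loaded once for the whole page either way.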
> "|Race (US Census)|Square mile|Census|Population density|
> United States Census Bureau|Asia|Geographic references|Native American|
> African American|Hispanic|Latino|United States|"
> would again save > 10% of checks (390K)
> (assuming all articles are viewed equally often, which of course is not
> true)
I'd wager that most of the rambot pages are viewed primarily by googlebot,
and it gets precached pages. ;)
-- brion vibber (brion @ pobox.com)