Aryeh Gregor wrote:
On Wed, Mar 18, 2009 at 6:18 AM, Petr Kadlec
<petr.kadlec(a)gmail.com> wrote:
page_title does not contain the full title, only its
namespace-relative part. You need to use
select page_namespace, page_title from wikidb.page
Only the whole tuple (page_namespace, page_title) uniquely identifies
a page (this holds throughout MediaWiki).
And note that the namespace is stored as a number. You'll need to
refer to a list of the namespace numbers on the specific wiki you're
dealing with to translate it into the appropriate prefix. There's a
way to get the list from the API, but I don't know it offhand.
http://en.wikipedia.org/w/api.php?action=query&meta=siteinfo&siprop…
Note that namespaces with an ID of 100 or higher are specific to enwiki
and may have different names or not be used at all on other wikis. To
get an accurate list for another wiki, ask that wiki's api.php.
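To make the number-to-prefix translation concrete, here is a minimal Python sketch. It assumes you have already fetched and parsed the siteinfo JSON yourself, and that the namespace list sits under query.namespaces with the local name in either a "*" or a "name" field, depending on the output format. The function names and the sample data are made up for illustration; the real list has to come from the wiki's own api.php.

```python
def namespace_prefixes(siteinfo):
    """Map namespace ID -> local prefix from a parsed siteinfo response."""
    prefixes = {}
    for entry in siteinfo["query"]["namespaces"].values():
        # Assumption: the name is under "name" or, in older JSON output, "*".
        prefixes[entry["id"]] = entry.get("name", entry.get("*", ""))
    return prefixes


def full_title(prefixes, page_namespace, page_title):
    """Join a (page_namespace, page_title) pair into a prefixed title."""
    prefix = prefixes[page_namespace]
    # The main namespace (ID 0) has an empty prefix and no colon.
    return f"{prefix}:{page_title}" if prefix else page_title


# Hand-written sample, trimmed to three namespaces for illustration only.
sample = {"query": {"namespaces": {
    "0": {"id": 0, "*": ""},
    "1": {"id": 1, "*": "Talk"},
    "10": {"id": 10, "*": "Template"},
}}}

prefixes = namespace_prefixes(sample)
print(full_title(prefixes, 0, "Dog"))
print(full_title(prefixes, 10, "Infobox"))
```

Note that page_title is stored with underscores in place of spaces, so you may want to swap those back before displaying a title.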
As for redirects: yes, you'll want to do something like:
SELECT page_namespace, page_title, rd_namespace, rd_title
FROM page LEFT JOIN redirect ON rd_from=page_id;
This'll list all page titles and their redirect targets, with
rd_namespace and rd_title set to NULL for pages that aren't redirects.
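If you want to see what that join returns before running it against a real dump, the following sketch runs the same query on a toy in-memory SQLite database. The two tables are cut down to just the columns used here and the rows are invented; the real MediaWiki schema has more columns and normally lives in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified stand-ins for MediaWiki's page and redirect tables.
    CREATE TABLE page (page_id INTEGER PRIMARY KEY,
                       page_namespace INTEGER, page_title TEXT);
    CREATE TABLE redirect (rd_from INTEGER PRIMARY KEY,
                           rd_namespace INTEGER, rd_title TEXT);
    -- 'Dog' is an ordinary page; 'Doggy' redirects to it.
    INSERT INTO page VALUES (1, 0, 'Dog'), (2, 0, 'Doggy');
    INSERT INTO redirect VALUES (2, 0, 'Dog');
""")

rows = conn.execute("""
    SELECT page_namespace, page_title, rd_namespace, rd_title
    FROM page LEFT JOIN redirect ON rd_from = page_id
    ORDER BY page_id
""").fetchall()

for row in rows:
    print(row)
```

The non-redirect page comes back with None (SQL NULL) in both rd_ columns, while the redirect carries its target's namespace and title.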
Note that the redirect table doesn't handle section redirects (like
redirects to [[Foo#Bar]], which are stored as redirects to [[Foo]]) or
interwiki redirects (like redirects to [[wikt:dog]], which are stored as
redirects to [[dog]]) well, and that some redirects may be missing
from it entirely (IIRC about half a million redirects are missing from
enwiki's redirect table). Even worse, the data dump you downloaded might
not contain the redirect table at all. You can rebuild the redirect table
with:
php maintenance/refreshLinks.php --redirects-only
(Use --old-redirects-only to only add missing entries rather than
checking existing entries for validity as well.)
Roan Kattouw (Catrope)