Mark Clements wrote:
> If the following statements are all true, then there is no ambiguity. If
> any of them are false then that ambiguity remains:
> 1) The main namespace can never have a prefix (ns title is always empty)
Yes.
> 2) All other namespaces _must_ have a prefix
Yes.
> 3) A namespace title cannot contain a colon
No.
> If they are all true then your assumption above is valid, otherwise it is
> not and you will parse some edge-case titles incorrectly.
No.
Generally, parsing full-text titles to (namespace, title) requires knowing two
things:
a) The set of all defined namespace prefixes
b) The set of all defined interwiki prefixes
For parsing page titles from a dump, you only care about the namespaces -- which
are provided in the dump -- since no pages there can have an interwiki title.
And of course, you only care about the namespaces *if you do* care about the
namespaces, which you very often may not.
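As a rough illustration of the dump case, here is a minimal Python sketch. The
function name split_title and the example prefixes are hypothetical, not taken
from MediaWiki. It assumes the caller has already read the namespace prefixes
out of the dump, and it skips any case or underscore normalization. It matches
against the known prefixes rather than splitting at the first colon, so it does
not rely on statement 3 being true.

```python
def split_title(full_title, namespace_prefixes):
    """Split a full page title into (namespace, title).

    namespace_prefixes is the set of non-empty prefixes defined for the
    wiki (hypothetical input, e.g. collected from the dump header).
    The main namespace is represented by the empty string.
    """
    # Try longer prefixes first, so a prefix that itself contains a colon
    # is preferred over a shorter prefix that it starts with.
    for prefix in sorted(namespace_prefixes, key=len, reverse=True):
        if prefix and full_title.startswith(prefix + ":"):
            return prefix, full_title[len(prefix) + 1:]
    # No known prefix matched: the whole string is a main-namespace title,
    # even if it happens to contain a colon.
    return "", full_title


if __name__ == "__main__":
    prefixes = {"Talk", "User", "User talk"}
    print(split_title("User talk:Example", prefixes))
    print(split_title("Star Trek: Voyager", prefixes))
```

A title such as "Star Trek: Voyager" stays in the main namespace here only
because "Star Trek" is not in the supplied set, which is why the set of
defined prefixes has to be known up front.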
-- brion vibber (brion @ pobox.com)