That's a pretty good solution, although one of the issues is that the title
includes the namespace, which needs to be removed to get the actual page
title. I feel that the <page> section should be complete in and of itself,
without requiring the header section mapping namespace names to ids. Without
knowing the mappings (ns to ns-title) that are present in the header, you
cannot interpret the title unambiguously. For example, <title ns="0">Star
Trek: The Next Generation</title> relies on the parser knowing that ns-0 is
not called 'Star Trek' in order to be interpreted properly.
How about <title ns="12" ns-title="Help">Contents</title>?
- Mark Clements (HappyDog)
I think you could assume that any non-zero namespace has a prefix, so you'd
only need to split on the first ':' if it has a namespace number != 0 (this
assumes we will never set up a namespace with ':' in it).
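
A rough sketch of that rule in Python, just to make it concrete. The function
name split_title is made up for illustration and is not part of any existing
tool. It assumes the ns number is already known for the title and that no
namespace name contains ':'.

```python
def split_title(full_title, ns):
    """Return (namespace_prefix, page_title) for a title and its ns number.

    Hypothetical helper: assumes every non-zero namespace carries a
    'Prefix:' in the title and that no namespace name contains ':'.
    """
    if ns == 0:
        # Main namespace: there is no prefix, so any ':' is part of the title.
        return "", full_title
    # Non-zero namespace: everything before the first ':' is the prefix.
    prefix, sep, rest = full_title.partition(":")
    if not sep:
        # No ':' at all; fall back to treating the whole string as the title.
        return "", full_title
    return prefix, rest


print(split_title("Star Trek: The Next Generation", 0))
print(split_title("Help:Contents", 12))
```

Only the first ':' is used for the split, so a title such as "Help:Foo: Bar"
in namespace 12 keeps "Foo: Bar" as the page title.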
BTW: why are you having so much trouble with this?