On 19 January 2011 14:29, Rajarshi Guha <rajarshi.guha(a)gmail.com> wrote:
Hi, I was trying to extract some information from the protein target
infobox on protein target pages (eg
http://en.wikipedia.org/wiki/Calreticulin or
http://en.wikipedia.org/wiki/Hsp90).
However when I export the page via
http://en.wikipedia.org/w/api.php?action=query&pageids=7120&export=…
the XML page does not seem to contain the information that I can see
when viewing the page in the browser. For example, the XML export for
Calreticulin does not contain the links to the rendering of the
structure or the PDB identifiers and so on.
Is my export URL wrong? Or is there a reason that the infobox
information is not exported and if so, is there a way to access it via
export?
The XML output is mainly the "plain" wikitext code of the page, rather
than the rendered text version. As a result, you don't get the
rendered version of the infobox, you just get the snippet of code
calling it:
{{PBB|geneid=811}}
This template is surprisingly simple - it takes the "geneid" number
and directs to a pre-generated specific subpage, in this case
http://en.wikipedia.org/wiki/Template:PBB/811
The gallery box at the bottom works in the same way:
{{PDB Gallery|geneid=811}}
directs you to
http://en.wikipedia.org/wiki/Template:PDB_Gallery/811
I am not immediately sure why these are separate rather than an
integral part of the article, which is normal for infoboxes -
perhaps because it dissuades well-meaning but erroneous passing
alterations to the data, or because it simplifies maintenance. As
you've noticed, while it's transparent to the user, it's a little
confusing to work with!
It should be possible for you to pick the geneid number out of your
export and then run an additional export on Template:PBB/$number and
Template:PDB_Gallery/$number. Would that be sufficient?
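
As a rough sketch of that first step (in Python, purely for illustration),
something like the following could pull the geneid out of the exported
wikitext and build the two subpage titles. The function names and the
regular expression are my own assumptions, not part of any MediaWiki
tooling, and it assumes the template calls look exactly like the two
shown above. The gallery subpage name follows the Template:PDB_Gallery/811
link.

```python
import re

# Matches calls such as {{PBB|geneid=811}} or {{PDB Gallery|geneid=811}}.
# Assumption: the calls carry only the single geneid parameter, as above.
GENEID_RE = re.compile(
    r"\{\{\s*(PBB|PDB[ _]Gallery)\s*\|\s*geneid\s*=\s*(\d+)\s*\}\}"
)


def template_subpages(wikitext):
    """Return the subpage titles referenced by PBB / PDB Gallery calls."""
    titles = []
    for name, geneid in GENEID_RE.findall(wikitext):
        # Page titles use underscores where the template call uses spaces.
        title = "Template:{}/{}".format(name.replace(" ", "_"), geneid)
        if title not in titles:
            titles.append(title)
    return titles


def page_url(title):
    """Build a page URL in the same form as the links quoted above."""
    return "http://en.wikipedia.org/wiki/" + title


# Hypothetical fragment of exported wikitext, for demonstration only.
sample = "{{PBB|geneid=811}}\n'''Calreticulin''' ...\n{{PDB Gallery|geneid=811}}"
for title in template_subpages(sample):
    print(page_url(title))
```

Each title it returns can then be fed to the same export call you are
already using for the article itself.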
--
- Andrew Gray
andrew.gray(a)dunelm.org.uk