Hey Andy
On Feb 9, 2015, at 4:24 PM, Andy Mabbett <andy(a)pigsonthewing.org.uk> wrote:
> On 9 February 2015 at 22:59, Aaron Halfaker <ahalfaker(a)wikimedia.org> wrote:
>> Our spot checking suggests that 98% of these DOIs resolve. The remaining 2%
>> were extracted correctly, but they appear to be typos.
> All on en.Wikipedia?
Correct, we haven’t looked at other projects for this release.
> Do DOIs not include check digits?
They don’t. Validation can be done via the CrossRef API or the DOI resolver, but it is
not 100% reliable, especially when DOIs include special characters. CrossRef advised using
a 200 HTTP response code from the resolver with a noredirect flag (e.g.
http://dx.doi.org/{doi}?noredirect=true) as an indication that the DOI is valid and
resolves.
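Since there is no check digit, a local test can only reject strings that are syntactically malformed; anything else has to go to the resolver. Below is a minimal sketch of both steps, assuming Python's standard library. The DOI pattern is a loose approximation (not an official grammar), the helper names are hypothetical, and the resolver base URL is simply the one quoted above, kept as a parameter because resolver hosts and schemes can change. The DOI is percent-encoded so that special characters such as '#' or spaces are not read as URL syntax, and redirects are deliberately not followed so that only a direct 200 counts.

```python
import re
import urllib.error
import urllib.parse
import urllib.request

# Assumption: a loose syntactic pattern ("10." + 4-9 digit registrant code +
# "/" + non-empty suffix). It only rejects malformed strings; it cannot catch
# a typo that still looks like a DOI, because DOIs have no check digit.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Resolver base URL as given in the message above.
RESOLVER = "http://dx.doi.org/"


def looks_like_doi(doi: str) -> bool:
    return DOI_PATTERN.match(doi) is not None


def resolver_url(doi: str, base: str = RESOLVER) -> str:
    # Percent-encode everything except "/" so characters like '#', '?', '<'
    # or spaces inside the DOI are not interpreted as URL syntax.
    return base + urllib.parse.quote(doi, safe="/") + "?noredirect=true"


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None means "do not follow"; urllib then raises HTTPError.
        return None


def resolves(doi: str, base: str = RESOLVER, timeout: float = 10.0) -> bool:
    """Treat a direct HTTP 200 from the resolver as 'valid and resolves'."""
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        with opener.open(resolver_url(doi, base), timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        # 404, redirects and other non-200 codes all count as "does not resolve".
        return False
```

A caller would run `looks_like_doi` first and only send plausible strings to `resolves`, which makes one network request per DOI.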
> We should test for these in citation templates. Does
> your data show which templates (if any) the broken
> DOIs were in?
We haven’t checked whether these errors occur systematically within specific templates, but we
know that the code extracted them correctly, with no parsing errors. We’ll share the list
of broken DOIs so they can be reviewed and fixed.
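Once the list is shared, the question of which templates the broken DOIs sit in could be answered with a rough tally like the one below. This is a hypothetical sketch, not the extraction code used for the release: it assumes the DOIs appear as a `|doi=` parameter and uses simple regular expressions that ignore templates containing nested `{{...}}`, so a real wikitext parser would be needed for full coverage.

```python
import re
from collections import Counter

# Hypothetical simplification: matches a template with no nested {{...}} inside,
# capturing its name and everything after the first "|".
TEMPLATE_RE = re.compile(r"\{\{\s*([^|{}]+?)\s*\|([^{}]*)\}\}")
# Matches a "|doi=" parameter (not e.g. "|doi-access=") and captures its value.
DOI_PARAM_RE = re.compile(r"\|\s*doi\s*=\s*([^|]+)", re.IGNORECASE)


def templates_with_dois(wikitext):
    """Yield (template_name, doi) pairs for |doi= parameters in simple templates."""
    for match in TEMPLATE_RE.finditer(wikitext):
        name = match.group(1).strip().lower()
        params = "|" + match.group(2)
        for doi_match in DOI_PARAM_RE.finditer(params):
            yield name, doi_match.group(1).strip()


def count_broken_by_template(wikitext, broken_dois):
    """Count how many known-broken DOIs appear in each template name."""
    counts = Counter()
    for name, doi in templates_with_dois(wikitext):
        if doi in broken_dois:
            counts[name] += 1
    return counts
```

Running this over each article's wikitext and summing the counters would show whether broken DOIs cluster in particular templates.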
Dario