On 3 August 2015 at 11:04, Magnus Manske <magnusmanske(a)googlemail.com>
wrote:
After some tweaking, I ended up with 6570 names (some of them double, see
above).
Of these, 3481 matched the ODNB names in Mix'n'match.
Of these, 24 are not women on Wikidata. I made a PagePile for those:
https://tools.wmflabs.org/pagepile/api.php?id=251&action=get_data&f…
It appears that at least some of the women on your list are not women.
Example from your list:
# {{User:Rich Farmbrough/ODNB entry|image=1|known for=army medical officer
and transvestite|born=c.1799|died=1865|forenames=James |surname=Barry}}
"transvestite" does, AFAIK, not qualify as "woman"; even if so, it would
be more a transgender case?
Quite a well-known case:
https://en.wikipedia.org/wiki/James_Barry_(surgeon)
of a woman living as a man.
Extrapolating from this, there would be fewer than 50 entries in your list
(if one could extract the proper ODNB names) that are not marked as women
in Wikidata, most of them correctly so.
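For reference, the extrapolation can be checked with a few lines of arithmetic. This is only a sketch: it assumes the rate of 24 non-women among the 3481 matched names holds across the full list of 6570, and the variable names are illustrative, not taken from any tool.

```python
# Figures quoted in the thread.
matched = 3481              # list names matched to ODNB names in Mix'n'match
not_women_in_matched = 24   # matched names not marked as women on Wikidata
total_names = 6570          # names in the full list

# Assumption: the unmatched names behave like the matched ones.
rate = not_women_in_matched / matched
expected_not_women = rate * total_names

print(f"rate: {rate:.4f}, expected in full list: {expected_not_women:.1f}")
```

Under that assumption the expected count comes out at about 45, consistent with "fewer than 50".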
On the other hand, there are 6241 Wikidata items with an ODNB ID that are
marked as women [1]. Compared with your 6570, that would mean at least 329
women on Wikidata are not marked as such, or do not exist at all.
All of the ODNB items on Wikidata that are marked as human have a gender
assigned, so unless something/someone went very wrong, missing gender
assignment is not the issue. There are also no ODNB items that do not have
an "instance of".
Which leaves these explanations:
* There are ~330 women in your list that are neither in Mix'n'match, nor
in Wikidata
* There are ~330 bogus women in your list
* Some combination of the above
The first bullet is probably close enough to the truth.
As women make up ~10% of the ODNB, 330 missing women in Wikidata would
mean we are missing a set of at least 3000 ODNB entries somewhere.
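The arithmetic behind that inference, as a rough sketch. It assumes the ~10% share of women applies uniformly to whatever entries are missing, which is the assumption questioned in the reply; the variable names are illustrative.

```python
# Figures quoted in the thread.
list_names = 6570        # names in the list of women
wikidata_women = 6241    # Wikidata items with an ODNB ID marked as women
share_women = 0.10       # approximate share of women in the ODNB

# Women on the list that are unaccounted for on Wikidata.
gap = list_names - wikidata_women

# If women are ~10% of any missing set, the set is about ten times the gap.
implied_missing_entries = gap / share_women

print(gap, round(implied_missing_entries))
```

The gap is 329, which scales to roughly 3300 entries, hence "at least 3000".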
The inference needs tweaking, though.
Work on the women in the first edition and first two supplements should
have ensured that no missing women are from the older half (in which women
represent a lower proportion of entries). Allowing for that, we get more
like the current estimate of say 1500 to 2000 ODNB entries missing from the
first pass. (There is a plan for doing more, BTW, which I have discussed
with Andrew Gray.)
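As a rough cross-check of the adjustment, one can read the figures backwards. This is only a sketch: it assumes the 1500 to 2000 estimate accounts for the same ~330 missing women, which the message does not state outright, and the variable names are illustrative.

```python
missing_women = 330                   # approximate gap quoted in the thread
estimate_low, estimate_high = 1500, 2000

# Share of women the missing set would need to have under that assumption.
share_if_low = missing_women / estimate_low
share_if_high = missing_women / estimate_high

print(f"{share_if_high:.1%} to {share_if_low:.1%}")
```

That works out to a share of women between about 16.5% and 22% among the missing entries, higher than the ~10% overall figure, as one would expect if the missing entries all come from the newer half.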
Charles