On 3 August 2015 at 11:04, Magnus Manske <magnusmanske@googlemail.com> wrote:

After some tweaking, I ended up with 6570 names (some of them duplicates, see above).

Of these, 3481 matched the ODNB names in mix'n'match.

Of these, 24 are not marked as women on Wikidata. I made a PagePile for those:
https://tools.wmflabs.org/pagepile/api.php?id=251&action=get_data&format=html


It appears that at least some of the women on your list are not women. Example from your list:

# {{User:Rich Farmbrough/ODNB entry|image=1|known for=army medical officer and transvestite|born=c.1799|died=1865|forenames=James |surname=Barry}}

"Transvestite" does not, AFAIK, qualify as "woman"; even if it did, wouldn't this be more of a transgender case?

Quite a well-known case:

https://en.wikipedia.org/wiki/James_Barry_(surgeon)

of a woman living as a man. 

Extrapolating from this, there would be fewer than 50 entries in your list (if one could extract the proper ODNB names) that are not marked as women in Wikidata, most of them correctly so.
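
For anyone wanting to check the extrapolation, here is a rough sketch in Python. It only uses the figures quoted above and assumes the unmatched names behave like the matched ones, which is a guess rather than something established in this thread; the function name is made up for illustration.

```python
def extrapolate_not_women(not_women, matched, total_names):
    """Scale the rate seen in the matched sample up to the whole list.

    Assumption (not verified): the unmatched names have the same
    rate of "not marked as women" as the matched ones.
    """
    rate = not_women / matched
    return rate, rate * total_names


# Figures from the messages above.
rate, expected = extrapolate_not_women(not_women=24, matched=3481, total_names=6570)
print(f"rate in matched sample: {rate:.2%}")
print(f"expected in full list:  about {expected:.0f}")
```

Under that assumption the estimate comes out in the mid-40s, consistent with the "fewer than 50" figure.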

On the other hand, there are 6241 Wikidata items with an ODNB ID that are marked as women [1]. Compared with your 6570, that would mean at least 329 women from your list are either not marked as such on Wikidata, or do not exist there at all.

All of the ODNB items on Wikidata that are marked as human have a gender assigned, so unless something/someone went very wrong, missing gender assignment is not the issue. There are also no ODNB items that do not have an "instance of".

Which leaves these explanations:
* There are ~330 women in your list that are neither in Mix'n'match, nor in Wikidata
* There are ~330 bogus women in your list
* Some combination of the above

The first bullet is probably close enough to the truth.  

As women make up ~10% of the ODNB, 330 missing women in Wikidata would mean we are missing a set of at least 3000 ODNB entries somewhere.
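
The arithmetic behind that figure, as a small Python sketch. The inputs are the numbers quoted in this thread; the ~10% share is the rough figure given above, and the function names are only illustrative. Note that the 6570 reportedly includes some duplicate names, so the gap is itself approximate.

```python
def women_gap(list_total, wikidata_women):
    """Names on the list minus ODNB items marked as women on Wikidata."""
    return list_total - wikidata_women


def implied_missing_entries(gap, women_share):
    """If women are `women_share` of all entries, a gap of `gap` women
    implies roughly gap / women_share missing entries overall."""
    return gap / women_share


gap = women_gap(list_total=6570, wikidata_women=6241)
entries = implied_missing_entries(gap, women_share=0.10)  # ~10% is an approximation
print(f"gap in women: {gap}")
print(f"implied missing ODNB entries: about {round(entries)}")
```

A higher share of women among the missing entries would shrink the implied total proportionally.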


The inference needs tweaking, though. 

Work on the women in the first edition and first two supplements should have ensured that none of the missing women are from the older half (in which women represent a lower proportion of entries). Allowing for that, we get something more like the current estimate of, say, 1500 to 2000 ODNB entries missing from the first pass. (There is a plan for doing more, BTW, which I have discussed with Andrew Gray.)

Charles