Xqt submitted this change.


Approvals: jenkins-bot: Verified; Xqt: Looks good to me, approved
[pep8] PEP8 changes for create_isbn_edition.py

- Code style issues
- clear trailing white space
- untabify file
- keep lines below 80 chars
- update function documentation and parameter list
- update shebang
- script documentation is in __doc__
- replace print statements with pywikibot.info
- add main() function, mostly needed for Windows and for script tests
- add pywikibot.handle_args to handle global options and test -help
- add isbnlib dependency
- lazy import isbnlib and unidecode
- replace sys.stdin.read with pywikibot.input to show an input message
- create wikidata_site and repo after global args are read to prevent a
  site warning

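The "lazy import" bullet above refers to a common pattern: import an optional third-party module at load time, but store the ImportError instead of raising it, and only fail once the dependency is actually needed (so `-help` still works without it). A minimal, generic sketch of that pattern — the helper names here are illustrative, not from the patch:

```python
def lazy_import(name):
    """Import a module by name; on failure, return the ImportError
    instead of raising, so the caller can defer the failure."""
    try:
        return __import__(name)
    except ImportError as exc:
        return exc


def require(*modules):
    """Raise the first stored ImportError; call just before first use."""
    for module in modules:
        if isinstance(module, ImportError):
            raise module


# A stdlib module imports fine, so require() passes silently.
json_mod = lazy_import('json')
require(json_mod)

# A missing module only becomes fatal once require() runs.
missing = lazy_import('module_that_does_not_exist_xyz')
assert isinstance(missing, ImportError)
```

This mirrors the structure of the `try`/`except ImportError` blocks and the dependency check in `main()` that the diff adds for isbnlib and unidecode.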
Change-Id: I6917ec9b511db609c2f1828486b9a53998d1e376
---
M scripts/create_isbn_edition.py
M setup.py
M tests/script_tests.py
M tox.ini
4 files changed, 317 insertions(+), 212 deletions(-)

diff --git a/scripts/create_isbn_edition.py b/scripts/create_isbn_edition.py
index ee6b18c..2aea6c1 100644
--- a/scripts/create_isbn_edition.py
+++ b/scripts/create_isbn_edition.py
@@ -1,15 +1,15 @@
-#!/home/geertivp/pwb/bin/python3
-
-codedoc = """
-Pywikibot client to load ISBN related data into Wikidata
+#!/usr/bin/python3
+"""Pywikibot script to load ISBN related data into Wikidata.

Pywikibot script to get ISBN data from a digital library,
and create or amend the related Wikidata item for edition
(with the P212=ISBN number as unique external ID).

-Use digital libraries to get ISBN data in JSON format, and integrate the results into Wikidata.
+Use digital libraries to get ISBN data in JSON format, and integrate the
+results into Wikidata.

-Then the resulting item number can be used e.g. to generate Wikipedia references using template Cite_Q.
+Then the resulting item number can be used e.g. to generate Wikipedia
+references using template Cite_Q.

Parameters:

@@ -34,39 +34,49 @@
Default LANG; e.g. en, nl, fr, de, es, it, etc.

P3 P4...: P/Q pairs to add additional claims (repeated)
- e.g. P921 Q107643461 (main subject: database management linked to P2163 Fast ID)
+ e.g. P921 Q107643461 (main subject: database
+ management linked to P2163 Fast ID)

stdin: ISBN numbers (International standard book number)

- Free text (e.g. Wikipedia references list, or publication list) is accepted.
- Identification is done via an ISBN regex expression.
+ Free text (e.g. Wikipedia references list, or publication list)
+ is accepted. Identification is done via an ISBN regex expression.

Functionality:

- * The ISBN number is used as a primary key (P212 where no duplicates are allowed)
- The item update is not performed when there is no unique match
- * Statements are added or merged incrementally; existing data is not overwritten.
- * Authors and publishers are searched to get their item number (ambiguous items are skipped)
+ * The ISBN number is used as a primary key (P212, where no
+ duplicates are allowed). The item update is not performed when
+ there is no unique match.
+ * Statements are added or merged incrementally; existing data is not
+ overwritten.
+ * Authors and publishers are searched to get their item number
+ (ambiguous items are skipped)
* Book title and subtitle are separated with '.', ':', or '-'
* This script can be run incrementally with the same parameters
- Caveat: Take into account the Wikidata Query database replication delay.
- Wait for minimum 5 minutes to avoid creating duplicate objects.
+ Caveat: Take into account the Wikidata Query database
+ replication delay. Wait for minimum 5 minutes to avoid creating
+ duplicate objects.

Data quality:

- * Use https://query.wikidata.org/querybuilder/ to identify P212 duplicates
- Merge duplicate items before running the script again.
+ * Use https://query.wikidata.org/querybuilder/ to identify P212
+ duplicates. Merge duplicate items before running the script
+ again.
* The following properties should only be used for written works
P5331: OCLC work ID (editions should only have P243)
- P8383: Goodreads-identificatiecode for work (editions should only have P2969)
+ P8383: Goodreads-identificatiecode for work (editions should
+ only have P2969)

Examples:

- # Default library (Google Books), language (LANG), no additional statements
+ # Default library (Google Books), language (LANG), no additional
+ statements
+
./create_isbn_edition.py
9789042925564

# Wikimedia, language Dutch, main subject: database management
+
./create_isbn_edition.py wiki en P921 Q107643461
978-0-596-10089-6

@@ -109,10 +119,11 @@
P1036: Dewey Decimal Classification
P2163: Fast ID (inverse lookup via Wikidata Query) -> P921: main subject
P2969: Goodreads-identificatiecode
-
+
(only for written works)
P5331: OCLC work ID (editions should only have P243)
- P8383: Goodreads-identificatiecode for work (editions should only have P2969)
+ P8383: Goodreads-identificatiecode for work (editions should only
+ have P2969)

Author:

@@ -154,7 +165,7 @@
https://pypi.org/search/?q=isbnlib_

pip install isbnlib (mandatory)
-
+
(optional)
pip install isbnlib-bol
pip install isbnlib-bnf
@@ -169,24 +180,32 @@
* Better use the ISO 639-1 language code parameter as a default
The language code is not always available from the digital library.
* SPARQL queries run on a replicated database
- Possible important replication delay; wait 5 minutes before retry -- otherwise risk for creating duplicates.
+ Possible important replication delay; wait 5 minutes before retry
+ -- otherwise risk for creating duplicates.

Known problems:

* Unknown ISBN, e.g. 9789400012820
- * No ISBN data available for an edition either causes no output (goob = Google Books), or an error message (wiki, openl)
+ * No ISBN data available for an edition either causes no output
+ (goob = Google Books), or an error message (wiki, openl)
The script is taking care of both
* Only 6 ISBN attributes are listed by the webservice(s)
missing are e.g.: place of publication, number of pages
- * Not all ISBN atttributes have data (authos, publisher, date of publication, language)
- * The script uses multiple webservice calls (script might take time, but it is automated)
- * Need to amend ISBN items that have no author, publisher, or other required data (which additional services to use?)
+ * Not all ISBN attributes have data (author, publisher, date of
+ publication, language)
+ * The script uses multiple webservice calls (script might take time,
+ but it is automated)
+ * Need to amend ISBN items that have no author, publisher, or other
+ required data (which additional services to use?)
* How to add still more digital libraries?
- * Does the KBR has a public ISBN service (Koninklijke Bibliotheek van België)?
- * Filter for work properties -- need to amend Q47461344 (written work) instance and P629 (edition of) + P747 (has edition) statements
- https://www.wikidata.org/wiki/Q63413107
+ * Does the KBR have a public ISBN service (Koninklijke
+ Bibliotheek van België)?
+ * Filter for work properties -- need to amend Q47461344 (written
+ work) instance and P629 (edition of) + P747 (has edition)
+ statements https://www.wikidata.org/wiki/Q63413107
['9781282557246', '9786612557248', '9781847196057', '9781847196040']
- P8383: Goodreads-identificatiecode voor work 13957943 (should have P2969)
+ P8383: Goodreads-identificatiecode voor work 13957943 (should
+ have P2969)
P5331: OCLC-identificatiecode voor work 793965595 (should have P243)

To do:
@@ -205,7 +224,7 @@
Environment:

The python script can run on the following platforms:
-
+
Linux client
Google Chromebook (Linux container)
Toolforge Portal
@@ -238,7 +257,7 @@
Related projects:

https://phabricator.wikimedia.org/T314942 (this script)
-
+
(other projects)
https://phabricator.wikimedia.org/T282719
https://phabricator.wikimedia.org/T214802
@@ -254,64 +273,71 @@
https://en.wikipedia.org/wiki/bibliographic_database
https://www.titelbank.nl/pls/ttb/f?p=103:4012:::NO::P4012_TTEL_ID:3496019&cs=19BB8084860E3314502A1F777F875FE61

+.. versionadded:: 7.7
"""
-
+#
+# (C) Pywikibot team, 2022
+#
+# Distributed under the terms of the MIT license.
+#
import logging # Error logging
import os # Operating system
-import re # Regular expressions (very handy!)
+import re # Regular expressions (very handy!)
import sys # System calls
-import unidecode # Unicode

-import pywikibot # API interface to Wikidata
-
-from isbnlib import * # ISBN data
-from pywikibot import pagegenerators as pg # Wikidata Query interface
+import pywikibot # API interface to Wikidata
+from pywikibot import pagegenerators as pg # Wikidata Query interface
+from pywikibot.backports import List
from pywikibot.data import api

+try:
+ import isbnlib
+except ImportError as e:
+ isbnlib = e
+
+try:
+ from unidecode import unidecode
+except ImportError as e:
+ unidecode = e
+
# Initialisation
debug = True # Show debugging information
verbose = True # Verbose mode

booklib = 'goob' # Default digital library
-isbnre = re.compile(r'[0-9-]{10,17}') # ISBN number: 10 or 13 digits with optional dashes (-)
+
+# ISBN number: 10 or 13 digits with optional dashes (-)
+isbnre = re.compile(r'[0-9-]{10,17}')
propre = re.compile(r'P[0-9]+') # Wikidata P-number
qsuffre = re.compile(r'Q[0-9]+') # Wikidata Q-number

# Other statements are added via command line parameters
target = {
-'P31':'Q3331189', # Is an instance of an edition
+ 'P31': 'Q3331189', # Is an instance of an edition
}

# Statement property and instance validation rules
propreqinst = {
-'P50':'Q5', # Author requires human
-'P123':{'Q2085381', 'Q1114515', 'Q1320047'},# Publisher requires publisher
-'P407':{'Q34770', 'Q33742', 'Q1288568'}, # Edition language requires at least one of (living, natural) language
+ 'P50': 'Q5', # Author requires human
+ # Publisher requires publisher
+ 'P123': {'Q2085381', 'Q1114515', 'Q1320047'},
+ # Edition language requires at least one of (living, natural) language
+ 'P407': {'Q34770', 'Q33742', 'Q1288568'},
}

mainlang = os.getenv('LANG', 'en')[:2] # Default description language

# Connect to database
-transcmt = '#pwb Create ISBN edition' # Wikidata transaction comment
-wikidata_site = pywikibot.Site('wikidata', 'wikidata') # Login to Wikibase instance
-repo = wikidata_site.data_repository() # Required for wikidata object access (item, property, statement)
+transcmt = '#pwb Create ISBN edition' # Wikidata transaction comment


-def is_in_list(statement_list, checklist):
+def is_in_list(statement_list, checklist: List[str]) -> bool:
+ """Verify if statement list contains at least one item from the checklist.
+
+ :param statement_list: Statement list
+ :param checklist: List of values
+ :Returns: True when match
"""
-Verify if statement list contains at least one item from the checklist
-
-Parameters:
-
- statement_list: Statement list
-
- checklist: List of values (string)
-
-Returns:
-
- Boolean (True when match)
- """
-
for seq in statement_list:
if seq.getTarget().getID() in checklist:
isinlist = True
@@ -322,84 +348,92 @@


def get_item_list(item_name, instance_id):
+ """Get list of items by name, belonging to an instance (list).
+
+ :param item_name: Item name (string; case sensitive)
+ :param instance_id: Instance ID (string, set, or list)
+ :Returns: Set of items (Q-numbers)
"""
-Get list of items by name, belonging to an instance (list)
-
-Parameters:
-
- item_name: Item name (string; case sensitive)
-
- instance_id: Instance ID (string, set, or list)
-
-Returns:
-
- Set of items (Q-numbers)
- """
-
item_list = set() # Empty set
- params = {'action': 'wbsearchentities', 'format': 'json', 'type': 'item', 'strictlanguage': False,
- 'language': mainlang, # All languages are searched, but labels are in native language
- 'search': item_name} # Get item list from label
+ params = {
+ 'action': 'wbsearchentities',
+ 'format': 'json',
+ 'type': 'item',
+ 'strictlanguage': False,
+ # All languages are searched, but labels are in native language
+ 'language': mainlang,
+ 'search': item_name, # Get item list from label
+ }
request = api.Request(site=wikidata_site, parameters=params)
result = request.submit()

if 'search' in result:
for res in result['search']:
item = pywikibot.ItemPage(repo, res['id'])
- item.get(get_redirect = True)
+ item.get(get_redirect=True)
if 'P31' in item.claims:
- for seq in item.claims['P31']: # Loop through instances
- if seq.getTarget().getID() in instance_id: # Matching instance
- for lang in item.labels: # Search all languages
- if unidecode.unidecode(item_name.lower()) == unidecode.unidecode(item.labels[lang].lower()): # Ignore label case and accents
- item_list.add(item.getID()) # Label math
+ for seq in item.claims['P31']: # Loop through instances
+ # Matching instance
+ if seq.getTarget().getID() in instance_id:
+ for lang in item.labels: # Search all languages
+ # Ignore label case and accents
+ if (unidecode(item_name.lower())
+ == unidecode(item.labels[lang].lower())):
+ item_list.add(item.getID()) # Label match
for lang in item.aliases:
- if item_name in item.aliases[lang]: # Case sensitive for aliases
- item_list.add(item.getID()) # Alias match
+ # Case sensitive for aliases
+ if item_name in item.aliases[lang]:
+ item_list.add(item.getID()) # Alias match
return item_list


-def amend_isbn_edition(isbn_number):
- """
-Amend ISBN registration.
-
-Parameters:
-
- isbn_number: ISBN number (string; 10 or 13 digits with optional hyphens)
-
-Result:
+def amend_isbn_edition(isbn_number): # noqa: C901
+ """Amend ISBN registration.

Amend Wikidata, by registering the ISBN-13 data via P212,
depending on the data obtained from the digital library.
+
+ :param isbn_number: ISBN number (string; 10 or 13 digits with
+ optional hyphens)
"""
+ global logger
global proptyx
+ global targetx

isbn_number = isbn_number.strip()
if isbn_number == '':
- return 3 # Do nothing when the ISBN number is missing
-
+ return 3 # Do nothing when the ISBN number is missing
+
# Validate ISBN data
if verbose:
- print()
+ pywikibot.info()

try:
- isbn_data = meta(isbn_number, service=booklib)
+ isbn_data = isbnlib.meta(isbn_number, service=booklib)
logger.info(isbn_data)
- # {'ISBN-13': '9789042925564', 'Title': 'De Leuvense Vaart - Van De Vaartkom Tot Wijgmaal. Aspecten Uit De Industriele Geschiedenis Van Leuven', 'Authors': ['A. Cresens'], 'Publisher': 'Peeters Pub & Booksellers', 'Year': '2012', 'Language': 'nl'}
+ # {'ISBN-13': '9789042925564',
+ # 'Title': 'De Leuvense Vaart - Van De Vaartkom Tot Wijgmaal. '
+ # 'Aspecten Uit De Industriele Geschiedenis Van Leuven',
+ # 'Authors': ['A. Cresens'],
+ # 'Publisher': 'Peeters Pub & Booksellers',
+ # 'Year': '2012',
+ # 'Language': 'nl'}
except Exception as error:
# When the book is unknown the function returns
logger.error(error)
- #raise ValueError(error)
+ # raise ValueError(error)
return 3

if len(isbn_data) < 6:
- logger.error('Unknown or incomplete digital library registration for %s' % isbn_number)
+ logger.error(
+ 'Unknown or incomplete digital library registration for %s'
+ % isbn_number)
return 3

# Show the raw results
if verbose:
for i in isbn_data:
- print('%s:\t%s' % (i, isbn_data[i]))
+ pywikibot.info('%s:\t%s' % (i, isbn_data[i]))

# Get the book language from the ISBN book reference
booklang = mainlang # Default language
@@ -419,10 +453,10 @@

# Get formatted ISBN number
isbn_number = isbn_data['ISBN-13'] # Numeric format
- isbn_fmtd = mask(isbn_number) # Canonical format
+ isbn_fmtd = isbnlib.mask(isbn_number) # Canonical format
if verbose:
- print()
- print(isbn_fmtd) # First one
+ pywikibot.info()
+ pywikibot.info(isbn_fmtd) # First one

# Get (sub)title when there is a dot
titles = isbn_data['Title'].split('. ') # goob is using a '.'
@@ -435,14 +469,17 @@
if len(titles) > 1:
subtitle = titles[1].strip()

- # Print book titles
+ # Show book titles
if debug:
- print(objectname, file=sys.stderr)
- print(subtitle, file=sys.stderr) # Optional
- for i in range(2,len(titles)): # Print subsequent subtitles, when available
- print(titles[i].strip(), file=sys.stderr) # Not stored in Wikidata...
+ pywikibot.info(objectname, file=sys.stderr)
+ pywikibot.info(subtitle, file=sys.stderr) # Optional
+ # print subsequent subtitles, when available
+ for i in range(2, len(titles)):
+ # Not stored in Wikidata...
+ pywikibot.info(titles[i].strip(), file=sys.stderr)

# Search the ISBN number in Wikidata both canonical and numeric
+ # P212 should have canonical hyphenated format
isbn_query = ("""# Get ISBN number
SELECT ?item WHERE {
VALUES ?isbn_number {
@@ -451,13 +488,13 @@
}
?item wdt:P212 ?isbn_number.
}
-""" % (isbn_fmtd, isbn_number)) # P212 should have canonical hyphenated format
+""" % (isbn_fmtd, isbn_number))

logger.info(isbn_query)
generator = pg.WikidataSPARQLPageGenerator(isbn_query, site=wikidata_site)

rescnt = 0
- for item in generator: # Main loop for all DISTINCT items
+ for item in generator: # Main loop for all DISTINCT items
rescnt += 1
qnumber = item.getID()
logger.warning('Found item: %s' % qnumber)
@@ -479,7 +516,7 @@
# Add all P/Q values
# Make sure that labels are known in the native language
if debug:
- print(target, file=sys.stderr)
+ pywikibot.info(target, file=sys.stderr)

# Register statements
for propty in target:
@@ -489,8 +526,11 @@
targetx[propty] = pywikibot.ItemPage(repo, target[propty])

try:
- logger.warning('Add %s (%s): %s (%s)' % (proptyx[propty].labels[booklang], propty, targetx[propty].labels[booklang], target[propty]))
- except:
+ logger.warning('Add %s (%s): %s (%s)'
+ % (proptyx[propty].labels[booklang], propty,
+ targetx[propty].labels[booklang],
+ target[propty]))
+ except: # noqa: B001, E722, H201
logger.warning('Add %s:%s' % (propty, target[propty]))

claim = pywikibot.Claim(repo, propty)
@@ -508,20 +548,23 @@
if 'P1476' not in item.claims:
logger.warning('Add Title (P1476): %s' % (objectname))
claim = pywikibot.Claim(repo, 'P1476')
- claim.setTarget(pywikibot.WbMonolingualText(text=objectname, language=booklang))
+ claim.setTarget(pywikibot.WbMonolingualText(text=objectname,
+ language=booklang))
item.addClaim(claim, bot=True, summary=transcmt)

# Subtitle
if subtitle != '' and 'P1680' not in item.claims:
logger.warning('Add Subtitle (P1680): %s' % (subtitle))
claim = pywikibot.Claim(repo, 'P1680')
- claim.setTarget(pywikibot.WbMonolingualText(text=subtitle, language=booklang))
+ claim.setTarget(pywikibot.WbMonolingualText(text=subtitle,
+ language=booklang))
item.addClaim(claim, bot=True, summary=transcmt)

# Date of publication
pub_year = isbn_data['Year']
if pub_year != '' and 'P577' not in item.claims:
- logger.warning('Add Year of publication (P577): %s' % (isbn_data['Year']))
+ logger.warning('Add Year of publication (P577): %s'
+ % (isbn_data['Year']))
claim = pywikibot.Claim(repo, 'P577')
claim.setTarget(pywikibot.WbTime(year=int(pub_year), precision='year'))
item.addClaim(claim, bot=True, summary=transcmt)
@@ -543,7 +586,8 @@
break

if add_author:
- logger.warning('Add author %d (P50): %s (%s)' % (author_cnt, author_name, author_list[0]))
+ logger.warning('Add author %d (P50): %s (%s)'
+ % (author_cnt, author_name, author_list[0]))
claim = pywikibot.Claim(repo, 'P50')
claim.setTarget(pywikibot.ItemPage(repo, author_list[0]))
item.addClaim(claim, bot=True, summary=transcmt)
@@ -559,11 +603,13 @@
# Get the publisher
publisher_name = isbn_data['Publisher'].strip()
if publisher_name != '':
- publisher_list = list(get_item_list(publisher_name, propreqinst['P123']))
+ publisher_list = list(get_item_list(publisher_name,
+ propreqinst['P123']))

if len(publisher_list) == 1:
if 'P123' not in item.claims:
- logger.warning('Add publisher (P123): %s (%s)' % (publisher_name, publisher_list[0]))
+ logger.warning('Add publisher (P123): %s (%s)'
+ % (publisher_name, publisher_list[0]))
claim = pywikibot.Claim(repo, 'P123')
claim.setTarget(pywikibot.ItemPage(repo, publisher_list[0]))
item.addClaim(claim, bot=True, summary=transcmt)
@@ -573,30 +619,33 @@
logger.warning('Ambiguous publisher: %s' % publisher_name)

# Get addional data from the digital library
- isbn_cover = cover(isbn_number)
- isbn_editions = editions(isbn_number, service='merge')
- isbn_doi = doi(isbn_number)
- isbn_info = info(isbn_number)
+ isbn_cover = isbnlib.cover(isbn_number)
+ isbn_editions = isbnlib.editions(isbn_number, service='merge')
+ isbn_doi = isbnlib.doi(isbn_number)
+ isbn_info = isbnlib.info(isbn_number)

if verbose:
- print()
- print(isbn_info)
- print(isbn_doi)
- print(isbn_editions)
+ pywikibot.info()
+ pywikibot.info(isbn_info)
+ pywikibot.info(isbn_doi)
+ pywikibot.info(isbn_editions)

# Book cover images
for i in isbn_cover:
- print('%s:\t%s' % (i, isbn_cover[i]))
+ pywikibot.info('%s:\t%s' % (i, isbn_cover[i]))

# Handle ISBN classification
- isbn_classify = classify(isbn_number)
+ isbn_classify = isbnlib.classify(isbn_number)
if debug:
for i in isbn_classify:
- print('%s:\t%s' % (i, isbn_classify[i]), file=sys.stderr)
+ pywikibot.info('%s:\t%s' % (i, isbn_classify[i]), file=sys.stderr)

# ./create_isbn_edition.py '978-3-8376-5645-9' - de P407 Q188
# Q113460204
- # {'owi': '11103651812', 'oclc': '1260160983', 'lcc': 'TK5105.8882', 'ddc': '300', 'fast': {'1175035': 'Wikis (Computer science)', '1795979': 'Wikipedia', '1122877': 'Social sciences'}}
+ # {'owi': '11103651812', 'oclc': '1260160983', 'lcc': 'TK5105.8882',
+ # 'ddc': '300', 'fast': {'1175035': 'Wikis (Computer science)',
+ # '1795979': 'Wikipedia',
+ # '1122877': 'Social sciences'}}

# Set the OCLC ID
if 'oclc' in isbn_classify and 'P243' not in item.claims:
@@ -608,54 +657,75 @@
# OCLC ID and OCLC work ID should not be both assigned
if 'P243' in item.claims and 'P5331' in item.claims:
if 'P629' in item.claims:
- oclcwork = item.claims['P5331'][0] # OCLC Work should be unique
- oclcworkid = oclcwork.getTarget() # Get the OCLC Work ID from the edition
- work = item.claims['P629'][0].getTarget() # Edition should belong to only one single work
- logger.warning('Move OCLC Work ID %s to work %s' % (oclcworkid, work.getID())) # There doesn't exist a moveClaim method?
- if 'P5331' not in work.claims: # Keep current OCLC Work ID if present
+ oclcwork = item.claims['P5331'][0] # OCLC Work should be unique
+ # Get the OCLC Work ID from the edition
+ oclcworkid = oclcwork.getTarget()
+ # Edition should belong to only one single work
+ work = item.claims['P629'][0].getTarget()
+ # There doesn't exist a moveClaim method?
+ logger.warning('Move OCLC Work ID %s to work %s'
+ % (oclcworkid, work.getID()))
+ # Keep current OCLC Work ID if present
+ if 'P5331' not in work.claims:
claim = pywikibot.Claim(repo, 'P5331')
claim.setTarget(oclcworkid)
work.addClaim(claim, bot=True, summary=transcmt)
- item.removeClaims(oclcwork, bot=True, summary=transcmt) # OCLC Work ID does not belong to edition
+ # OCLC Work ID does not belong to edition
+ item.removeClaims(oclcwork, bot=True, summary=transcmt)
else:
- logger.error('OCLC Work ID %s conflicts with OCLC ID %s and no work available' % (item.claims['P5331'][0].getTarget(), item.claims['P243'][0].getTarget()))
+ logger.error('OCLC Work ID %s conflicts with OCLC ID %s and no '
+ 'work available'
+ % (item.claims['P5331'][0].getTarget(),
+ item.claims['P243'][0].getTarget()))

# OCLC work ID should not be registered for editions, only for works
if 'owi' not in isbn_classify:
pass
- elif 'P629' in item.claims: # Get the work related to the edition
- work = item.claims['P629'][0].getTarget() # Edition should only have one single work
- if 'P5331' not in work.claims: # Assign the OCLC work ID if missing
- logger.warning('Add OCLC work ID (P5331): %s to work %s' % (isbn_classify['owi'], work.getID()))
+ elif 'P629' in item.claims: # Get the work related to the edition
+ # Edition should only have one single work
+ work = item.claims['P629'][0].getTarget()
+ if 'P5331' not in work.claims: # Assign the OCLC work ID if missing
+ logger.warning('Add OCLC work ID (P5331): %s to work %s'
+ % (isbn_classify['owi'], work.getID()))
claim = pywikibot.Claim(repo, 'P5331')
claim.setTarget(isbn_classify['owi'])
work.addClaim(claim, bot=True, summary=transcmt)
elif 'P243' in item.claims:
- logger.warning('OCLC Work ID %s ignored because of OCLC ID %s' % (isbn_classify['owi'], item.claims['P243'][0].getTarget()))
- elif 'P5331' not in item.claims: # Assign the OCLC work ID only if there is no work, and no OCLC ID for edition
- logger.warning('Add OCLC work ID (P5331): %s to edition' % (isbn_classify['owi']))
+ logger.warning('OCLC Work ID %s ignored because of OCLC ID %s'
+ % (isbn_classify['owi'],
+ item.claims['P243'][0].getTarget()))
+ # Assign the OCLC work ID only if there is no work, and no OCLC ID
+ # for edition
+ elif 'P5331' not in item.claims:
+ logger.warning('Add OCLC work ID (P5331): %s to edition'
+ % (isbn_classify['owi']))
claim = pywikibot.Claim(repo, 'P5331')
claim.setTarget(isbn_classify['owi'])
item.addClaim(claim, bot=True, summary=transcmt)

- # Reverse logic for moving OCLC ID and P212 (ISBN) from work to edition is more difficult because of 1:M relationship...
+ # Reverse logic for moving OCLC ID and P212 (ISBN) from work to
+ # edition is more difficult because of 1:M relationship...

# Same logic as for OCLC (work) ID

# Goodreads-identificatiecode (P2969)

- # Goodreads-identificatiecode for work (P8383) should not be registered for editions; should rather use P2969
+ # Goodreads-identificatiecode for work (P8383) should not be
+ # registered for editions; should rather use P2969

# Library of Congress Classification (works and editions)
if 'lcc' in isbn_classify and 'P8360' not in item.claims:
- logger.warning('Add Library of Congress Classification for edition (P8360): %s' % (isbn_classify['lcc']))
+ logger.warning(
+ 'Add Library of Congress Classification for edition (P8360): %s'
+ % (isbn_classify['lcc']))
claim = pywikibot.Claim(repo, 'P8360')
claim.setTarget(isbn_classify['lcc'])
item.addClaim(claim, bot=True, summary=transcmt)

# Dewey Decimale Classificatie
if 'ddc' in isbn_classify and 'P1036' not in item.claims:
- logger.warning('Add Dewey Decimale Classificatie (P1036): %s' % (isbn_classify['ddc']))
+ logger.warning('Add Dewey Decimale Classificatie (P1036): %s'
+ % (isbn_classify['ddc']))
claim = pywikibot.Claim(repo, 'P1036')
claim.setTarget(isbn_classify['ddc'])
item.addClaim(claim, bot=True, summary=transcmt)
@@ -666,7 +736,8 @@
# https://www.oclc.org/research/areas/data-science/fast.html
# https://www.oclc.org/content/dam/oclc/fast/FAST-quick-start-guide-2022.pdf

- # Authority control identifier from WorldCat's “FAST Linked Data” authority file (external ID P2163)
+ # Authority control identifier from WorldCat's “FAST Linked Data”
+ # authority file (external ID P2163)
# Corresponding to P921 (Wikidata main subject)
if 'fast' in isbn_classify:
for fast_id in isbn_classify['fast']:
@@ -679,109 +750,142 @@
""" % (fast_id))

logger.info(main_subject_query)
- generator = pg.WikidataSPARQLPageGenerator(main_subject_query, site=wikidata_site)
+ generator = pg.WikidataSPARQLPageGenerator(main_subject_query,
+ site=wikidata_site)

rescnt = 0
- for main_subject in generator: # Main loop for all DISTINCT items
+ for main_subject in generator: # Main loop for all DISTINCT items
rescnt += 1
qmain_subject = main_subject.getID()
try:
main_subject_label = main_subject.labels[booklang]
- logger.info('Found main subject %s (%s) for Fast ID %s' % (main_subject_label, qmain_subject, fast_id))
- except:
+ logger.info('Found main subject %s (%s) for Fast ID %s'
+ % (main_subject_label, qmain_subject, fast_id))
+ except: # noqa B001, E722, H201
main_subject_label = ''
- logger.info('Found main subject (%s) for Fast ID %s' % (qmain_subject, fast_id))
- logger.error('Missing label for item %s' % qmain_subject)
+ logger.info('Found main subject (%s) for Fast ID %s'
+ % (qmain_subject, fast_id))
+ logger.error('Missing label for item %s'
+ % qmain_subject)

# Create or amend P921 statement
if rescnt == 0:
- logger.error('Main subject not found for Fast ID %s' % (fast_id))
+ logger.error('Main subject not found for Fast ID %s'
+ % (fast_id))
elif rescnt == 1:
add_main_subject = True
- if 'P921' in item.claims: # Check for duplicates
+ if 'P921' in item.claims: # Check for duplicates
for seq in item.claims['P921']:
if seq.getTarget().getID() == qmain_subject:
add_main_subject = False
break

if add_main_subject:
- logger.warning('Add main subject (P921) %s (%s)' % (main_subject_label, qmain_subject))
+ logger.warning('Add main subject (P921) %s (%s)'
+ % (main_subject_label, qmain_subject))
claim = pywikibot.Claim(repo, 'P921')
claim.setTarget(main_subject)
item.addClaim(claim, bot=True, summary=transcmt)
else:
- logger.info('Skipping main subject %s (%s)' % (main_subject_label, qmain_subject))
+ logger.info('Skipping main subject %s (%s)'
+ % (main_subject_label, qmain_subject))
else:
- logger.error('Ambiguous main subject for Fast ID %s' % (fast_id))
+ logger.error('Ambiguous main subject for Fast ID %s'
+ % (fast_id))

# Book description
- isbn_description = desc(isbn_number)
+ isbn_description = isbnlib.desc(isbn_number)
if isbn_description != '':
- print()
- print(isbn_description)
+ pywikibot.info()
+ pywikibot.info(isbn_description)

# Currently does not work (service not available)
try:
logger.warning('BibTex unavailable')
return 0
- bibtex_metadata = doi2tex(isbn_doi)
- print(bibtex_metadata)
+ bibtex_metadata = isbnlib.doi2tex(isbn_doi)
+ pywikibot.info(bibtex_metadata)
except Exception as error:
logger.error(error) # Data not available

return 0


-# Error logging
-logger = logging.getLogger('create_isbn_edition')
-#logging.basicConfig(level=logging.DEBUG) # Uncomment for debugging
-##logger.setLevel(logging.DEBUG)
+def main(*args: str) -> None:
+ """
+ Process command line arguments and invoke bot.

-pgmnm = sys.argv.pop(0)
-logger.debug('%s %s' % (pgmnm, '2022-08-23 (gvp)'))
+ If args is an empty list, sys.argv is used.

-# Get optional parameters
+ :param args: command line arguments
+ """
+ # Error logging
+ global logger
+ global repo
+ global targetx
+ global wikidata_site

-# Get the digital library
-if len(sys.argv) > 0:
- booklib = sys.argv.pop(0)
- if booklib == '-':
- booklib = 'goob'
+ logger = logging.getLogger('create_isbn_edition')

-# Get the native language
-# The language code is only required when P/Q parameters are added, or different from the LANG code
-if len(sys.argv) > 0:
- mainlang = sys.argv.pop(0)
+ # Get optional parameters
+ local_args = pywikibot.handle_args(*args)

-# Get additional P/Q parameters
-while len(sys.argv) > 0:
- inpar = propre.findall(sys.argv.pop(0).upper())[0]
- target[inpar] = qsuffre.findall(sys.argv.pop(0).upper())[0]
+ # Login to Wikibase instance
+ wikidata_site = pywikibot.Site('wikidata')
+ # Required for wikidata object access (item, property, statement)
+ repo = wikidata_site.data_repository()

-# Validate P/Q list
-proptyx={}
-targetx={}
+ # Get the digital library
+ if local_args:
+ booklib = local_args.pop(0)
+ if booklib == '-':
+ booklib = 'goob'

-# Validate the propery/instance pair
-for propty in target:
- if propty not in proptyx:
- proptyx[propty] = pywikibot.PropertyPage(repo, propty)
- targetx[propty] = pywikibot.ItemPage(repo, target[propty])
- targetx[propty].get(get_redirect=True)
- if propty in propreqinst and ('P31' not in targetx[propty].claims or not is_in_list(targetx[propty].claims['P31'], propreqinst[propty])):
- logger.critical('%s (%s) is not a language' % (targetx[propty].labels[mainlang], target[propty]))
- sys.exit(12)
+ # Get the native language
+ # The language code is only required when P/Q parameters are added,
+ # or different from the LANG code
+ if local_args:
+ mainlang = local_args.pop(0)

-# Get list of item numbers
-inputfile = sys.stdin.read() # Typically the Appendix list of references of e.g. a Wikipedia page containing ISBN numbers
-itemlist = sorted(set(isbnre.findall(inputfile))) # Extract all ISBN numbers
+ # Get additional P/Q parameters
+ while local_args:
+ inpar = propre.findall(local_args.pop(0).upper())[0]
+ target[inpar] = qsuffre.findall(local_args.pop(0).upper())[0]

-for isbn_number in itemlist: # Process the next edition
- amend_isbn_edition(isbn_number)
+ # Validate P/Q list
+ proptyx = {}
+ targetx = {}

-# Einde van de miserie
-"""
-Notes:
+ # Validate the property/instance pair
+ for propty in target:
+ if propty not in proptyx:
+ proptyx[propty] = pywikibot.PropertyPage(repo, propty)
+ targetx[propty] = pywikibot.ItemPage(repo, target[propty])
+ targetx[propty].get(get_redirect=True)
+ if propty in propreqinst and (
+ 'P31' not in targetx[propty].claims
+ or not is_in_list(targetx[propty].claims['P31'],
+ propreqinst[propty])):
+ logger.critical('%s (%s) is not a language'
+ % (targetx[propty].labels[mainlang],
+ target[propty]))
+ sys.exit(12)
+
+ # check dependencies
+ for module in (isbnlib, unidecode):
+ if isinstance(module, ImportError):
+ raise module
+
+ # Get list of item numbers
+ # Typically the Appendix list of references of e.g. a Wikipedia page
+ # containing ISBN numbers
+ inputfile = pywikibot.input('Get list of item numbers')
+ # Extract all ISBN numbers
+ itemlist = sorted(set(isbnre.findall(inputfile)))
+
+ for isbn_number in itemlist: # Process the next edition
+ amend_isbn_edition(isbn_number)


-"""
+if __name__ == '__main__':
+ main()
diff --git a/setup.py b/setup.py
index 21779d9..00a1cb9 100755
--- a/setup.py
+++ b/setup.py
@@ -97,6 +97,7 @@

# ------- setup extra_requires for scripts ------- #
script_deps = {
+ 'create_isbn_edition.py': ['isbnlib', 'unidecode'],
'commons_information.py': extra_deps['mwparserfromhell'],
'patrol.py': extra_deps['mwparserfromhell'],
'weblinkchecker.py': extra_deps['memento'],
diff --git a/tests/script_tests.py b/tests/script_tests.py
index 94d0b80..d499269 100755
--- a/tests/script_tests.py
+++ b/tests/script_tests.py
@@ -26,6 +26,7 @@
# These dependencies are not always the package name which is in setup.py.
# Here, the name given to the module which will be imported is required.
script_deps = {
+ 'create_isbn_edition': ['isbnlib', 'unidecode'],
'commons_information': ['mwparserfromhell'],
'patrol': ['mwparserfromhell'],
'weblinkchecker': ['memento_client'],
@@ -374,7 +375,7 @@
# Here come scripts requiring and missing dependencies, that haven't been
# fixed to output -help in that case.
_expected_failures = {'version'}
- _allowed_failures = ['create_isbn_edition']
+ _allowed_failures = []

_arguments = '-help'
_results = None
diff --git a/tox.ini b/tox.ini
index ecf4bfc..3b35408 100644
--- a/tox.ini
+++ b/tox.ini
@@ -164,7 +164,6 @@
scripts/clean_sandbox.py: N816
scripts/commonscat.py: N802, N806, N816
scripts/cosmetic_changes.py: N816
- scripts/create_isbn_edition.py: C901, D100, E402, E501, F405, T201
scripts/dataextend.py: C901, D101, D102, E126, E127, E131, E501
scripts/harvest_template.py: N802, N816
scripts/interwiki.py: N802, N803, N806, N816

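For reference, the ISBN identification that the reworked script performs on its `pywikibot.input` text can be reproduced offline with the same regular expression the diff keeps (`[0-9-]{10,17}`, matching 10 to 17 digits or hyphens). The sample reference text below is illustrative, not taken from the patch:

```python
import re

# Same pattern as isbnre in the script:
# 10-13 digits with up to four optional hyphens
isbnre = re.compile(r'[0-9-]{10,17}')

text = """References:
Cresens, A. (2012). De Leuvense Vaart. ISBN 978-90-429-2556-4.
Another title (2005). ISBN 9781590595824.
"""

# Deduplicate and sort, as main() does before amending each edition;
# short digit runs such as years fall below the 10-character minimum.
itemlist = sorted(set(isbnre.findall(text)))
print(itemlist)  # ['978-90-429-2556-4', '9781590595824']
```

Both the hyphenated (canonical) and plain numeric forms are captured, which is why the SPARQL lookup in the script queries P212 with both `isbn_fmtd` and `isbn_number`.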

Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-Change-Id: I6917ec9b511db609c2f1828486b9a53998d1e376
Gerrit-Change-Number: 826937
Gerrit-PatchSet: 17
Gerrit-Owner: Xqt <info@gno.de>
Gerrit-Reviewer: D3r1ck01 <xsavitar.wiki@aol.com>
Gerrit-Reviewer: Geertivp <geertivp@gmail.com>
Gerrit-Reviewer: Xqt <info@gno.de>
Gerrit-Reviewer: jenkins-bot
Gerrit-MessageType: merged