jenkins-bot has submitted this change. ( https://gerrit.wikimedia.org/r/c/pywikibot/core/+/754071 )
Change subject: [doc] Update documentation for removeDisabledParts
......................................................................
[doc] Update documentation for removeDisabledParts
- add typing hints
- enable list for defaults
- add versionchanged
- add Container to backports.py
Change-Id: I0ba172fdd3e8534048e5e04572a9e309d29362f0
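For readers of this change, the following is a minimal sketch of the typing pattern it introduces, not the Pywikibot code itself. `select_tags` and `DEFAULT_TAGS` are hypothetical names made up for illustration; only the version-dependent import and the list-based filtering mirror the diff below.

```python
import sys
from typing import Optional

# Same idea as pywikibot/backports.py: before Python 3.9 the generic
# aliases come from typing, afterwards from collections.abc.
if sys.version_info < (3, 9):
    from typing import Container, Iterable
else:
    from collections.abc import Container, Iterable

# Hypothetical constant; the real function builds this list inline.
DEFAULT_TAGS = ['comment', 'includeonly', 'nowiki', 'pre', 'syntaxhighlight']


def select_tags(tags: Optional[Iterable] = None,
                include: Optional[Container] = None) -> list:
    """Return the tags to remove, keeping the caller's order."""
    if not tags:
        tags = DEFAULT_TAGS
    if include:
        # "include" only needs to support the "in" operator, hence Container.
        return [tag for tag in tags if tag not in include]
    return list(tags)


print(select_tags(include={'nowiki'}))
```

`Iterable` suits `tags` because it is only looped over, while `Container` suits `include` because it is only used for membership tests.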
---
M pywikibot/backports.py
M pywikibot/textlib.py
2 files changed, 15 insertions(+), 15 deletions(-)
Approvals:
Xqt: Looks good to me, approved
jenkins-bot: Verified
diff --git a/pywikibot/backports.py b/pywikibot/backports.py
index ee524c2..54c15e0 100644
--- a/pywikibot/backports.py
+++ b/pywikibot/backports.py
@@ -1,6 +1,6 @@
"""This module contains backports to support older Python versions."""
#
-# (C) Pywikibot team, 2014-2021
+# (C) Pywikibot team, 2014-2022
#
# Distributed under the terms of the MIT license.
#
@@ -61,6 +61,7 @@
if PYTHON_VERSION < (3, 9):
from typing import (
+ Container,
Dict,
FrozenSet,
Generator,
@@ -77,7 +78,7 @@
)
else:
from collections.abc import (
- Generator, Iterable, Iterator, Mapping, Sequence,
+ Container, Generator, Iterable, Iterator, Mapping, Sequence,
)
from re import Match, Pattern
Dict = dict # type: ignore[misc]
diff --git a/pywikibot/textlib.py b/pywikibot/textlib.py
index fdffe94..bd26f10 100644
--- a/pywikibot/textlib.py
+++ b/pywikibot/textlib.py
@@ -6,7 +6,7 @@
"""
#
-# (C) Pywikibot team, 2008-2021
+# (C) Pywikibot team, 2008-2022
#
# Distributed under the terms of the MIT license.
#
@@ -19,9 +19,8 @@
from typing import NamedTuple, Optional, Union
import pywikibot
-from pywikibot.backports import List
+from pywikibot.backports import Container, Iterable, List, Tuple
from pywikibot.backports import OrderedDict as OrderedDictType
-from pywikibot.backports import Tuple
from pywikibot.exceptions import InvalidTitleError, SiteDefinitionError
from pywikibot.family import Family
@@ -433,32 +432,32 @@
return text
-def removeDisabledParts(text: str, tags=None, include=None, site=None) -> str:
+def removeDisabledParts(text: str,
+ tags: Optional[Iterable] = None,
+ include: Optional[Container] = None,
+ site: Optional['pywikibot.site.BaseSite'] = None
+ ) -> str:
"""
Return text without portions where wiki markup is disabled.
- Parts that will be removed by default are
+ Parts that will be removed by default are:
+
* HTML comments
* nowiki tags
* pre tags
* includeonly tags
* source and syntaxhighlight tags
+ .. versionchanged:: 7.0
+ the order of removals will correspond to the tags argument
+ if provided as an ordered collection (list, tuple)
:param tags: The exact set of parts which should be removed using
keywords from textlib._get_regexes().
- :type tags: list, set, tuple or None
-
:param include: Or, in alternative, default parts that shall not
be removed.
- :type include: list, set, tuple or None
-
:param site: Site to be used for site-dependent regexes. Default
disabled parts listed above do not need it.
- :type site: pywikibot.Site
-
:return: text stripped from disabled parts.
- .. note:: the order of removals will correspond to the tags argument
- if provided as an ordered sequence (list, tuple)
"""
if not tags:
tags = ['comment', 'includeonly', 'nowiki', 'pre', 'syntaxhighlight']
--
To view, visit https://gerrit.wikimedia.org/r/c/pywikibot/core/+/754071
To unsubscribe, or for help writing mail filters, visit https://gerrit.wikimedia.org/r/settings
Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-Change-Id: I0ba172fdd3e8534048e5e04572a9e309d29362f0
Gerrit-Change-Number: 754071
Gerrit-PatchSet: 1
Gerrit-Owner: Xqt <info(a)gno.de>
Gerrit-Reviewer: Xqt <info(a)gno.de>
Gerrit-Reviewer: jenkins-bot
Gerrit-MessageType: merged
jenkins-bot has submitted this change. ( https://gerrit.wikimedia.org/r/c/pywikibot/core/+/754070 )
Change subject: [doc] Update ROADMAP.rst and CHANGELOG.md
......................................................................
[doc] Update ROADMAP.rst and CHANGELOG.md
Change-Id: Ia9eca82bbf7b71595691588eb50761b2a758d897
---
M ROADMAP.rst
M scripts/CHANGELOG.md
2 files changed, 12 insertions(+), 1 deletion(-)
Approvals:
Xqt: Looks good to me, approved
jenkins-bot: Verified
diff --git a/ROADMAP.rst b/ROADMAP.rst
index a670af2..7e235f0 100644
--- a/ROADMAP.rst
+++ b/ROADMAP.rst
@@ -4,8 +4,12 @@
Improvements
------------
+* Avoid non-deterministic behavior in removeDisabledParts
+* Update isbn dependency and require python-stdnum >= 1.17
+* Synchronize Page.linkedPages() parameters with Site.pagelinks() parameters
+* Scripts hash bang was changed from python to python3
* i18n.bundles(), i18n.known_languages and i18n._get_bundle() functions were added
-* Raise ConnectionError immediately if urllib3.NewConnectionError occurs (T297994)
+* Raise ConnectionError immediately if urllib3.NewConnectionError occurs (T297994, T298859)
* Make pywikibot messages available with site package (T57109, T275981)
* Add support for API:Redirects
* Enable shell script with Pywikibot site package
@@ -35,6 +39,8 @@
Bugfixes
--------
+* Remove question mark character from forbidden file name characters (T93482)
+* Enable -interwiki option with pagegenerators (T57099)
* Don't assert login result (T298761)
* Allow title placeholder $1 in the middle of an url (T111513, T298078)
* Don't create a Site object if pywikibot is not fully imported (T298384)
@@ -54,11 +60,14 @@
* Support of Python 3.5.0 - 3.5.2 has been dropped (T286867)
* generate_user_files.py, generate_user_files.py, shell.py and version.py were moved to pywikibot/scripts and must be used with pwb wrapper script
+* *See also Code cleanups below*
Code cleanups
-------------
+* Remove AllpagesPageGenerator, UnconnectedPageGenerator, CombinedPageGenerator, WantedPagesPageGenerator pagegenerators
+* Remove deprecated echo.Notification.id
* Remove APISite.newfiles() method (T168339)
* Remove APISite.page_exists() method
* Raise a TypeError if BaseBot.init_page return None
diff --git a/scripts/CHANGELOG.md b/scripts/CHANGELOG.md
index 904aa26..5ea028d 100644
--- a/scripts/CHANGELOG.md
+++ b/scripts/CHANGELOG.md
@@ -18,6 +18,7 @@
* Derive CheckerBot from CurrentPageBot (T196851, T171713)
### category
+* Recurse CategoryListifyRobot with depth
* Show a warning if a pagegenerator option is not enabled (T298522)
* Deprecated code parts were removed
@@ -31,6 +32,7 @@
* pass site arg only once (T292367)
### fixing_redirects
+* Let only put_current show the message "No changes were needed"
* Use concurrent.futures to retrieve redirect or moved targets (T298789)
* Add an option to ignore solving moved targets (T298789)
--
To view, visit https://gerrit.wikimedia.org/r/c/pywikibot/core/+/754070
Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-Change-Id: Ia9eca82bbf7b71595691588eb50761b2a758d897
Gerrit-Change-Number: 754070
Gerrit-PatchSet: 1
Gerrit-Owner: Xqt <info(a)gno.de>
Gerrit-Reviewer: D3r1ck01 <xsavitar.wiki(a)aol.com>
Gerrit-Reviewer: Xqt <info(a)gno.de>
Gerrit-Reviewer: jenkins-bot
Gerrit-MessageType: merged
jenkins-bot has submitted this change. ( https://gerrit.wikimedia.org/r/c/pywikibot/core/+/754066 )
Change subject: Avoid non-deterministic behavior in removeDisabledParts
......................................................................
Avoid non-deterministic behavior in removeDisabledParts
The order of iteration over a set can change every time a new
Python process is run because the string hashes are different.[1]
This means the function can apply replacements in a different order,
which may result in different output. So don't cast the sequence
to a set, and don't use sets at all; they aren't really necessary.
Add a regression test.
[1] https://docs.python.org/3/reference/datamodel.html#object.__hash__
Change-Id: If66173a7301d308b7addd7f63c7a1d2a2771abbc
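To illustrate why the order of replacements matters, here is a minimal standalone sketch modelled on the regression test in this change. It does not call Pywikibot: `strip_in_order`, `REF_ELEMENT` and `REF_TAGS` are hypothetical names, and `REF_ELEMENT` is only an assumed stand-in for whatever regex textlib uses for the 'ref' keyword.

```python
import re

# Assumed stand-in for the 'ref' regex: matches a whole <ref>...</ref> element.
REF_ELEMENT = re.compile(r'<ref>.*?</ref>')
# Matches only the opening or closing tag, as in the regression test.
REF_TAGS = re.compile(r'</?ref>')


def strip_in_order(text, regexes):
    """Apply each removal in the given order, like the loop in the function."""
    for regex in regexes:
        text = regex.sub('', text)
    return text


text = 'text <ref>This is a reference.</ref> text'

# Whole element first: the reference text disappears with its tags.
element_first = strip_in_order(text, [REF_ELEMENT, REF_TAGS])
# Tags first: nothing is left for REF_ELEMENT to match, so the text survives.
tags_first = strip_in_order(text, [REF_TAGS, REF_ELEMENT])

print(repr(element_first))
print(repr(tags_first))
```

With a list or tuple the caller decides which of the two results is produced. With a set of strings the iteration order depends on the per-process hash salt, so either result could appear from one run to the next.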
---
M pywikibot/textlib.py
M tests/textlib_tests.py
2 files changed, 28 insertions(+), 4 deletions(-)
Approvals:
Xqt: Looks good to me, approved
jenkins-bot: Verified
diff --git a/pywikibot/textlib.py b/pywikibot/textlib.py
index c8d89b6..fdffe94 100644
--- a/pywikibot/textlib.py
+++ b/pywikibot/textlib.py
@@ -457,13 +457,20 @@
:type site: pywikibot.Site
:return: text stripped from disabled parts.
+ .. note:: the order of removals will correspond to the tags argument
+ if provided as an ordered sequence (list, tuple)
"""
if not tags:
- tags = {'comment', 'includeonly', 'nowiki', 'pre', 'syntaxhighlight'}
- else:
- tags = set(tags)
+ tags = ['comment', 'includeonly', 'nowiki', 'pre', 'syntaxhighlight']
+ # avoid set(tags) because sets are internally ordered using the hash
+ # which for strings is salted per Python process => the output of
+ # this function would likely be different per script run because
+ # the replacements would be done in different order and the disabled
+ # parts may overlap and suppress each other
+ # see https://docs.python.org/3/reference/datamodel.html#object.__hash__
+ # ("Note" at the end of the section)
if include:
- tags -= set(include)
+ tags = [tag for tag in tags if tag not in include]
regexes = _get_regexes(tags, site)
for regex in regexes:
text = regex.sub('', text)
diff --git a/tests/textlib_tests.py b/tests/textlib_tests.py
index f52dc4d..5b1bcec 100644
--- a/tests/textlib_tests.py
+++ b/tests/textlib_tests.py
@@ -657,6 +657,23 @@
self.assertEqual(
textlib.removeDisabledParts(pattern, tags=[test]), '')
+ def test_remove_disabled_parts_include(self):
+ """Test removeDisabledParts function with the include argument."""
+ text = 'text <nowiki>tag</nowiki> text'
+ self.assertEqual(
+ textlib.removeDisabledParts(text, include=['nowiki']), text)
+
+ def test_remove_disabled_parts_order(self):
+ """Test the order of the replacements in removeDisabledParts."""
+ text = 'text <ref>This is a reference.</ref> text'
+ regex = re.compile('</?ref>')
+ self.assertEqual(
+ textlib.removeDisabledParts(text, tags=['ref', regex]),
+ 'text  text')
+ self.assertEqual(
+ textlib.removeDisabledParts(text, tags=[regex, 'ref']),
+ 'text This is a reference. text')
+
class TestReplaceLinks(TestCase):
--
To view, visit https://gerrit.wikimedia.org/r/c/pywikibot/core/+/754066
Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-Change-Id: If66173a7301d308b7addd7f63c7a1d2a2771abbc
Gerrit-Change-Number: 754066
Gerrit-PatchSet: 5
Gerrit-Owner: Matěj Suchánek <matejsuchanek97(a)gmail.com>
Gerrit-Reviewer: Xqt <info(a)gno.de>
Gerrit-Reviewer: jenkins-bot
Gerrit-MessageType: merged
jenkins-bot has submitted this change. ( https://gerrit.wikimedia.org/r/c/pywikibot/core/+/751400 )
Change subject: [IMPR] Synchronize Page.linkedPages with Site.pagelinks
......................................................................
[IMPR] Synchronize Page.linkedPages with Site.pagelinks
- deprecate positional parameters in linkedPages but enable
follow_redirect parameter
Change-Id: I66320602b0760097a2acd81ba06cf8ff8269c2bd
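The deprecation approach can be sketched without Pywikibot as follows. This is an illustration under stated assumptions rather than the merged implementation: `positional_to_keywords` and `linked_pages` are hypothetical names, the standard `warnings` module stands in for `issue_deprecation_warning` and `pywikibot.warning`, and the check for too many positional arguments is an addition that the diff below does not contain.

```python
import warnings

# Order in which the old signature accepted positional arguments.
POSITIONAL_KEYS = ('namespaces', 'total', 'content')


def positional_to_keywords(args, kwargs, keys=POSITIONAL_KEYS):
    """Map deprecated positional arguments onto keyword arguments."""
    if len(args) > len(keys):
        # Assumption: fail loudly instead of running past the known keys.
        raise TypeError('at most {} positional arguments are supported'
                        .format(len(keys)))
    merged = dict(kwargs)
    for position, (key, value) in enumerate(zip(keys, args), start=1):
        if key in merged:
            # An explicit keyword wins; the positional value is ignored.
            message = ('Positional argument {} ({!r}) is ignored because '
                       '{}={!r} was also given'
                       .format(position, value, key, merged[key]))
        else:
            merged[key] = value
            message = ('Positional argument {} ({!r}) is deprecated; use '
                       '{}={!r} instead'.format(position, value, key, value))
        warnings.warn(message, FutureWarning, stacklevel=3)
    return merged


def linked_pages(*args, **kwargs):
    """Hypothetical wrapper; the real method forwards to Site.pagelinks."""
    return positional_to_keywords(args, kwargs)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    merged = linked_pages(0, 5, content=True)
print(merged, len(caught))
```

Accepting `*args` only to translate them keeps old callers working while steering them to keywords, and forwarding `**kwargs` unchanged is what lets the new `follow_redirects` option pass through without being listed.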
---
M pywikibot/page/__init__.py
M pywikibot/site/_generators.py
2 files changed, 54 insertions(+), 25 deletions(-)
Approvals:
Xqt: Looks good to me, approved
jenkins-bot: Verified
diff --git a/pywikibot/page/__init__.py b/pywikibot/page/__init__.py
index 903d7c0..fa76eba 100644
--- a/pywikibot/page/__init__.py
+++ b/pywikibot/page/__init__.py
@@ -11,7 +11,7 @@
"""
#
-# (C) Pywikibot team, 2008-2021
+# (C) Pywikibot team, 2008-2022
#
# Distributed under the terms of the MIT license.
#
@@ -31,7 +31,7 @@
import pywikibot
from pywikibot import config, i18n, textlib
-from pywikibot.backports import Dict, Iterable, List, Tuple
+from pywikibot.backports import Dict, Generator, Iterable, List, Tuple
from pywikibot.comms import http
from pywikibot.exceptions import (
APIError,
@@ -1363,27 +1363,51 @@
else:
raise NoPageError(self)
- def linkedPages(self, namespaces=None,
- total: Optional[int] = None,
- content: bool = False):
- """
- Iterate Pages that this Page links to.
+ def linkedPages(
+ self, *args, **kwargs
+ ) -> Generator['pywikibot.Page', None, None]:
+ """Iterate Pages that this Page links to.
- Only returns pages from "normal" internal links. Image and category
- links are omitted unless prefixed with ":". Embedded templates are
- omitted (but links within them are returned). All interwiki and
- external links are omitted.
+ Only returns pages from "normal" internal links. Embedded
+ templates are omitted but links within them are returned. All
+ interwiki and external links are omitted.
- :param namespaces: only iterate links in these namespaces
- :param namespaces: int, or list of ints
- :param total: iterate no more than this number of pages in total
- :param content: if True, retrieve the content of the current version
- of each linked page (default False)
- :return: a generator that yields Page objects.
- :rtype: generator
+ For the parameters refer
+ :py:mod:`APISite.pagelinks<pywikibot.site.APISite.pagelinks>`
+
+ .. versionadded:: 7.0.0
+ the `follow_redirects` keyword argument
+ .. deprecated:: 7.0.0
+ the positional arguments
+
+ .. seealso:: https://www.mediawiki.org/wiki/API:Links
+
+ :keyword namespaces: Only iterate pages in these namespaces
+ (default: all)
+ :type namespaces: iterable of str or Namespace key,
+ or a single instance of those types. May be a '|' separated
+ list of namespace identifiers.
+ :keyword follow_redirects: if True, yields the target of any redirects,
+ rather than the redirect page
+ :keyword total: iterate no more than this number of pages in total
+ :keyword content: if True, load the current content of each page
"""
- return self.site.pagelinks(self, namespaces=namespaces,
- total=total, content=content)
+ # Deprecate positional arguments and synchronize with Site.pagelinks
+ keys = ('namespaces', 'total', 'content')
+ for i, arg in enumerate(args):
+ key = keys[i]
+ issue_deprecation_warning(
+ 'Positional argument {} ({})'.format(i + 1, arg),
+ 'keyword argument "{}={}"'.format(key, arg),
+ since='7.0.0')
+ if key in kwargs:
+ pywikibot.warning('{!r} is given as keyword argument {!r} '
+ 'already; ignoring {!r}'
+ .format(key, arg, kwargs[key]))
+ else:
+ kwargs[key] = arg
+
+ return self.site.pagelinks(self, **kwargs)
def interwiki(self, expand=True):
"""
diff --git a/pywikibot/site/_generators.py b/pywikibot/site/_generators.py
index d40a54c..d3b8e65 100644
--- a/pywikibot/site/_generators.py
+++ b/pywikibot/site/_generators.py
@@ -14,7 +14,7 @@
import pywikibot
import pywikibot.family
-from pywikibot.backports import Dict, Iterable, List
+from pywikibot.backports import Dict, Generator, Iterable, List
from pywikibot.data import api
from pywikibot.exceptions import (
APIError,
@@ -360,11 +360,16 @@
namespaces=namespaces, content=content)
), total)
- def pagelinks(self, page, *, namespaces=None, follow_redirects=False,
- total=None, content=False):
+ def pagelinks(
+ self, page, *,
+ namespaces=None,
+ follow_redirects: bool = False,
+ total: Optional[int] = None,
+ content: bool = False
+ ) -> Generator['pywikibot.Page', None, None]:
"""Iterate internal wikilinks contained (or transcluded) on page.
- :see: https://www.mediawiki.org/wiki/API:Links
+ .. seealso:: https://www.mediawiki.org/wiki/API:Links
:param namespaces: Only iterate pages in these namespaces
(default: all)
@@ -373,8 +378,8 @@
list of namespace identifiers.
:param follow_redirects: if True, yields the target of any redirects,
rather than the redirect page
+ :param total: iterate no more than this number of pages in total
:param content: if True, load the current content of each iterated page
- (default False)
:raises KeyError: a namespace identifier was not resolved
:raises TypeError: a namespace identifier has an inappropriate
type such as NoneType or bool
--
To view, visit https://gerrit.wikimedia.org/r/c/pywikibot/core/+/751400
Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-Change-Id: I66320602b0760097a2acd81ba06cf8ff8269c2bd
Gerrit-Change-Number: 751400
Gerrit-PatchSet: 2
Gerrit-Owner: Xqt <info(a)gno.de>
Gerrit-Reviewer: Xqt <info(a)gno.de>
Gerrit-Reviewer: jenkins-bot
Gerrit-MessageType: merged