jenkins-bot has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/329761 )
Change subject: Fix and improve default regexes
......................................................................
Fix and improve default regexes
- Remove superfluous flags.
- Clean up 'header' using multiline.
- Expand 'pre' and 'table' to support HTML attributes (mostly
  'style').
- Update 'property' to support parameters (currently, it supports
  "|from=" but it might support more in the future).
- Localize 'property' and 'invoke' using magic words.
- Add singleline to 'invoke'.
Change-Id: Ib805bf70cb1cc99711138d7d6c7e40971f31b602
---
M pywikibot/textlib.py
1 file changed, 10 insertions(+), 8 deletions(-)
Approvals:
jenkins-bot: Verified
Xqt: Looks good to me, approved
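
The effect of the 'header' and 'pre' changes can be checked in isolation. The sketch below copies the old and new patterns from the diff that follows and runs them on a made-up sample string (the sample text and the variable names are assumptions for illustration, not part of the change). It assumes the wrapped character class in the diff is `[ >]`. Note that the `m` flag only changes how `^` and `$` behave, which is why it is superfluous on patterns without those anchors.

```python
import re

# Patterns copied from the diff; the sample text below is hypothetical.
OLD_HEADER = re.compile(r'\r?\n=+.+=+ *\r?\n')
NEW_HEADER = re.compile(r'(?m)^=+.+=+ *$')
OLD_PRE = re.compile(r'(?ism)<pre>.*?</pre>')
NEW_PRE = re.compile(r'(?is)<pre[ >].*?</pre>')

# A header on the very first line, a <pre> tag carrying a style
# attribute, and a header on the last line without a trailing newline.
text = '== Intro ==\n<pre style="color: red">x</pre>\n== End =='

# The old header pattern needs a newline on both sides, so it misses
# headers at the start and at the end of the text.
old_headers = OLD_HEADER.findall(text)

# With (?m), ^ and $ match at every line boundary, including the
# start and end of the string.
new_headers = NEW_HEADER.findall(text)

# The old pattern only accepts a bare <pre>; the new one accepts
# a space (attributes follow) or '>' after the tag name.
old_pre = OLD_PRE.findall(text)
new_pre = NEW_PRE.findall(text)

print(old_headers, new_headers)
print(old_pre, new_pre)
```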
diff --git a/pywikibot/textlib.py b/pywikibot/textlib.py
index 9f7782e..dce6608 100644
--- a/pywikibot/textlib.py
+++ b/pywikibot/textlib.py
@@ -221,13 +221,13 @@
_regex_cache.update({
'comment': re.compile(r'(?s)<!--.*?-->'),
# section headers
- 'header': re.compile(r'\r?\n=+.+=+ *\r?\n'),
+ 'header': re.compile(r'(?m)^=+.+=+ *$'),
# preformatted text
- 'pre': re.compile(r'(?ism)<pre>.*?</pre>'),
+ 'pre': re.compile(r'(?is)<pre[ >].*?</pre>'),
'source': re.compile(r'(?is)<source .*?</source>'),
- 'score': re.compile(r'(?ism)<score[ >].*?</score>'),
+ 'score': re.compile(r'(?is)<score[ >].*?</score>'),
# inline references
- 'ref': re.compile(r'(?ism)<ref[ >].*?</ref>'),
+ 'ref': re.compile(r'(?is)<ref[ >].*?</ref>'),
'template': NESTED_TEMPLATE_REGEX,
# lines that start with a space are shown in a monospace font and
# have whitespace preserved.
@@ -235,7 +235,7 @@
# tables often have whitespace that is used to improve wiki
# source code readability.
# TODO: handle nested tables.
- 'table': re.compile(r'(?ims)^{\|.*?^\|}|<table>.*?</table>'),
+ 'table': re.compile(r'(?ims)^{\|.*?^\|}|<table[ >].*?</table>'),
'hyperlink': compileLinkR(),
'gallery': re.compile(r'(?is)<gallery.*?>.*?</gallery>'),
# this matches internal wikilinks, but also interwiki, categories, and
@@ -247,11 +247,13 @@
site.validLanguageLinks() +
list(site.family.obsolete.keys()))),
# Wikibase property inclusions
- 'property': re.compile(r'(?i)\{\{\s*#property:\s*p\d+\s*\}\}'),
+ 'property': (r'(?i)\{\{\s*\#(?:%s):\s*p\d+.*?\}\}',
+ lambda site: '|'.join(site.getmagicwords('property'))),
# Module invocations (currently only Lua)
- 'invoke': re.compile(r'(?i)\{\{\s*#invoke:.*?}\}'),
+ 'invoke': (r'(?is)\{\{\s*\#(?:%s):.*?\}\}',
+ lambda site: '|'.join(site.getmagicwords('invoke'))),
# categories
- 'category': ('\[\[ *(?:%s)\s*:.*?\]\]',
+ 'category': (r'\[\[ *(?:%s)\s*:.*?\]\]',
lambda site: '|'.join(site.namespaces[14])),
# files
'file': (FILE_LINK_REGEX,
--
To view, visit https://gerrit.wikimedia.org/r/329761
Gerrit-MessageType: merged
Gerrit-Change-Id: Ib805bf70cb1cc99711138d7d6c7e40971f31b602
Gerrit-PatchSet: 5
Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-Owner: Matěj Suchánek <matejsuchanek97(a)gmail.com>
Gerrit-Reviewer: Dalba <dalba.wiki(a)gmail.com>
Gerrit-Reviewer: Ladsgroup <Ladsgroup(a)gmail.com>
Gerrit-Reviewer: Magul <tomasz.magulski(a)gmail.com>
Gerrit-Reviewer: Matěj Suchánek <matejsuchanek97(a)gmail.com>
Gerrit-Reviewer: Xqt <info(a)gno.de>
Gerrit-Reviewer: jenkins-bot <>
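
The 'property' and 'invoke' entries in the diff above are stored as a pattern with a `%s` slot plus a callable that supplies the site-specific alternatives. The following is a minimal sketch of how such a pair might be turned into a compiled regex. `FakeSite`, its word lists (including the localized alias 'vlastnost') and `compile_for_site` are hypothetical stand-ins for illustration; only the two pattern/lambda pairs and the old 'invoke' pattern are taken from the diff.

```python
import re


class FakeSite:
    """Hypothetical stand-in for a site object; not the real pywikibot API."""

    # Assumed magic word lists, for illustration only.
    _magicwords = {
        'property': ['property', 'vlastnost'],
        'invoke': ['invoke'],
    }

    def getmagicwords(self, word):
        return self._magicwords[word]


# Pattern/callable pairs copied from the diff.
PROPERTY = (r'(?i)\{\{\s*\#(?:%s):\s*p\d+.*?\}\}',
            lambda site: '|'.join(site.getmagicwords('property')))
INVOKE = (r'(?is)\{\{\s*\#(?:%s):.*?\}\}',
          lambda site: '|'.join(site.getmagicwords('invoke')))

# Old 'invoke' pattern from the diff, kept for comparison.
OLD_INVOKE = re.compile(r'(?i)\{\{\s*#invoke:.*?}\}')


def compile_for_site(entry, site):
    """Hypothetical helper: fill the %s slot for this site and compile."""
    pattern, alternatives = entry
    return re.compile(pattern % alternatives(site))


site = FakeSite()
property_re = compile_for_site(PROPERTY, site)
invoke_re = compile_for_site(INVOKE, site)

# A parameter such as "|from=" is now accepted after the property id.
matches_param = property_re.search('{{#property:P31|from=Q42}}') is not None
# A localized alias of the magic word matches as well.
matches_local = property_re.search('{{ #vlastnost: p31 }}') is not None
# With the s flag, an invocation spread over several lines matches.
matches_multiline = invoke_re.search('{{#invoke:Foo|bar\n|baz=1}}') is not None
# The old pattern has no s flag, so '.' stops at the newline.
old_misses_multiline = OLD_INVOKE.search('{{#invoke:Foo|bar\n|baz=1}}') is None

print(matches_param, matches_local, matches_multiline, old_misses_multiline)
```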