http://www.mediawiki.org/wiki/Special:Code/pywikipedia/9793
Revision: 9793
Author: xqt
Date: 2011-12-09 18:25:00 +0000 (Fri, 09 Dec 2011)
Log Message:
-----------
Some iw (interwiki) links are encoded with HTML entities. Decode the &amp; entity first. See
http://de.wikipedia.org/w/index.php?title=Benutzer_Diskussion%3AXqt&act…
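A minimal Python 3 sketch (the code in this revision is Python 2) of what the one-line regex change does. With the old pattern, a double-encoded entity such as &amp;#252; loses only its outer &amp; layer and &#252; is left in the text. The new optional (?:amp;)? prefix lets the whole sequence decode in one pass. The helper decode_entities and the sample link are hypothetical illustrations, not pywikipedia's own function, and the sketch omits the convertIllegalHtmlEntities handling shown in the diff context.

```python
import re
from html.entities import name2codepoint

# Pattern before r9793: a decimal, hexadecimal, or possibly named entity.
OLD_ENTITY_R = re.compile(
    r'&(#(?P<decimal>\d+)|#x(?P<hex>[0-9a-fA-F]+)|(?P<name>[A-Za-z]+));')

# Pattern after r9793: an optional "amp;" may follow the leading "&",
# so a double-encoded entity such as "&amp;#252;" is matched as a whole.
NEW_ENTITY_R = re.compile(
    r'&(?:amp;)?(#(?P<decimal>\d+)|#x(?P<hex>[0-9a-fA-F]+)|(?P<name>[A-Za-z]+));')


def decode_entities(text, pattern):
    """Hypothetical helper: replace entities matched by pattern with characters.

    Unknown named entities are left untouched.
    """
    def repl(match):
        if match.group('decimal'):
            return chr(int(match.group('decimal')))
        if match.group('hex'):
            return chr(int(match.group('hex'), 16))
        codepoint = name2codepoint.get(match.group('name'))
        return chr(codepoint) if codepoint is not None else match.group(0)
    return pattern.sub(repl, text)


sample = '[[de:M&amp;#252;nchen]]'  # hypothetical double-encoded iw link
print(decode_entities(sample, OLD_ENTITY_R))  # [[de:M&#252;nchen]]
print(decode_entities(sample, NEW_ENTITY_R))  # [[de:München]]
```

Note the trade-off of this approach: because the decoding now happens in a single pass, text that deliberately contains an escaped sequence like &amp;#252; is also turned into the character itself, while a plain &amp; on its own still decodes to "&" as before.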
Modified Paths:
--------------
trunk/pywikipedia/wikipedia.py
Modified: trunk/pywikipedia/wikipedia.py
===================================================================
--- trunk/pywikipedia/wikipedia.py 2011-12-09 14:31:19 UTC (rev 9792)
+++ trunk/pywikipedia/wikipedia.py 2011-12-09 18:25:00 UTC (rev 9793)
@@ -4643,7 +4643,7 @@
# This regular expression will match any decimal and hexadecimal entity and
# also entities that might be named entities.
entityR = re.compile(
-        r'&(#(?P<decimal>\d+)|#x(?P<hex>[0-9a-fA-F]+)|(?P<name>[A-Za-z]+));')
+        r'&(?:amp;)?(#(?P<decimal>\d+)|#x(?P<hex>[0-9a-fA-F]+)|(?P<name>[A-Za-z]+));')
# These characters are Html-illegal, but sadly you *can* find some of
# these and converting them to unichr(decimal) is unsuitable
convertIllegalHtmlEntities = {