Pyfisch has submitted this change and it was merged.
Change subject: convert html text to unicode, read charset, use utf-8 by default
......................................................................
convert html text to unicode, read charset, use utf-8 by default
Change-Id: I208500f399a309665ecbac082db72de72c354a5f
---
M pywikibot/comms/http.py
1 file changed, 10 insertions(+), 3 deletions(-)
Approvals:
Pyfisch: Verified; Looks good to me, approved
diff --git a/pywikibot/comms/http.py b/pywikibot/comms/http.py
index 6a4c287..e9bc57f 100644
--- a/pywikibot/comms/http.py
+++ b/pywikibot/comms/http.py
@@ -13,7 +13,7 @@
"""
#
-# (C) Pywikipedia bot team, 2007
+# (C) Pywikipedia bot team, 2008-2014
#
# Distributed under the terms of the MIT license.
#
@@ -24,6 +24,7 @@
import urllib
import logging
import atexit
+import re
 try:
     from httplib2 import SSLHandshakeError
@@ -146,5 +147,11 @@
     if request.data[0].status != 200:
         pywikibot.warning(u"Http response status %(status)s"
                           % {'status': request.data[0].status})
-
-    return request.data[1]
+    text = request.data[1]
+    # Convert text to Unicode
+    try:
+        charset = re.findall('charset=([^\'\";]+)', text)[0]
+    except IndexError:
+        charset = 'utf-8'  # default
+    text = unicode(text, charset, errors='strict')
+    return text
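The decoding step in this change can be sketched in Python 3, where `bytes.decode` takes the place of the Python 2 `unicode()` call. This is an illustrative sketch, not pywikibot code; the names `CHARSET_RE`, `find_charset` and `decode_body` are hypothetical. The pattern is adapted from the one in the diff with two assumed adjustments. First, an optional opening quote is skipped: the merged pattern does not match a quoted value such as `<meta charset="utf-8">`, because the character right after `=` is a quote, which the class `[^'";]` excludes. Second, whitespace and `>` end the value, so an unquoted `<meta charset=utf-8>` does not swallow the rest of the tag. The UTF-8 default and strict error handling follow the change.

```python
import re

# Adapted from the change's pattern charset=([^'";]+). Assumed adjustments:
# skip an optional opening quote, and stop the value at whitespace or '>'.
CHARSET_RE = re.compile(rb"""charset=["']?([^"';\s>]+)""")


def find_charset(body, default='utf-8'):
    """Return the first charset= value in a raw (bytes) body, else default."""
    match = CHARSET_RE.search(body)
    if match is None:
        return default  # no declaration found: fall back to UTF-8
    # latin-1 maps every byte, so this never fails; an unknown codec name
    # surfaces later as LookupError from bytes.decode.
    return match.group(1).decode('latin-1')


def decode_body(body, default='utf-8'):
    """Decode a raw HTTP body to str using the detected charset."""
    charset = find_charset(body, default)
    # errors='strict', as in the change: undecodable bytes raise
    # UnicodeDecodeError instead of being silently replaced.
    return body.decode(charset, errors='strict')
```

For example, `decode_body(b'<meta charset="iso-8859-1">caf\xe9')` returns a string ending in `café`. Because the search runs over the whole body, a page whose text merely mentions `charset=` could select the wrong codec; reading the charset from the response's `Content-Type` header first would avoid that.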
--
To view, visit https://gerrit.wikimedia.org/r/110674
Gerrit-MessageType: merged
Gerrit-Change-Id: I208500f399a309665ecbac082db72de72c354a5f
Gerrit-PatchSet: 1
Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-Owner: Xqt <info(a)gno.de>
Gerrit-Reviewer: Ladsgroup <ladsgroup(a)gmail.com>
Gerrit-Reviewer: Merlijn van Deen <valhallasw(a)arctus.nl>
Gerrit-Reviewer: Pyfisch <pyfisch(a)gmail.com>
Gerrit-Reviewer: Russell Blau <russblau(a)imapmail.org>
Gerrit-Reviewer: Xqt <info(a)gno.de>
Gerrit-Reviewer: jenkins-bot <>