Did you change pagegenerators.GeneratorFactory? Mine doesn't have a site parameter...
In any case, the following allows me to reproduce your error:
~~~~~
#!/usr/bin/env python
# -*- coding: utf-8 -*-

# Fields to fill:
username = u"YourUsername"

import pywikibot
from pywikibot import family, pagegenerators
from scripts.replace import ReplaceRobot


class KdeFamily(family.Family):

    def __init__(self):
        super(KdeFamily, self).__init__()

    def version(self, code):
        return "1.20.2"

    def scriptpath(self, code):
        return ''


class TechbaseFamily(KdeFamily):

    def __init__(self):
        super(TechbaseFamily, self).__init__()
        self.name = 'techbase'
        self.langs['en'] = u"techbase.kde.org"


site = pywikibot.Site("en", TechbaseFamily(), username)
generator = pagegenerators.AllpagesPageGenerator(u"Localization", 0, False, site=site)
preloadingGen = pagegenerators.PreloadingGenerator(generator)
bot = ReplaceRobot(generator=preloadingGen, replacements=[])
bot.run()
~~~~~
---
** [bugs:#1658] “Title contains illegal char (\\uFFFD)” with existing page**
**Status:** open
**Labels:** character encoding
**Created:** Sun Aug 25, 2013 08:00 AM UTC by Adrián Chaves Fernández
**Last Updated:** Sun Aug 25, 2013 01:08 PM UTC
**Owner:** nobody
This is happening with the following existing page: http://techbase.kde.org/Localization/fy/Fryske_kompjûterwurden
Traceback (most recent call last):
  File "maintenance.py", line 81, in <module>
    main()
  File "maintenance.py", line 77, in main
    bot.run()
  File "/home/gallaecio/fontes/rodela/scripts/replace.py", line 326, in run
    for page in self.generator:
  File "/home/gallaecio/fontes/rodela/pywikibot/pagegenerators.py", line 799, in PreloadingGenerator
    for page in generator:
  File "/home/gallaecio/fontes/rodela/pywikibot/pagegenerators.py", line 749, in DuplicateFilterPageGenerator
    for page in generator:
  File "/home/gallaecio/fontes/rodela/pywikibot/data/api.py", line 706, in __iter__
    yield self.result(item)
  File "/home/gallaecio/fontes/rodela/pywikibot/data/api.py", line 780, in result
    p = pywikibot.Page(self.site, pagedata['title'], pagedata['ns'])
  File "/home/gallaecio/fontes/rodela/pywikibot/__init__.py", line 249, in wrapper
    return method(*__args, **__kw)
  File "/home/gallaecio/fontes/rodela/pywikibot/__init__.py", line 249, in wrapper
    return method(*__args, **__kw)
  File "/home/gallaecio/fontes/rodela/pywikibot/page.py", line 77, in __init__
    self._link = Link(title, source=source, defaultNamespace=ns)
  File "/home/gallaecio/fontes/rodela/pywikibot/page.py", line 2958, in __init__
    raise pywikibot.Error("Title contains illegal char (\\uFFFD)")
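For context, U+FFFD is the Unicode replacement character, which a decoder emits when it meets bytes that are invalid in the expected encoding. The following is only a minimal sketch of how such a character can end up in a title. It is not pywikibot code, the helper name `has_replacement_char` is hypothetical, and the Latin-1/UTF-8 mismatch is an assumption that the traceback above does not confirm.

```python
# -*- coding: utf-8 -*-
# Sketch only: one way U+FFFD can appear in a title. Whether this is the
# cause of the bug reported above is an assumption, not an established fact.

REPLACEMENT_CHAR = u"\ufffd"


def has_replacement_char(title):
    """Return True if the title contains the Unicode replacement character."""
    return REPLACEMENT_CHAR in title


title = u"Fryske_kompj\u00fbterwurden"  # page name from the report

# Matching round trip: encode and decode with the same codec.
good = title.encode("utf-8").decode("utf-8")

# Mismatched round trip: Latin-1 bytes decoded as UTF-8. The byte 0xFB is
# not valid UTF-8, so the "replace" handler substitutes U+FFFD for it.
bad = title.encode("latin-1").decode("utf-8", "replace")

print(has_replacement_char(good))  # False
print(has_replacement_char(bad))   # True
```

If a mismatch like this were involved, it would also fit the observation that only titles with non-ASCII characters are affected.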
---
Sent from sourceforge.net because Pywikipedia-bugs(a)lists.wikimedia.org is subscribed to https://sourceforge.net/p/pywikipediabot/bugs/
To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/pywikipediabot/admin/bugs/options. Or, if this is a mailing list, you can unsubscribe from the mailing list.
Weird.
I’m using the ReplaceRobot class. This issue does not happen if I pass the -page:<page name> argument to the page generator, but it does happen if I use -start:!
The following code reproduces the issue:
#!/usr/bin/env python
# -*- coding: utf-8 -*-

# Fields to fill:
username = u"YourUsername"

import pywikibot
from pywikibot import family, pagegenerators
from scripts.replace import ReplaceRobot


class KdeFamily(family.Family):

    def __init__(self):
        super(KdeFamily, self).__init__()

    def version(self, code):
        return "1.20.2"

    def scriptpath(self, code):
        return ''


class TechbaseFamily(KdeFamily):

    def __init__(self):
        super(TechbaseFamily, self).__init__()
        self.name = 'techbase'
        self.langs['en'] = u"techbase.kde.org"


site = pywikibot.Site("en", TechbaseFamily(), username)
genFactory = pagegenerators.GeneratorFactory(site=site)
genFactory.handleArg(u"-start:Localization")
generator = genFactory.getCombinedGenerator()
preloadingGen = pagegenerators.PreloadingGenerator(generator)
bot = ReplaceRobot(generator=preloadingGen, replacements=[])
bot.run()
---
** [bugs:#1658] “Title contains illegal char (\\uFFFD)” with existing page**
**Status:** open
**Labels:** character encoding
**Created:** Sun Aug 25, 2013 08:00 AM UTC by Adrián Chaves Fernández
**Last Updated:** Sun Aug 25, 2013 12:55 PM UTC
**Owner:** nobody
---
By the way, this is about the rewrite branch (in case it was not clear; I forgot to set the right milestone).
---
** [bugs:#1658] “Title contains illegal char (\\uFFFD)” with existing page**
**Status:** open
**Labels:** character encoding
**Created:** Sun Aug 25, 2013 08:00 AM UTC by Adrián Chaves Fernández
**Last Updated:** Sun Aug 25, 2013 12:53 PM UTC
**Owner:** nobody
---
I’ve pulled again just in case, and with 1fc63f32f5d2e99b744c33f7376155753c64220c (a couple of commits after the one you mention) I can still reproduce it.
I’ll try to provide a simple piece of code to reproduce the issue.
---
** [bugs:#1658] “Title contains illegal char (\\uFFFD)” with existing page**
**Status:** open
**Labels:** character encoding
**Created:** Sun Aug 25, 2013 08:00 AM UTC by Adrián Chaves Fernández
**Last Updated:** Sun Aug 25, 2013 11:57 AM UTC
**Owner:** nobody
---
Which version are you running? With 2a34d99 (2013/08/24, 22:34:40), it works for me.
---
** [bugs:#1658] “Title contains illegal char (\\uFFFD)” with existing page**
**Status:** open
**Labels:** character encoding
**Created:** Sun Aug 25, 2013 08:00 AM UTC by Adrián Chaves Fernández
**Last Updated:** Sun Aug 25, 2013 08:00 AM UTC
**Owner:** nobody
---
** [patches:#626] Relative links in Wikisource main namespace**
**Status:** open
**Created:** Sat Aug 24, 2013 10:20 PM UTC by AndreasJS
**Last Updated:** Sat Aug 24, 2013 10:20 PM UTC
**Owner:** nobody
The Wikisource main namespace allows subpages, and therefore relative links.
Add the line:
    self.namespacesWithSubpage.extend([0])
See the attachment.
---
Sent from sourceforge.net because Pywikipedia-bugs(a)lists.wikimedia.org is subscribed to https://sourceforge.net/p/pywikipediabot/patches/
To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/pywikipediabot/admin/patches/options. Or, if this is a mailing list, you can unsubscribe from the mailing list.
---
** [patches:#625] Pagegenerator: follow redirects, intersection, exclusion**
**Status:** open
**Created:** Sat Aug 24, 2013 09:57 PM UTC by AndreasJS
**Last Updated:** Sat Aug 24, 2013 09:57 PM UTC
**Owner:** nobody
I added three new arguments:

-followredirects
    Used with other arguments that specify a set of pages.
    If a specified page is a redirect page, work on its
    target page.

-intersecting
    Argument to be used between two other arguments.
    Work only on pages normally specified by both the
    previous and the next argument.

-excluding
    Argument to be used between two other arguments.
    Work only on pages normally specified by the
    previous argument but not by the next argument.
For example, one might want to find the pages edited by a specific user that contain a certain keyword in their title.
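The semantics of the three arguments can be sketched with plain Python generators. This is only an illustration, not the code from the patch: the function names `intersecting`, `excluding`, and `follow_redirects` are hypothetical, strings stand in for page objects, and the `redirects` mapping is an assumed substitute for asking the wiki for a redirect target.

```python
def intersecting(first, second):
    """Yield items of `first` that also occur in `second`, keeping order."""
    wanted = set(second)  # consumes the second generator entirely
    for item in first:
        if item in wanted:
            yield item


def excluding(first, second):
    """Yield items of `first` that do not occur in `second`, keeping order."""
    unwanted = set(second)
    for item in first:
        if item not in unwanted:
            yield item


def follow_redirects(items, redirects):
    """Replace each item by its redirect target when `redirects` lists one."""
    for item in items:
        yield redirects.get(item, item)


# Pages edited by a user, and pages whose title contains a keyword.
edited_by_user = ["Localization/fy", "Main Page", "Localization/gl"]
title_matches = ["Localization/gl", "Localization/fy", "Localization/de"]

print(list(intersecting(edited_by_user, title_matches)))
print(list(excluding(edited_by_user, title_matches)))
print(list(follow_redirects(["Old name", "Main Page"], {"Old name": "New name"})))
```

Both set-based helpers have to exhaust the second generator before yielding anything, which matters when that generator is large or slow.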
A few other suggestions:
Exclude sections, even on files.
To exclude duplicate pages, compare pages via the Page.\_\_cmp\_\_ method instead of building the key
u"%s:%s:%s" % (page._site.family.name, page._site.lang, page._title)
(more transparent and easier to maintain).
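A duplicate filter that relies on the page's own comparison rather than on a formatted string key could look roughly like this. It is a sketch under assumptions: `FakePage` is a hypothetical stand-in for a real page class, and it uses `__eq__` and `__hash__` instead of `__cmp__`, because a set-based filter needs hashing and Python 3 no longer supports `__cmp__`.

```python
class FakePage(object):
    """Hypothetical stand-in for a page: equal when site and title match."""

    def __init__(self, site, title):
        self.site = site
        self.title = title

    def __eq__(self, other):
        return (self.site, self.title) == (other.site, other.title)

    def __ne__(self, other):
        return not self == other

    def __hash__(self):
        return hash((self.site, self.title))


def unique(pages):
    """Yield each page once, relying on the page's own equality and hash."""
    seen = set()
    for page in pages:
        if page not in seen:
            seen.add(page)
            yield page


pages = [FakePage("en", "A"), FakePage("en", "B"), FakePage("en", "A")]
print([page.title for page in unique(pages)])
```

The definition of "same page" then lives in one place, the page class, instead of being repeated in every filter.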
---
I have uploaded a somewhat more concise version. I started this on my own before I found out about u/nu11zer0's original post, although I adopted some of their ideas. Note that a relative link always starts with '/' or a series of '../'; everything else is either a legitimate absolute link or not a link.
Attachment: wikipedia.py.diff (2.1 kB; application/octet-stream)
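The rule stated above ('/' or a series of '../') can be sketched as a small resolver. This is not the attached diff: the function name `resolve_relative_link` is hypothetical, and the resolution rules follow my understanding of how MediaWiki treats subpage links, which should be checked against the wiki's actual behaviour.

```python
def resolve_relative_link(page_title, link):
    """Return the absolute title for `link` as written on `page_title`.

    Links that are not relative are returned unchanged.
    """
    if link.startswith("/"):
        # "/Sub" points to a subpage of the current page.
        return page_title + link.rstrip("/")
    if not link.startswith("../"):
        # Neither "/" nor "../": absolute link (or not a link at all).
        return link
    parts = page_title.split("/")
    rest = link
    while rest.startswith("../"):
        if len(parts) < 2:
            # More "../" than parent levels: leave the link unchanged.
            return link
        parts.pop()
        rest = rest[3:]
    rest = rest.rstrip("/")
    parent = "/".join(parts)
    return parent + "/" + rest if rest else parent


print(resolve_relative_link("Book/Chapter 1", "/Notes"))
print(resolve_relative_link("Book/Chapter 1", "../Chapter 2"))
print(resolve_relative_link("Book/Part/Chapter", "../../"))
```

Returning the link unchanged when it climbs above the root page is a design choice of this sketch; raising an error would be an equally reasonable alternative.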
---
** [patches:#589] Convert relative links to absolute links in linkedPages().**
**Status:** open-fixed
**Created:** Sat Jan 26, 2013 05:33 PM UTC by Nullzer0
**Last Updated:** Sun Jan 27, 2013 01:53 PM UTC
**Owner:** xqt
Convert relative links to absolute links in linkedPages\(\).
---
merged
---
** [bugs:#1653] Pagegenerator WikidataItemGenerator does not work with ReferringPageGenerator**
**Status:** closed-fixed
**Created:** Sat Aug 10, 2013 11:36 PM UTC by Sk!d
**Last Updated:** Sat Aug 24, 2013 07:48 AM UTC
**Owner:** Legoktm
There must be a bug in pagegenerators; this code does not work:
referredPage = pywikibot.page.PropertyPage(pywikibot.Site().data_repository(), "Property:P21")
pagegenerator = pywikibot.pagegenerators.WikidataItemGenerator(
    pywikibot.pagegenerators.ReferringPageGenerator(
        referredPage, withTemplateInclusion=False, content=False))
If you iterate over pagegenerator, you get this stack trace:
  File "X\core\wikidatascripts\itemfix.py", line 34, in <module>
    if int(item.title()[1:]) <176:
  File "X\core\pywikibot\page.py", line 2264, in title
    self._link._text = self.getID()
  File "X\core\pywikibot\page.py", line 2357, in getID
    self.get(force=force)
  File "X\core\pywikibot\page.py", line 2486, in get
    super(ItemPage, self).get(force=force, *args)
  File "X\core\pywikibot\page.py", line 2317, in get
    data = self.repo.loadcontent(self.__defined_by(), *args)
  File "X\core\pywikibot\site.py", line 3373, in loadcontent
    data = req.submit()
  File "X\core\pywikibot\data\api.py", line 393, in submit
    raise APIError(code, info, **result["error"])
pywikibot.data.api.APIError: param-missing: Either provide the item "ids" or pairs of "sites" and "titles" for corresponding pages
A workaround is to iterate over referredPage.getReferences() instead.
---