Patches item #1843787, was opened at 2007-12-04 03:18
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=1843787&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Pietro Battiston (toobaz)
Assigned to: Nobody/Anonymous (nobody)
Summary: catlib _getContentsAndSupercats performance issue
Initial Comment:
catlib.py's _getContentsAndSupercats method has a performance issue that in
some cases can greatly slow down the process of recursively downloading all
pages or subcategories of a category.
See this example (chosen because it is short to report, not because it is
especially pathological):
###########ipython output###############
In [1]: import catlib
Checked for running processes. 1 processes currently running, including the
current process.
In [2]: len(catlib.Category('it', 'Categoria:Geometria
descrittiva').articlesList(recurse=True))
Getting [[Categoria:Geometria descrittiva]]...
Getting [[Categoria:Coperture a volta]]...
Getting [[Categoria:Corrispondenza biunivoca (geometria descrittiva)]]...
Getting [[Categoria:Curve piane]]...
Getting [[Categoria:Curve tridimensionali]]...
Getting [[Categoria:Glossario (geometria descrittiva)]]...
Getting [[Categoria:Metodi di rappresentazione]]...
Getting [[Categoria:Modellazione geometrica]]...
Getting [[Categoria:Tassellazioni]]...
Getting [[Categoria:Poliedri]]...
Getting [[Categoria:Tassellazioni]]...
Getting [[Categoria:Problemi di misura]]...
Getting [[Categoria:Stub geometria descrittiva]]...
Getting [[Categoria:Superfici]]...
Getting [[Categoria:Sviluppo di solidi]]...
Getting [[Categoria:Tangenza]]...
Out[2]: 393
###########end ipython output###############
As you can see, [[Categoria:Tassellazioni]] is downloaded twice, and I can
assure you that there are far worse cases.
Anyway, I'm attaching a patch. After the patch, here are the same
commands:
###########ipython output###############
In [1]: import catlib
Checked for running processes. 1 processes currently running, including the
current process.
In [2]: len(catlib.Category('it', 'Categoria:Geometria
descrittiva').articlesList(recurse=True))
Getting [[Categoria:Geometria descrittiva]]...
Getting [[Categoria:Coperture a volta]]...
Getting [[Categoria:Corrispondenza biunivoca (geometria descrittiva)]]...
Getting [[Categoria:Curve piane]]...
Getting [[Categoria:Curve tridimensionali]]...
Getting [[Categoria:Glossario (geometria descrittiva)]]...
Getting [[Categoria:Metodi di rappresentazione]]...
Getting [[Categoria:Modellazione geometrica]]...
Getting [[Categoria:Tassellazioni]]...
Getting [[Categoria:Poliedri]]...
Getting [[Categoria:Problemi di misura]]...
Getting [[Categoria:Stub geometria descrittiva]]...
Getting [[Categoria:Superfici]]...
Getting [[Categoria:Sviluppo di solidi]]...
Getting [[Categoria:Tangenza]]...
Out[2]: 393
###########end ipython output###############
Note that this patch also prevents possible loops in the category tree:
catlib won't loop.
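The patch itself is not included in this digest, so below is a minimal, hypothetical sketch of the usual fix for this class of problem: track already-visited categories in a set, so each category is fetched at most once and cycles in the category graph terminate. `walk_category` and `get_members` are illustrative stand-ins, not the real catlib code.

```python
# Minimal sketch (not the actual patch): avoid re-fetching categories during a
# recursive walk by remembering the ones already visited in a set. Cycles in
# the category graph then terminate instead of looping forever.

def walk_category(category, get_members, visited=None):
    """Yield every article reachable from `category`, fetching each
    category at most once. `get_members(cat)` is a stand-in for the real
    download code and must return a pair (articles, subcategories)."""
    if visited is None:
        visited = set()
    if category in visited:
        return                      # already fetched: skip the duplicate download
    visited.add(category)
    articles, subcats = get_members(category)
    for article in articles:
        yield article
    for subcat in subcats:
        for article in walk_category(subcat, get_members, visited):
            yield article
```

With this structure, a category that appears under two parents (like [[Categoria:Tassellazioni]] above) is fetched only on its first visit.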
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=1843787&group_…
Bugs item #1842905, was opened at 2007-12-02 21:39
Message generated for change (Settings changed) made by toobaz
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1842905&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: category
Group: None
>Status: Deleted
Resolution: None
Priority: 5
Private: No
Submitted By: Pietro Battiston (toobaz)
Assigned to: Nobody/Anonymous (nobody)
Summary: [patch] catlib _getContentsAndSupercats performance issue
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1842905&group_…
Bugs item #1843759, was opened at 2007-12-04 01:38
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1843759&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: other
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Pietro Battiston (toobaz)
Assigned to: Nobody/Anonymous (nobody)
Summary: [patch] image.py doesn't work
Initial Comment:
The following command:
python image.py pippo.png
gives the following output:
Checked for running processes. 1 processes currently running, including the current process.
'Page' object has no attribute 'usingPages'
The problem is that oldImagePage is an instance of wikipedia.Page instead of wikipedia.ImagePage.
I'm attaching a very simple patch that fixes it; after that, everything works.
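The crash is ordinary Python attribute lookup: a method defined only on a subclass is not available on base-class instances, so the fix is to construct the right class. The toy classes below are illustrative stand-ins, not pywikipedia's real ones:

```python
# Illustrative stand-ins, not pywikipedia's actual classes: the point is only
# that a method defined on the subclass is missing from the base class.

class Page(object):
    def __init__(self, title):
        self.title = title

class ImagePage(Page):
    def usingPages(self):
        # Subclass-only method, analogous to wikipedia.ImagePage.usingPages().
        return ['SomeArticle']

old = Page('Image:pippo.png')
hasattr(old, 'usingPages')       # False -> AttributeError when called
fixed = ImagePage('Image:pippo.png')
fixed.usingPages()               # works once the right class is constructed
```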
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1843759&group_…
Revision: 4634
Author: filnik
Date: 2007-12-03 16:18:14 +0000 (Mon, 03 Dec 2007)
Log Message:
-----------
Adding a new script to check blocked pages.
Added Paths:
-----------
trunk/pywikipedia/blockpageschecker.py
Added: trunk/pywikipedia/blockpageschecker.py
===================================================================
--- trunk/pywikipedia/blockpageschecker.py (rev 0)
+++ trunk/pywikipedia/blockpageschecker.py 2007-12-03 16:18:14 UTC (rev 4634)
@@ -0,0 +1,132 @@
+# -*- coding: utf-8 -*-
+"""
+This script, originally written by Wikihermit and then rewritten by Filnik, removes the templates used to warn
+that a page is blocked when the page is not actually blocked. Sysops often protect pages for a set time but
+then forget to remove the warning! This script is useful if you want to delete those stale warnings.
+
+Parameters:
+
+-always Don't ask whether the bot should make each change; always apply it.
+-page Work on a single page only.
+
+Note: This script also uses genFactory, so the standard page generators are available as well.
+
+"""
+#
+# (C) Wikihermit, 2007
+# (C) Filnik, 2007
+#
+# Distributed under the terms of the MIT license.
+#
+__version__ = '$Id: $'
+#
+
+import re
+import wikipedia, catlib, pagegenerators
+
+# Use only regex!
+#fr regexes added by Darkoneko 09 oct 07, THEY ARE UNTESTED at the moment, please check !
+templateToRemove = {
+ 'en':[r'\{\{(?:[Tt]emplate:|)[Pp]p-protected\}\}', r'{\{([Tt]emplate:|)[Pp]p-dispute\}\}',
+ r'{\{(?:[Tt]emplate:|)[Pp]p-template\}\}', r'{\{([Tt]emplate:|)[Pp]p-usertalk\}\}'],
+ 'fr':[r'\{\{(?:[Tt]emplate:|[Mm]odèle:|)[Pp]rotection(|[^\}]*)\}\}',
+ r'\{\{(?:[Tt]emplate:|[Mm]odèle:|)(?:[Pp]age|[Aa]rchive|[Mm]odèle) protégée?\}\}',
+ r'\{\{(?:[Tt]emplate:|[Mm]odèle:|)[Ss]emi[- ]?protection\}\}'
+ ],
+ 'it':[r'{\{(?:[Tt]emplate:|)[Aa]vvisobloccoparziale(?:|[ _]scad\|(.*?))\}\}', r'{\{(?:[Tt]emplate:|)[Aa]vvisoblocco(?:|[ _]scad\|(?:.*?))\}\}'],
+ }
+categoryToCheck = {
+ 'en':[u'Category:Protected'],
+ 'fr':[u'Category:Page semi-protégée', u'Category:Page protégée'],
+ 'it':[u'Categoria:Pagine semiprotette', u'Categoria:Voci_protette'],
+ }
+
+comment = {
+ 'en':u'Bot: Deleting out-dated template',
+ 'fr':u'Robot : Retrait du bandeau protection/semi-protection d\'une page qui ne l\'est plus',
+ 'it':u'Bot: Tolgo template di avviso blocco scaduto',
+ }
+
+def main():
+ global templateToRemove
+ global categoryToCheck
+ global comment
+ always = False
+ generator = False
+ genFactory = pagegenerators.GeneratorFactory()
+ # Loading the default options.
+ for arg in wikipedia.handleArgs():
+ if arg == '-always':
+ always = True
+ elif arg.startswith('-page'):
+ if len(arg) == 5:
+ generator = [wikipedia.Page(wikipedia.getSite(), wikipedia.input(u'What page do you want to use?'))]
+ else:
+ generator = [wikipedia.Page(wikipedia.getSite(), arg[6:])]
+ else:
+ generator = genFactory.handleArg(arg)
+ # Load the right site
+ site = wikipedia.getSite()
+ # Take the right templates to use, the category and the comment
+ TTR = wikipedia.translate(site, templateToRemove)
+ category = wikipedia.translate(site, categoryToCheck)
+ commentUsed = wikipedia.translate(site, comment)
+ # Define the category
+ if not generator:
+ for CAT in category:
+ cat = catlib.Category(site, CAT)
+ # Define the generator
+ generator = pagegenerators.CategorizedPageGenerator(cat)
+ for page in generator:
+ pagename = page.title()
+ wikipedia.output('Loading %s...' % pagename)
+ try:
+ (text, useless, editRestriction) = page._getEditPage()
+ except wikipedia.NoPage:
+ wikipedia.output("%s doesn't exist! Skipping..." % pagename)
+ continue
+ except wikipedia.IsRedirectPage:
+ wikipedia.output("%s is a redirect! Skipping..." % pagename)
+ continue
+ if editRestriction == 'sysop':
+ wikipedia.output(u'The page is protected to the sysop, skipping...')
+ continue
+ elif editRestriction == 'autoconfirmed':
+ wikipedia.output(u'The page is editable only for the autoconfirmed users, skipping...')
+ continue
+ else:
+ wikipedia.output(u'The page is editable for all, deleting the template...')
+ # Only to see if the text is the same or not...
+ oldtext = text
+ for replaceToPerform in TTR:
+ text = re.sub(replaceToPerform, '', text)
+ if oldtext != text:
+ wikipedia.output(u"\n\n>>> \03{lightpurple}%s\03{default} <<<" % page.title())
+ wikipedia.showDiff(oldtext, text)
+ choice = ''
+ if not always:
+ choice = wikipedia.inputChoice(u'Do you want to accept these changes?', ['Yes', 'No', 'All'], ['y', 'N', 'a'], 'N')
+ if choice.lower() in ['a', 'all']:
+ always = True
+ if choice.lower() in ['n', 'no']:
+ break
+ if choice.lower() in ['y', 'yes'] or always:
+ try:
+ page.put(text, commentUsed)
+ except wikipedia.EditConflict:
+ wikipedia.output(u'Edit conflict! skip!')
+ continue
+ except wikipedia.SpamfilterError, e:
+ wikipedia.output(u'Cannot change %s because of blacklist entry %s' % (page.title(), e.url))
+ continue
+ except wikipedia.PageNotSaved, error:
+ wikipedia.output(u'Error putting page: %s' % (error.args,))
+ continue
+ except wikipedia.LockedPage:
+ wikipedia.output(u'The page is still protected. Skipping...')
+ continue
+
+if __name__ == "__main__":
+ try:
+ main()
+ finally:
+ wikipedia.stopme()
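The heart of the committed script is the loop that strips each matching template with re.sub. A self-contained sketch of just that step, reusing the first 'en' pattern from templateToRemove above:

```python
import re

# Sketch of the script's core loop: remove known protection templates from the
# wikitext with re.sub. The pattern is the 'en' pp-protected regex from the
# templateToRemove dictionary above; the sample wikitext is made up.
patterns = [r'\{\{(?:[Tt]emplate:|)[Pp]p-protected\}\}']

text = 'Intro. {{pp-protected}} Body text.'
for pat in patterns:
    text = re.sub(pat, '', text)
# text now reads 'Intro.  Body text.' (the template is gone; surrounding
# whitespace is untouched, so a double space remains)
```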
Revision: 4632
Author: filnik
Date: 2007-12-03 15:58:01 +0000 (Mon, 03 Dec 2007)
Log Message:
-----------
another bugfix
Modified Paths:
--------------
trunk/pywikipedia/add_text.py
Modified: trunk/pywikipedia/add_text.py
===================================================================
--- trunk/pywikipedia/add_text.py 2007-12-03 15:51:29 UTC (rev 4631)
+++ trunk/pywikipedia/add_text.py 2007-12-03 15:58:01 UTC (rev 4632)
@@ -146,9 +146,9 @@
summary = arg[9:]
elif arg.startswith('-page'):
if len(arg) == 5:
- generator = list(wikipedia.input(u'What page do you want to use?'))
+ generator = [wikipedia.Page(wikipedia.getSite(), wikipedia.input(u'What page do you want to use?'))]
else:
- generator = listr(arg[6:])
+ generator = [wikipedia.Page(wikipedia.getSite(), arg[6:])]
elif arg.startswith('-excepturl'):
exceptUrl = True
if len(arg) == 10:
@@ -289,288 +289,4 @@
try:
main()
finally:
- wikipedia.stopme()#!/usr/bin/python
-# -*- coding: utf-8 -*-
-"""
-This is a Bot written by Filnik to add a text in a given category.
-
---- GenFactory Generator is used ---
--start Define from which page should the Bot start
--ref Use the ref as generator
--cat Use a category as generator
--filelinks Use all the links to an image as generator
--unusedfiles
--unwatched
--withoutinterwiki
--interwiki
--file
--uncatfiles
--uncatcat
--uncat
--subcat
--transcludes Use all the page that transclude a certain page as generator
--weblink Use the pages with a certain web link as generator
--links Use the links from a certain page as generator
--regex Only work on pages whose titles match the given regex
-
---- Other parameters ---
--page Use a page as generator
--text Define which text add
--summary Define the summary to use
--except Use a regex to understand if the template is already in the page
--excepturl Use the html page as text where you want to see if there's the text, not the wiki-page.
--newimages Add text in the new images
--untagged Add text in the images that doesn't have any license template
--always If used, the bot won't asked if it should add the text specified
-"""
-
-#
-# (C) Filnik, 2007
-#
-# Distributed under the terms of the MIT license.
-#
-__version__ = '$Id: AddText.py,v 1.0 2007/11/27 17:08:30 filnik Exp$'
-#
-
-import re, pagegenerators, urllib2, urllib
-import wikipedia, catlib
-
-class NoEnoughData(wikipedia.Error):
- """ Error class for when the user doesn't specified all the data needed """
-
-class NothingFound(wikipedia.Error):
- """ An exception indicating that a regex has return [] instead of results."""
-
-def pageText(url):
- try:
- request = urllib2.Request(url)
- user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7'
- request.add_header("User-Agent", user_agent)
- response = urllib2.urlopen(request)
- text = response.read()
- response.close()
- # When you load to many users, urllib2 can give this error.
- except urllib2.HTTPError:
- wikipedia.output(u"Server error. Pausing for 10 seconds... " + time.strftime("%d %b %Y %H:%M:%S (UTC)", time.gmtime()) )
- time.sleep(10)
- request = urllib2.Request(url)
- user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7'
- request.add_header("User-Agent", user_agent)
- response = urllib2.urlopen(request)
- text = response.read()
- response.close()
- return text
-
-def untaggedGenerator(untaggedProject, limit = 500):
- lang = untaggedProject.split('.', 1)[0]
- project = '.' + untaggedProject.split('.', 1)[1]
- if lang == 'commons':
- link = 'http://tools.wikimedia.de/~daniel/WikiSense/UntaggedImages.php?wikifam=comm…'
- else:
- link = 'http://tools.wikimedia.de/~daniel/WikiSense/UntaggedImages.php?wikilang=' + lang + '&wikifam=' + project + '&order=img_timestamp&max=' + str(limit) + '&ofs=0&max=' + str(limit)
- text = pageText(link)
- #print text
- regexp = r"""<td valign='top' title='Name'><a href='http://.*?\..*?\.org/w/index\.php\?title=(.*?)'>.*?</a></td>"""
- results = re.findall(regexp, text)
- if results == []:
- print link
- raise NothingFound('Nothing found! Try to use the tool by yourself to be sure that it works!')
- else:
- for result in results:
- yield wikipedia.Page(self.site, result)
-
-def newImages(limit):
- # Search regular expression to find links like this (and the class attribute is optional too)
- # class="new" title="Immagine:Soldatino2.jpg">Immagine:Soldatino2.jpg</a>" <span class="comment">
- url = "/w/index.php?title=Special:Log&type=upload&user=&page=&pattern=&limit=%d&offset=0" % int(limit)
- site = wikipedia.getSite()
- textrun = site.getUrl(url)
- image_namespace = site.image_namespace() + ":"
- regexp = r'(class=\"new\" |)title=\"' + image_namespace + '(.*?)\.(\w\w\w|jpeg)\">.*?</a>\".*?<span class=\"comment\">'
- pos = 0
- done = list()
- ext_list = list()
- r = re.compile(regexp, re.UNICODE)
- while 1:
- m = r.search(textrun, pos)
- if m == None:
- wikipedia.output(u"\t\t>> All images checked. <<")
- break
- pos = m.end()
- new = m.group(1)
- im = m.group(2)
- ext = m.group(3)
- # This prevent pages with strange characters. They will be loaded without problem.
- image = im + "." + ext
- if new != '':
- wikipedia.output(u"Skipping %s because it has been deleted." % image)
- done.append(image)
- if image not in done:
- done.append(image)
- yield wikipedia.Page(site, 'Image:%s' % image)
-
-def main():
- starsList = ['link[ _]fa', 'link[ _]adq', 'enllaç[ _]ad',
- 'link[ _]ua', 'legătură[ _]af', 'destacado',
- 'ua', 'liên k[ _]t[ _]chọn[ _]lọc']
- summary = None
- addText = None
- regexSkip = None
- generator = None
- always = False
- exceptUrl = False
- genFactory = pagegenerators.GeneratorFactory()
- errorCount = 0
-
- for arg in wikipedia.handleArgs():
- if arg.startswith('-text'):
- if len(arg) == 5:
- addText = wikipedia.input(u'What text do you want to add?')
- else:
- addText = arg[6:]
- elif arg.startswith('-summary'):
- if len(arg) == 8:
- summary = wikipedia.input(u'What summary do you want to use?')
- else:
- summary = arg[9:]
- elif arg.startswith('-page'):
- if len(arg) == 5:
- generator = list(wikipedia.input(u'What page do you want to use?'))
- else:
- generator = list(arg[6:])
- elif arg.startswith('-excepturl'):
- exceptUrl = True
- if len(arg) == 10:
- regexSkip = wikipedia.input(u'What text should I skip?')
- else:
- regexSkip = arg[11:]
- elif arg.startswith('-except'):
- if len(arg) == 7:
- regexSkip = wikipedia.input(u'What text should I skip?')
- else:
- regexSkip = arg[8:]
- elif arg.startswith('-untagged'):
- if len(arg) == 9:
- untaggedProject = wikipedia.input(u'What project do you want to use?')
- else:
- untaggedProject = arg[10:]
- generator = untaggedGenerator(untaggedProject)
- elif arg.startswith('-newimages'):
- if len(arg) == 10:
- limit = wikipedia.input(u'How many images do you want to check?')
- else:
- limit = arg[11:]
- generator = newImages(limit)
- elif arg == '-always':
- always = True
- else:
- generator = genFactory.handleArg(arg)
-
- site = wikipedia.getSite()
- pathWiki = site.family.nicepath(site.lang)
- if not generator:
- raise NoEnoughData('You have to specify the generator you want to use for the script!')
- if not addText:
- raise NoEnoughData('You have to specify what text you want to add!')
- if not summary:
- summary = 'Bot: Adding %s' % addText
- for page in generator:
- wikipedia.output(u'Loading %s...' % page.title())
- try:
- text = page.get()
- except wikipedia.NoPage:
- wikipedia.output(u"%s doesn't exist, skip!" % page.title())
- continue
- except wikipedia.IsRedirectPage:
- wikipedia.output(u"%s is a redirect, skip!" % page.title())
- continue
- if regexSkip and exceptUrl:
- url = '%s%s' % (pathWiki, page.urlname())
- result = re.findall(regexSkip, site.getUrl(url))
- elif regexSkip:
- result = re.findall(regexSkip, text)
- else:
- result = []
- if result != []:
- wikipedia.output(u'Exception! regex (or word) use with -except, is in the page. Skip!')
- continue
- newtext = text
- categoryNamespace = site.namespace(14)
- regexpCat = re.compile(r'\[\[((?:category|%s):.*?)\]\]' % categoryNamespace.lower(), re.I)
- categorieInside = regexpCat.findall(text)
- newtext = wikipedia.removeCategoryLinks(newtext, site)
- interwikiInside = page.interwiki()
- interwikiList = list()
- for paginetta in interwikiInside:
- nome = str(paginetta).split('[[')[1].split(']]')[0]
- interwikiList.append(nome)
- lang = nome.split(':')[0]
- newtext = wikipedia.removeLanguageLinks(newtext, site)
- interwikiList.sort()
- newtext += "\n%s" % addText
- for paginetta in categorieInside:
- try:
- newtext += '\n[[%s]]' % paginetta.decode('utf-8')
- except UnicodeEncodeError:
- try:
- newtext += '\n[[%s]]' % paginetta.decode('Latin-1')
- except UnicodeEncodeError:
- newtext += '\n[[%s]]' % paginetta
- newtext += '\n'
- starsListInPage = list()
- for star in starsList:
- regex = re.compile('(\{\{(?:template:|)%s\|.*?\}\}\n)' % star, re.I)
- risultato = regex.findall(newtext)
- if risultato != []:
- newtext = regex.sub('', newtext)
- for element in risultato:
- newtext += '\n%s' % element
- for paginetta in interwikiList:
- try:
- newtext += '\n[[%s]]' % paginetta.decode('utf-8')
- except UnicodeEncodeError:
- try:
- newtext += '\n[[%s]]' % paginetta.decode('Latin-1')
- except UnicodeEncodeError:
- newtext += '\n[[%s]]' % paginetta
- wikipedia.output(u"\n\n>>> \03{lightpurple}%s\03{default} <<<" % page.title())
- wikipedia.showDiff(text, newtext)
- while 1:
- if not always:
- choice = wikipedia.inputChoice(u'Do you want to accept these changes?', ['Yes', 'No', 'All'], ['y', 'N', 'a'], 'N')
- if choice.lower() in ['a', 'all']:
- always = True
- if choice.lower() in ['n', 'no']:
- break
- if choice.lower() in ['y', 'yes'] or always:
- try:
- page.put(newtext, summary)
- except wikipedia.EditConflict:
- wikipedia.output(u'Edit conflict! skip!')
- break
- except wikipedia.ServerError:
- errorCount += 1
- if errorCount < 5:
- wikipedia.output(u'Server Error! Wait..')
- time.sleep(3)
- continue
- else:
- raise wikipedia.ServerError(u'Fifth Server Error!')
- except wikipedia.SpamfilterError, e:
- wikipedia.output(u'Cannot change %s because of blacklist entry %s' % (page.title(), e.url))
- break
- except wikipedia.PageNotSaved, error:
- wikipedia.output(u'Error putting page: %s' % (error.args,))
- break
- except wikipedia.LockedPage:
- wikipedia.output(u'Skipping %s (locked page)' % (page.title(),))
- break
- else:
- # Break only if the errors are one after the other...
- errorCount = 0
- break
-if __name__ == "__main__":
- try:
- main()
- finally:
wikipedia.stopme()
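For reference, the bug fixed by the -page hunk above is a classic one: calling list() on a title string iterates its characters, so the old code yielded one-character "pages" instead of a single page object. A minimal demonstration (a plain string stands in for the wikipedia.Page object used in the real fix):

```python
# The bug the -page fix addresses: list() over a string iterates characters,
# so iterating the old "generator" produced per-character items.
broken = list('Pippo')           # ['P', 'i', 'p', 'p', 'o']

# The committed fix wraps the title in a single-element list instead
# (wikipedia.Page(wikipedia.getSite(), title) in the real code).
fixed = ['Pippo']
```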