I'm using replace.py to create wikilinks. Usually I want to select only the
first occurrence of the search string, and my command works fine for this.
But sometimes, the first hit is not suitable (e.g. it's part of a book or
course title, so I don't want to add the wikilink). If I choose n for no,
the bot goes to the next page.
Is there a way I can skip to the next occurrence in the same page? I'm
guessing it will need a modified version of replace.py, so that it gives an
extra option besides ([y]es, [N]o, [e]dit, open in [b]rowser, [a]ll,
[q]uit)
The actual command I'm using is:
python replace.py -regex "(?si)\b((?:FOO1|FOO2))\b(.*$)" "[[\\1]]\\2"
-exceptinsidetag:link -exceptinsidetag:hyperlink
-exceptinsidetag:header -exceptinsidetag:nowiki -exceptinsidetag:ref
-excepttext:"(?si)\[\[((?:FOO1|FOO2)[\|\]])" -namespace:0 -namespace:102
-namespace:4 -summary:"[[Appropedia:Wikilink bot]] adding double square
brackets to: FOO1|FOO2." -log -xml:currentdump.xml
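As far as I can tell, replace.py offers no such per-occurrence option today. If someone wanted to add one, the heart of it might look like this sketch (the function name and the `choose` callback are hypothetical, not actual replace.py code):

```python
import re

def replace_interactively(text, pattern, repl, choose):
    """Walk over every match of `pattern` in `text` and ask `choose(match)`
    what to do: 'y' replaces this occurrence, 'n' skips just this occurrence
    and moves to the next one on the same page, 'q' keeps the rest unchanged."""
    out, pos = [], 0
    for m in re.finditer(pattern, text):
        out.append(text[pos:m.start()])
        ans = choose(m)
        if ans == "q":
            out.append(text[m.start():])
            return "".join(out)
        out.append(m.expand(repl) if ans == "y" else m.group(0))
        pos = m.end()
    out.append(text[pos:])
    return "".join(out)

# usage: skip the first hit (part of a course title), link the second
answers = iter(["n", "y"])
result = replace_interactively("Course on FOO1 basics. See also FOO1.",
                               r"\b(FOO1)\b", r"[[\1]]",
                               lambda m: next(answers))
print(result)  # -> "Course on FOO1 basics. See also [[FOO1]]."
```

In the real script, `choose` would be the interactive prompt, extended with an extra key besides the existing y/N/e/b/a/q choices.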
Many thanks!
--
Chris Watkins
Appropedia.org - Sharing knowledge to build rich, sustainable lives.
blogs.appropedia.org
identi.ca/appropedia
twitter.com/appropedia
Hi everyone,
I'm getting so freaking tired of these untested sloppy commits which
keep breaking the bots.
alexsh(a)svn.wikimedia.org schreef:
> Revision: 7591
> Author: alexsh
> Date: 2009-11-04 13:22:17 +0000 (Wed, 04 Nov 2009)
>
> Log Message:
> -----------
> * site().getUrl(): change all HTTP process to use urllib2.
> * handle and combine Site Authentication, proxy handle and Proxy Authentication in the bottom.
>
> Modified Paths:
> --------------
> trunk/pywikipedia/wikipedia.py
>
>
>
<knip>
>
> -class MyURLopener(urllib.FancyURLopener):
> - version="PythonWikipediaBot/1.0"/pywikipedia-svn
>
How the hell is upload.py supposed to work now?
Traceback (most recent call last):
  File "D:\Wikipedia\pywikipedia\flickrripper_wlanl.py", line 562, in <module>
    main()
  File "D:\Wikipedia\pywikipedia\flickrripper_wlanl.py", line 551, in main
    uploadedPhotos += processPhoto(flickr, photo_id, flickrreview, reviewer,
        override, addCategory, removeCategories, autonomous)
  File "D:\Wikipedia\pywikipedia\flickrripper_wlanl.py", line 252, in processPhoto
    bot.upload_image(debug=False)
  File "D:\Wikipedia\pywikipedia\upload.py", line 227, in upload_image
    self.read_file_content()
  File "D:\Wikipedia\pywikipedia\upload.py", line 108, in read_file_content
    uo = wikipedia.MyURLopener()
AttributeError: OpenerDirector instance has no __call__ method
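For what it's worth, the immediate cause is visible in the diff: MyURLopener used to be a class (so upload.py's wikipedia.MyURLopener() constructed a fresh opener), and after r7591 the same name is apparently bound to an OpenerDirector *instance*, which cannot be called. A stand-alone illustration with dummy classes (in modern Python the error surfaces as TypeError rather than Python 2's AttributeError):

```python
class OldStyleOpener:              # stands in for the FancyURLopener subclass
    version = "PythonWikipediaBot/1.0"

MyURLopener = OldStyleOpener
uo = MyURLopener()                 # fine: calling a class makes an instance

class OpenerDirector:              # stands in for urllib2's OpenerDirector
    pass

MyURLopener = OpenerDirector()     # the name now holds an *instance*...
try:
    uo = MyURLopener()             # ...so upload.py's call blows up
except TypeError as e:
    print(e)                       # calling an instance without __call__ fails
```

So either upload.py needs to stop calling MyURLopener(), or wikipedia.py needs to keep exposing something callable under that name.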
Please test your stuff decently before you commit it! Almost every time I
do an svn update I have to get rid of dozens of bugs.
Maarten
Hi,
I have modified the get.py module of pywikibot. The purpose is to display
the creation date of an article. I gathered from wikipedia.py that I need
either page.getVersionHistory(), which returns a list, or
page.getVersionHistoryTable(), which gives the history as a wikitable.
Now it seems to work, but sometimes it returns a shorter page history
than the real one, and, unfortunately, sometimes both methods return an
empty page history, which causes an error in the program flow.
Here are some concerned pages:
[[hu:Ehrenfeld Samu]] (http://hu.wikipedia.org/wiki/Ehrenfeld_Saul) and
[[hu:TCSEC]] (http://hu.wikipedia.org/wiki/TCSEC) seem to have a completely
empty page history, which is not true and results in an error.
(Theoretically, no article can exist with an empty history.)
[[hu:Mark VII]] (http://hu.wikipedia.org/wiki/Mark_VII) displays only one
line, as if it had a single-edit history.
Where is the mistake?
I attach proba.py, which is my modified version of get.py. I hope you will
not take my head off for it; it's very small. :-) The Hungarian comments are
not relevant; the essence is described above. An article title can be given
as a command-line parameter.
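For comparison, the logic I would expect for the creation date is "timestamp of the oldest revision" — a self-contained sketch, assuming the compat-era tuple layout (revid, timestamp, user, comment) and newest-first ordering, neither of which I have verified here:

```python
def creation_date(history):
    """Return the timestamp of the oldest revision.

    `history` is assumed to be a list of (revid, timestamp, user, comment)
    tuples as returned by Page.getVersionHistory(), newest first."""
    if not history:
        # the symptom described above: an empty history should be impossible
        raise ValueError("empty page history")
    return history[-1][1]

# usage with a stubbed two-revision history:
hist = [(3, "2009-11-04T13:22:17Z", "B", "minor edit"),
        (1, "2009-01-01T00:00:00Z", "A", "created page")]
print(creation_date(hist))  # -> "2009-01-01T00:00:00Z"
```

Guarding against the empty list at least turns the mysterious crash into a clear error message while the underlying history bug is tracked down.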
Thanks!
--
Bináris
I tried something simple:

for t in pages:
    p = wikipedia.Page(site, t)
    p._moveOld("Wikipedysta:Joymaster/%s" % t, reason=action, sysop=True, leaveRedirect=False)
Traceback (most recent call last):
File "kalendariumwp.py", line 31, in <module>
p._moveOld("Wikipedysta:Joymaster/%s" % t, reason=action, sysop=True, leaveRedirect=False)
File "/home/admini/saper/wikipedia/pywikipedia/wikipedia.py", line 2772, in _moveOld
self.get(force=True, get_redirect=True, throttle=False)
File "/home/admini/saper/wikipedia/pywikipedia/wikipedia.py", line 684, in get
self._contents = self._getEditPage(get_redirect = get_redirect, throttle = throttle, sysop = sysop)
File "/home/admini/saper/wikipedia/pywikipedia/wikipedia.py", line 758, in _getEditPage
raise NoPage(self.site(), self.aslink(forceInterwiki = True),"Page does not exist. In rare cases, if you are certain the page does exist, look into overriding family.RversionTab" )
wikipedia.NoPage: (wikipedia:pl, u'[[pl:Kalendarium Wojska Polskiego 1919]]', 'Page does not exist. In rare cases, if you are certain the page does exist, look into overriding family.RversionTab')
There is obviously nothing to get, as nothing is left with the old name (not even a redirect).
--
<< Marcin Cieslak // saper(a)saper.info >>
Hi,
I've written a script that edits interwiki links according to directions
provided in an XML file.
Originally, I wrote the script to apply changes recommended by an
interwiki analysis tool of mine:
http://code.google.com/p/interwiki-analysis/
However, since the input file format is quite simple and generic, I
think other users might find the tool useful.
If you find the script interesting, feel free to add it to pywikipedia.
You can download it from here:
http://code.google.com/p/interwiki-analysis/source/browse/trunk/wikibot/int…
Regards,
Lukasz Bolikowski
Hi,
can I _exclude_ categories from replace.py? It has
-cat/-catr/-subcats/-subcatsr for working within a category, but I want to
use it everywhere EXCEPT in a given category and its subcats. Or could anyone
write this feature?
The task is changing an o/O letter that follows a digit to 0 (zero). This is
a frequent error. But I don't want to work in Chemistry and its subcats,
because there it is more often intentional.
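Separately from the category question, the substitution itself can be expressed with a lookbehind, so only the o/O is touched and the digit stays out of the match — a quick sketch in plain re, outside replace.py:

```python
import re

def fix_zero(text):
    """Replace an o/O immediately following a digit with 0 (zero)."""
    return re.sub(r"(?<=\d)[oO]", "0", text)

print(fix_zero("1o0 meters and H2o"))  # -> "100 meters and H20"
```

The H2o example also shows why a Chemistry exclusion matters: the same pattern would mangle chemical formulas where a capital O after a digit is intentional.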
--
Bináris
I want to do a search and replace on FOO1, except where it occurs within the
text of an external link, e.g. [https://www.blah.bar/ blah FOO1 more text].
I was using -exceptinsidetag:hyperlink but eventually realized it only
excludes hits within the actual URL.
I'm guessing I need to use -exceptinside, but I'm not sure how. Can someone
give me some guidance?
I'm already using -regex. The actual command I'm using is:
python replace.py -regex "(?si)\b((?:FOO1|FOO2))\b(.*$)" "[[\\1]]\\2"
-exceptinsidetag:link -exceptinsidetag:hyperlink -exceptinsidetag:header
-exceptinsidetag:nowiki -exceptinsidetag:ref
-excepttext:"(?si)\[\[((?:FOO1|FOO2)[\|\]])" -namespace:0 -namespace:102
-namespace:4 -summary:"[[Appropedia:Wikilink bot]] adding double square
brackets to: FOO1|FOO2." -log -xml:currentdump.xml
(Not that I understand regex much - I had lots of help getting there.)
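If I understand the replace.py exceptions correctly (an assumption on my part), -exceptinside takes a regex, and any hit of the search pattern that falls inside a match of that regex is skipped. A pattern like \[https?://[^][]*\] should then cover the whole bracketed external link, label text included. A self-contained sketch of that skipping logic, outside replace.py:

```python
import re

# Hypothetical exception pattern: a whole bracketed external link,
# i.e. [http(s)://url label text].
ext_link = re.compile(r"\[https?://[^][]*\]")

def hits_outside_links(text, pattern):
    """Return start offsets of `pattern` matches not inside an external link."""
    protected = [m.span() for m in ext_link.finditer(text)]
    return [m.start() for m in re.finditer(pattern, text)
            if not any(a <= m.start() < b for a, b in protected)]

text = "FOO1 here, and [https://www.blah.bar/ blah FOO1 more text]."
print(hits_outside_links(text, r"\bFOO1\b"))  # -> [0]
```

Only the first FOO1 survives; the one inside the bracketed link is protected. Passing that same pattern as -exceptinside:"\[https?://[^][]*\]" should, in theory, have the same effect, but do test it on a sandbox page first.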
Many thanks for any help!
--
Chris Watkins
Appropedia.org - Sharing knowledge to build rich, sustainable lives.
blogs.appropedia.org
identi.ca/appropedia
twitter.com/appropedia
I am trying to set up Pywikipedia (r7665 (wikipedia.py), 2009/11/17, 12:24:54).
The login.py script works.
Using the add_text.py script fails with the output below:
<addtext.py>
python-2.4.3 add_text.py -cat:Bot_Test -text:"another bot test" -down
Getting [[Category:Bot Test]]...
Loading Bot Test Page...
Error downloading data: No JSON object could be decoded
Request:
en:/w/api.php?inprop=protection%7Ctalkid%7Csubjectid%7Curl%7Creadable&format=json&rvprop=content%7Cids%7Cflags%7Ctimestamp%7Cuser%7Ccomment%7Csize&prop=revisions%7Cinfo&titles=Bot+Test+Page&rvlimit=1&action=query
Retrying in 1 minutes...
^CTraceback (most recent call last):
  File "add_text.py", line 312, in ?
    main()
  File "add_text.py", line 308, in main
    (text, newtext, always) = add_text(page, addText, summary,
        regexSkip, regexSkipUrl, always, up, True)
  File "add_text.py", line 147, in add_text
    text = page.get()
  File "/auto/docwiki-bot/pywikipedia/wikipedia.py", line 684, in get
    self._contents = self._getEditPage(get_redirect = get_redirect,
        throttle = throttle, sysop = sysop)
  File "/auto/docwiki-bot/pywikipedia/wikipedia.py", line 752, in _getEditPage
    data = query.GetData(params, self.site(), sysop=sysop)
  File "/auto/docwiki-bot/pywikipedia/query.py", line 153, in GetData
    time.sleep(retry_idle_time*60)
KeyboardInterrupt
sjc-cde-003:27>
</addtext.py>
Based on the inprop=protection part of the error, I disabled the
AuthorProtect extension and tried again. It failed, but for a different
reason:
<addtext.py - AuthorProtection = off>
python-2.4.3 add_text.py -cat:Bot_Test -text:"another bot test" -down
Getting [[Category:Bot Test]]...
Loading Bot Test Page...
>>> Bot Test Page <<<
+ another bot test
Do you want to accept these changes? ([y]es, [N]o, [a]ll) a
Updating page [[Bot Test Page]] via API
Unknown Error. API Error code:unknown_action
Information:Unrecognized value for parameter 'action': edit
Loading Meta-tag-test...
>>> Meta-tag-test <<<
+ another bot test
Sleeping for 9.3 seconds, 2009-11-24 08:01:28
Updating page [[Meta-tag-test]] via API
Unknown Error. API Error code:unknown_action
Information:Unrecognized value for parameter 'action': edit
</addtext.py - AuthorProtection = off>
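A guess about the second failure: "Unrecognized value for parameter 'action': edit" is what api.php returns when the edit module is not registered, and on MediaWiki 1.13 the write API is, as far as I remember (please verify for 1.13.5), disabled by default. If that is the cause, enabling it in LocalSettings.php should help:

```php
# LocalSettings.php -- assumption: MediaWiki 1.13 keeps the write API off
# unless this is set; api.php then rejects action=edit as unrecognized.
$wgEnableWriteAPI = true;
```

If your wiki intentionally keeps the write API off, the alternative would be making pywikipedia fall back to non-API editing, but I don't remember the exact configuration switch for that, so I won't guess at it.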
Site info below:
Here is my Pywikipedia information:
uname -a
SunOS sjc-cde-003 5.10 Generic_137111-06 sun4v sparc
SUNW,SPARC-Enterprise-T5120
python-2.4.3 version.py
Pywikipedia (r7665 (wikipedia.py), 2009/11/17, 12:24:54)
Python 2.4.3 (#1, Sep 8 2006, 16:46:00)
[GCC 3.3.3]
Here is my Wikipedia information:
uname -a
Linux emerac.cisco.com 2.6.18-92.1.18.el5xen #1 SMP Wed Nov 5 09:30:07
EST 2008 i686 i686 i386 GNU/Linux
MediaWiki 1.13.5
PHP 5.1.6 (apache2handler)
MySQL 5.0.45
Extensions:
Special pages CategoryTree (Version r37574) Dynamically navigate the
category structure Daniel Kinzler
Contribution Scores (Version 1.11) Polls the wiki database for highest
user contribution volume Tim Laqua
Flagged Revisions (Version 1.094) Gives editors and reviewers the
ability to validate revisions and stabilize pages Aaron Schulz and Joerg
Baach
MultipleUpload (Version 1.0) Allows users to upload several files at
once Travis Derouin
PdfExport (Version 2.0 (4-November-2008)) renders a page as pdf Thomas Hempel
Parser hooks
BreadCrumbs Shows a breadcrumb navigation. Based heavily on Manuel
Shneider's extension[1] Kimon Andreou
CategoryTree (Version r37574) Dynamically navigate the category
structure Daniel Kinzler
EmailForm (Version 0.8a) Inserts a form mailer into a page Eric Hartwell
(Minor fixes by John Adams)
ImageMap (Version r35980) Allows client-side clickable image maps using
<imagemap> tag Tim Starling
InputBox Allow inclusion of predefined HTML forms Erik Moeller, Leonardo
Pimenta, Rob Church and Trevor Parscal
MetaTag (Version 0.1) Tag to inject meta tags into page header. Jim
Wilson - wilson.jim.r<at>gmail.com
ParserFunctions (Version 1.1.1) Enhance parser with logical functions
Tim Starling
PdfBook (Version 1.0.3, 2008-12-09) Composes a book from articles in a
category and exports as a PDF book User:Nad
Rating Bar (Version 1.0 rc1) Display a rating bar using Ajax. Franck
Dernoncourt and PatheticCockroach
Other
Author Protect (Version 1.1) Allows the author of a page to protect it
from other users Ryan Schmidt
FCKeditor (Version 1.0.1) Allow editing using the WYSIWYG editor
FCKeditor Frederico Caldeira Knabben, Jack Phoenix, Wiktor Walc and
others
Gadgets Lets users select custom CSS and JavaScript gadgets in their
preferences Daniel Kinzler
JSKitRating (Version 1.4.0) Provides integration with JSKit Rating tool.
Jean-Lou Dupont
LDAP Authentication Plugin (Version 1.1g) LDAP Authentication plugin
with support for multiple LDAP authentication methods Ryan Lane
SecureHTML (Version 2.3.0) Enables secure HTML code on protected pages
Jean-Lou Dupont
StubManager (Version 1.3.0) Provides stubbing facility for extensions
handling rare events. Customization template: MediaWiki:ExtensionState.
Extensions registered:JSKitRating, SecureHTML. Jean-Lou Dupont
Extension functions
lambda_54, efCategoryTree, efLoadFlaggedRevs, setupMetaTagParserHooks,
wfEmailFormExtension, wfMultipleUpload, wfRatingBar,
wfSetupParserFunctions, wfSetupPdfBook and (BreadCrumbs, setup)
Parser extension tags
<categorytree>, <emailform>, <imagemap>, <inputbox>, <meta>, <pre>,
<w4g_ratingbar> and <w4g_ratinglist>
Parser function hooks
anchorencode, categorytree, cscore, defaultsort, displaytitle, expr,
filepath, formatnum, fullurl, fullurle, grammar, html, if, ifeq,
iferror, ifexist, ifexpr, int, jskitrating, language, lc, lcfirst,
localurl, localurle, ns, numberofadmins, numberofarticles,
numberofedits, numberoffiles, numberofpages, numberofusers, padleft,
padright, pagesincategory, pagesize, plural, rel2abs, shtml, special,
switch, tag, time, timel, titleparts, uc, ucfirst and urlencode
---Robert