I'm using replace.py to create wikilinks. Usually I want to select only the
first occurrence of the search string, and my command works fine for this.
But sometimes, the first hit is not suitable (e.g. it's part of a book or
course title, so I don't want to add the wikilink). If I choose n for no,
the bot goes to the next page.
Is there a way I can skip to the next occurrence in the same page? I'm
guessing it will need a modified version of replace.py, so that it gives an
extra option besides ([y]es, [N]o, [e]dit, open in [b]rowser, [a]ll,
[q]uit)
The actual command I'm using is:
python replace.py -regex "(?si)\b((?:FOO1|FOO2))\b(.*$)
" "[[\\1]]\\2" -exceptinsidetag:link -exceptinsidetag:hyperlink
-exceptinsidetag:header -exceptinsidetag:nowiki -exceptinsidetag:ref
-excepttext:"(?si)\[\[((?:FOO1|FOO2)[\|\]])" -namespace:0 -namespace:102
-namespace:4 -summary:"[[Appropedia:Wikilink bot]] adding double square
brackets to: FOO1|FOO2." -log -xml:currentdump.xml
Many thanks!
--
Chris Watkins
Appropedia.org - Sharing knowledge to build rich, sustainable lives.
blogs.appropedia.org
identi.ca/appropedia
twitter.com/appropedia
I want to generate a list of matches for a search, but not do anything to
the page.
E.g. I want to list all pages that contain "redirect[[:Category", but I
don't want to modify the pages.
I guess that it's possible to modify redirect.py (I don't speak python, but
it shouldn't be hard) and run it with -log. But maybe there's a simpler way?
Thanks in advance.
--
Chris Watkins
Appropedia.org - Sharing knowledge to build rich, sustainable lives.
blogs.appropedia.orgcommunity.livejournal.com/appropedia
identi.ca/appropedia
twitter.com/appropedia
Hi!
Do you have any idea why, using replace.py on some large dumps, I get
this error message:
C:\pywikipedia>replace.py -xml:enwiki-20091128-pages-articles.xml
Please enter the text that should be replaced: impossibletofindword
Please enter the new text: found
Please enter another text that should be replaced, or press Enter to start:
The summary message will default to: Robot: Automated text
replacement (-impossibletofindword +found
)
Press Enter to use this default message, or enter a description of the
changes your bot will make: test
Reading XML dump...
Traceback (most recent call last):
File "C:\pywikipedia\pagegenerators.py", line 847, in __iter__
for page in self.wrapped_gen:
File "C:\pywikipedia\pagegenerators.py", line 779, in
DuplicateFilterPageGenerator
for page in generator:
File "C:\pywikipedia\replace.py", line 218, in __iter__
for entry in self.parser:
File "C:\pywikipedia\xmlreader.py", line 295, in new_parse
for rev in self._parse(event, elem):
File "C:\pywikipedia\xmlreader.py", line 304, in _parse_only_latest
yield self._create_revision(revision)
File "C:\pywikipedia\xmlreader.py", line 341, in _create_revision
redirect=self.isredirect
File "C:\pywikipedia\xmlreader.py", line 64, in __init__
self.username = username.strip()
AttributeError: 'NoneType' object has no attribute 'strip'
'NoneType' object has no attribute 'strip'
I updated pywikipedia to the last revision with no results.
As you can see it does not seem to be user-fixes.py or regex-related.
Thanks in advance!
Davide Bolsi
Hi Russel,
the main reason not to join to the rewrite branch is, I did not got it running yet. I get an importError for simplejson. And I have no idea seting PYTHONPATH playing with idle. Whereas the trunk is easy to use: install python, download the bot and expand it, run it. This is the usability I would expect.
Most of the scripts are out of date since they are modified in trunk but not actualized at rewrite. I guess both forks have to be developed in parallel for a while until all (main) scripts are merged. I could supporting the rewrite development but since I could not test that stuff I wouldn't.
However, I have reservations about the effect that the development for older mw versions are cut.
Regards
----- Original Nachricht ----
Von: Russell Blau <russblau(a)imapmail.org>
An: Pywikipedia discussion list <pywikipedia-l(a)lists.wikimedia.org>
Datum: 30.03.2010 16:18
Betreff: [Pywikipedia-l] Request for feedback on rewrite branch
> I am at a point where it would be helpful to have some feedback from other
> Pywikipedia users about the future of the rewrite branch. As those who
> watch the SVN commits know, I have not had as much time to work on this
> lately, and have to prioritize what time I do spend on it.
>
> For those who have used the rewrite branch, what (if anything) needs to be
> done to it to get you to use it exclusively and retire the old wikipedia.py
>
> system? What is missing? What is broken? What is present but could be
> improved?
>
> For those who have chosen not to use the rewrite branch, why not? What
> might lead you to take another look?
>
> And then, I'm sure there are many whose reaction to this post has been,
> "What's the rewrite branch?" I don't know what to ask you, so feel free to
>
> move on to the next message.
>
> Most critically, is there any reason to continue development of the trunk
> once the rewrite branch is at a point where most users are ready to switch
> to it?
>
> -- Russ
>
>
> _______________________________________________
> Pywikipedia-l mailing list
> Pywikipedia-l(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
>
Traumziele - von Beschreibung bis Buchung jetzt kompakt auf den Reise-Seiten von Arcor.de! http://www.arcor.de/rd/footer.reise
Hi folks,
Earlier when using replace.py, ndash (Alt 0150) characters appeared on my
display as yellow *-* marks. This was good, I knew that a "-" was "-", and a
yellow "-" was – (Alt 0150). And the text was readable. Now these characters
appear as yellow "*o*" letters, and this is rather annoying, prevents me of
reading text comfortably. Could someone please undo this change? Thank you
in advance.
--
Bináris
I usually collect the suspicious articles on my subpage (
http://hu.wikipedia.org/wiki/Szerkeszt%C5%91:BinBot/try), then I run
replece.py with option "-links:User:BinBot/try". This does not work, if the
article is in File (in Hungarian: Fájl) namespace. I tried to put semicolons
into the links (*[[:Fájl:something]] instead of *[[Fájl:something]]),
nothing happened. Is there an issue that -links doesn't work with files? It
works properly directly, e. g. wit "-xml:..." or "-transcludes:..." options.
--
Bináris
Dear developers!
In solve_sisambiguation.py I can use "more" option to see a wider
environment of the selected text. Repeating this, I see mpre and more of the
text.
Would it be possible to implement this useful option in replace.py, too?
Sometimes I need it very much, when the text is in a short line.
I can open the article in editor or browser, but it is too slow, and not
always necessary. Pressing "m" in solve_sisambiguation.py is much faster.
--
Bináris
[14:50:33] <Lcawte> can someone remove standard from the line 116 in
capitalize_redirects.py... it brakes the script
[15:15:28] <Lcawte> Ok, made another change or two..
http://lcawte.pastebin.com/ggiXP8gp < Full copy of the script
[21:11:26] <Lcawte> valhallasw: can you commit something for me..
capitalize_redirects.py would not work.. so I fixed it (It works on all
the wikis I've tried)
[21:20:16] <valhallasw> Lcawte: sorry, not at the moment. Haven't got my
ssh key at hand.
[21:20:29] <Lcawte> ah k
So can you commit that for me.. You can remove my name from the script
if you like, but I'd like a little bit of credit, probably in the commit
message.
Thanks,
Lewis Cawte (Lcawte)