* Preparing new bot tasks, developing fixes and regexes
* Creating lists of common errors for the community
* There is a common spelling error which is quite easy to detect when the word is [[link]]ed, but almost impossible to detect without the link (due to the enormous number of false positives). My idea is to save the hits from the linked version and use them for the unlinked one as a list of errors rather than a pattern.
* There is another common error which is not worth treating by bot due to the enormous number of false positives. But if I could save the list of hits automatically (without modifying pages), it could be reviewed by volunteers and used later as a list of errors rather than a pattern.
* Showing this list to users or interest groups in order to teach them which errors to avoid in the future.
* Scientific purpose
* etc.
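The first use case above can be sketched roughly as follows. This is only an illustration: the typo "Untied" for "United", the pattern and the names harvest_pairs() and apply_error_list() are my assumptions, not part of the framework.

```python
import re

# Hypothetical rule: inside a wikilink, a title starting with "Untied" is
# assumed to be a typo for "United". Outside links the same pattern would
# also hit legitimate uses of the word "untied".
LINKED = re.compile(r"\[\[(Untied(?: [A-Z]\w+)+)\]\]")


def harvest_pairs(text):
    """Collect (old, new) pairs from linked occurrences only."""
    pairs = []
    for match in LINKED.finditer(text):
        old = match.group(1)
        pairs.append((old, old.replace("Untied", "United", 1)))
    return pairs


def apply_error_list(text, pairs):
    """Replace the exact known errors anywhere, linked or not."""
    for old, new in pairs:
        text = text.replace(old, new)
    return text


linked_page = "See [[Untied States]] and [[Untied Kingdom]]."
pairs = harvest_pairs(linked_page)
unlinked_page = "He untied the rope in the Untied States."
fixed = apply_error_list(unlinked_page, pairs)
```

The harvested pairs work as a list of exact errors, so the lowercase "untied" in the second text is left alone.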
== Solution ==
Sorry, I cannot create a diff now, because this directory is not under version control. However, these four steps are not complicated to follow.
=== textlib.py ===
def replaceExcept(text, old, new, exceptions, caseInsensitive=False,
allowoverlap=False, marker='', site=None):
became:
def replaceExcept(text, old, new, exceptions, caseInsensitive=False,
allowoverlap=False, marker='', site=None, returnPairs=False):
It just fits within 80 characters. :-) Because the new argument defaults to False, existing calls that do not pass it behave exactly as before.
A new initialization:
pairs = []
At the bottom of the else branch of the main if:
else:
# We found a valid match. Replace it.
the last line:
markerpos = match.start() + len(replacement)
became:
markerpos = match.start() + len(replacement)
pairs.append((match.group(), replacement))
And at the very end of the method, instead of return text, I now have:
 if returnPairs:
     return (text, pairs)
 else:
     return text
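To see the four changes together, here is a minimal, self-contained sketch of the same idea. It is a simplified stand-in and not the real replaceExcept(): it ignores exceptions, markers, overlaps and the site argument, and the name replace_collecting_pairs() is hypothetical.

```python
import re


def replace_collecting_pairs(text, old, new, returnPairs=False):
    """Replace every match of `old` with `new`, optionally reporting pairs.

    Simplified stand-in for replaceExcept(): no exceptions, no markers,
    no overlap handling.
    """
    pairs = []

    def substitute(match):
        # Expand backreferences such as \1 in the replacement template.
        replacement = match.expand(new)
        pairs.append((match.group(), replacement))
        return replacement

    text = re.sub(old, substitute, text)
    if returnPairs:
        return (text, pairs)
    else:
        return text


plain = replace_collecting_pairs("teh cat and teh dog", r"\bteh\b", "the")
with_pairs = replace_collecting_pairs(
    "teh cat and teh dog", r"\bteh\b", "the", returnPairs=True)
```

Without the flag the caller gets the text only, as before; with it, the caller gets the text together with one (old, new) tuple per replacement.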
=== replace.py ===
replaceExcept() is called from doReplacements(). Without going into details: instead of returning new_text, it will now
return (new_text, replaceList)
where replaceList is a list of (old, new) tuples.
Generally, it is not recommended to mix returning values with side effects such as storing pairs in a list that lives outside the method, so I decided to return the pairs. The main method of the bot (run()) can handle them according to the given parameters: increment a counter, save the (old, new) pairs to a file or a wiki page, or do nothing beyond the classic replacement task. This needs some memory, but at this point only the pairs of the current page are stored. Unless you explicitly build a huge list of all the pairs that occur, which is not necessary, it won't cause a problem.
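How run() might deal with the returned pairs could look something like this. It is only a sketch: handle_pairs(), the counter dictionary and the tab-separated log format are my assumptions, and saving to a wiki page is left out.

```python
import io


def handle_pairs(pairs, counter=None, log=None):
    """Process the (old, new) pairs of one page.

    counter: optional dict in which the number of replacements is summed.
    log: optional file-like object; one tab-separated pair per line.
    With both left as None this does nothing (the classic behaviour).
    """
    if counter is not None:
        counter["replacements"] = counter.get("replacements", 0) + len(pairs)
    if log is not None:
        for old, new in pairs:
            log.write("%s\t%s\n" % (old, new))


counter = {}
log = io.StringIO()  # stands in for an opened text file
handle_pairs([("teh", "the"), ("recieve", "receive")], counter, log)
handle_pairs([("teh", "the")], counter, log)
```

Only the pairs of the current page are passed in, so memory use stays small unless the caller keeps every pair around on purpose.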