TL;DR: is there a place for Wikipedia-related code? What do we do for code
reusability?
I proposed a Page method in https://phabricator.wikimedia.org/T328769.
It would have told whether an article is a biography (i.e. whether it is about a person).
My idea was opposed because the codebase of BasePage and APIPage is not
Wikipedia-related, and some developers don't want to see any
Wikipedia-specific code in it.
OK, let's say this is a valid argument. Let's call it *code purity*.
On the other hand, Pywikibot is mostly developed by Wikipedians, mostly
used on Wikipedia, and biographies form a primary part of Wikipedia's scope.
I definitely think that Page SHOULD be subclassed for biographies, with a
lot of methods that are useful across many Wikipedias and to many bot
owners. If everyone writes this code for themselves again and again, that's
a waste.
Now I definitely want to use this feature. What can I do?
The first idea is to subclass Page in my own code. That is the natural
solution. Of course, when I publish my scripts, others won't be able to use
them, and I won't be able to use others' scripts as they are, because I
need this subclass. Let's call this point of view *code reusability*, which
is generally considered important in the world of programming.
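For illustration, such a local subclass might look like this (a minimal sketch; the class name and the heuristic are mine, nothing that exists in Pywikibot):

import pywikibot


class BiographyPage(pywikibot.Page):
    """Hypothetical local Page subclass; not part of Pywikibot."""

    def is_biography(self) -> bool:
        # Naive heuristic: a birth-year category usually marks a person.
        return any('births' in cat.title().lower()
                   for cat in self.categories())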
Now, what we do (see the above task) is throw away code reusability for the
sake of code purity. Is that OK?
I am honestly curious where Wikipedia's place is in this framework.
What I WILL do in the present situation: write a module called huwiki,
place my code there, and not bother the other Wikipedias. It will contain
functions that take the page as a parameter (not nice, not natural, I don't
like it, but it is the only way if I cannot subclass) and nice page
generators, all for the Hungarian Wikipedia alone, because I need them.
Is that really what we want?
Please help to find the place of Wikipedia-related code.
--
Bináris
Without the screenshot this time....
> On Mar 28, 2023, at 2:01 PM, Roy Smith <roy(a)panix.com> wrote:
>
> Hmmm. What I'm doing requires Page.expand_text(), which looks like it does a Page.get() followed by a Site.expand_text(), and it's the latter which actually takes most of the time. That becomes an action=expandtemplates API call <https://www.mediawiki.org/w/api.php?action=help&modules=expandtemplates>, which I don't see any way to batch.
>
>
>
>
> <Screen Shot 2023-03-28 at 1.55.55 PM.png>
>
>> On Mar 28, 2023, at 1:04 PM, Kunal Mehta <legoktm(a)debian.org> wrote:
>>
>> Hi,
>>
>> On 3/27/23 15:57, Roy Smith wrote:
>>> I need to issue a bunch of Page.get() requests in parallel.
>>
>> Please don't. From <https://www.mediawiki.org/wiki/API:Etiquette#Request_limit>:
>>
>> "Making your requests in series rather than in parallel, by waiting for one request to finish before sending a new request, should result in a safe request rate."
>>
>> Instead of making parallel requests, you should make batched requests, which is how the preloading stuff Xqt mentioned works.
>>
>> -- Kunal / Legoktm
I need to issue a bunch of Page.get() requests in parallel. My understanding is that pywikibot uses the requests library, which is incompatible with asyncio, so that's out. So what do people use? Threading <https://docs.python.org/3.9/library/threading.html>? Or, I see there's an asyncio-friendly requests port <https://github.com/rdbhost/yieldfromRequests>. Is there a way to make pywikibot use that?
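For reference, the batched approach Kunal describes is roughly this (a minimal sketch using pagegenerators.PreloadingGenerator; titles and do_something are placeholders for your own list and processing):

import pywikibot
from pywikibot import pagegenerators

site = pywikibot.Site('en', 'wikipedia')
pages = (pywikibot.Page(site, title) for title in titles)

# Preloading fetches page content in batches (up to groupsize pages
# per API request) instead of issuing one request per page.
for page in pagegenerators.PreloadingGenerator(pages, groupsize=50):
    do_something(page.text)  # text is already loaded; no extra call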
Today on hu.wikipedia, namespaces 118 and 119 were created. See T333083:
https://phabricator.wikimedia.org/T333083
Now I have problems with the new namespaces.
Message: KeyError: '118 is not a known namespace. Maybe you should clear
the api cache.'
Source: Pywikibot\pywikibot\site\_namespace.py
>>> list(site.namespaces())
[-2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 90, 91, 92,
93, 100, 101, 710, 711, 828, 829, 2300, 2301, 2302, 2303]
How do I clear the API cache? Should Pywikibot automatically recognize the
new namespaces, or shall I alter the code somewhere?
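For what it's worth, one way to clear it by hand is to delete the on-disk request cache, assuming the default cache directory name apicache-py3 under the Pywikibot base directory:

import os
import shutil

from pywikibot import config

# Assumption: cached API responses, including site info such as the
# namespace list, live in 'apicache-py3' under config.base_dir.
# Deleting the directory forces a fresh fetch on the next run.
shutil.rmtree(os.path.join(config.base_dir, 'apicache-py3'),
              ignore_errors=True)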
--
Bináris
To my surprise, wikicode.get_parent() does not get you the section a node is part of:
> import mwparserfromhell as mwp
>
> text = """==foo==
> {{Template:Foo}}
> """
> wikicode = mwp.parse(text)
> print(wikicode.get_tree())
>
> print('++++++++++')
>
> node = wikicode.nodes[-2]
> print(f"{node=}")
> print(f"{wikicode.get_parent(node)=}")
prints:
> ==
> foo
> ==
> \n
> {{
> Template:Foo
> }}
> \n
> ++++++++++
> node='{{Template:Foo}}'
> wikicode.get_parent(node)=None
Am I just doing this wrong?
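A possible workaround, in case it helps: the top-level node list is flat, so a heading does not "contain" the nodes that follow it, which is presumably why get_parent() returns None here. Sections, however, are contiguous spans, so containment can be tested per section (a sketch relying on get_sections() and contains()):

import mwparserfromhell as mwp

wikicode = mwp.parse('==foo==\n{{Template:Foo}}\n')
node = wikicode.filter_templates()[0]

# Find the section whose span contains the node.
for section in wikicode.get_sections(include_lead=True):
    if section.nodes and section.contains(node):
        print(section.filter_headings())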
I've got some code which is essentially:
> wikicode = mwp.parse(self.page.get())
> for node in wikicode.filter_templates(recursive=False, matches=title):
> wikicode.remove(node)
> self.page.text = str(wikicode)
> self.page.save()
which works, but it leaves an extra blank line behind where the template used to be. This is intended to be run on [[:en:Template talk:Did you know/Approved]], i.e. one template per line.
What's the best way to get rid of the blank lines? I'm trying to avoid just running a regex replacement on the raw text because that's fragile, but maybe there's really no good alternative here?
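One regex-free option is to strip the trailing newline at the node level while removing each template (a sketch under the one-template-per-line assumption; page_text and title stand in for the real values):

import mwparserfromhell as mwp
from mwparserfromhell.nodes import Text

wikicode = mwp.parse(page_text)
for node in wikicode.filter_templates(recursive=False, matches=title):
    index = wikicode.index(node)
    # If the next sibling is a text node starting with a newline,
    # drop that newline so removing the template leaves no blank line.
    if index + 1 < len(wikicode.nodes):
        following = wikicode.nodes[index + 1]
        if isinstance(following, Text) and following.value.startswith('\n'):
            following.value = following.value[1:]
    wikicode.remove(node)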
I'm gearing up to do some work (hopefully diving into fixing https://phabricator.wikimedia.org/T326650). I've gotten as far as cloning the repo and running the existing unit tests. I get 4 failures:
FAILED tests/make_dist_tests.py::TestMakeDist::test_handle_args - AssertionError: '/Users/roy/pywikibot/pywikibot-git/tests/make_dist_tests.py' != '/Users/roy/pywikibot/venv/bin/pytest'
FAILED tests/make_dist_tests.py::TestMakeDist::test_handle_args_empty - AssertionError: '/Users/roy/pywikibot/pywikibot-git/tests/make_dist_tests.py' != '/Users/roy/pywikibot/venv/bin/pytest'
FAILED tests/make_dist_tests.py::TestMakeDist::test_handle_args_nodist - AssertionError: '/Users/roy/pywikibot/pywikibot-git/tests/make_dist_tests.py' != '/Users/roy/pywikibot/venv/bin/pytest'
FAILED tests/site_detect_tests.py::MediaWikiSiteTestCase::test_proofreadwiki - RuntimeError: Unsupported url: https://www.proofwiki.org/wiki/
Are these known issues? Or something wrong with my environment?
I'm working on macOS Monterey, with Python 3.9.
Hi,
I just noticed that I can no longer fetch information from Wikidata. I'm not editing, I'm just reading:
wikidata_item = pywikibot.ItemPage(wikidata_repo, arg)
and I get:
Sleeping for 25.8 seconds, 2023-03-07 11:15:43
Is there some problem on Wikidata infrastructure?
Cheers
Dennis