On Mar 28, 2023, at 9:09 PM, Kunal Mehta <legoktm(a)debian.org> wrote:
> I suppose it's also worth asking what you're
> using expand_text() for in the first place, to see if there's a better way to do
> whatever it is you want to :)
That's a fair question.
What I'm doing is looking at DYK nominations to evaluate if they've been approved.
Like so many wiki things, there's no formal definition, but the simple version is
that I'm looking for "File:Symbol confirmed.svg". The problem is that it
may not appear in the raw wikitext. An example is Bismarck Kuyon
<https://en.wikipedia.org/wiki/Template:Did_you_know_nominations/Bismarck_Kuyon>.
Looking at the page, it's easy to see the green checkmark indicating approval. But
looking at the wikitext source, there's no such thing. What there is, is a {{DYK
checklist}} template which invokes some Lua code that generates the checkmark based on the
values in the other fields. The expand_text() forces that to get run on the server side.
From a machine-parsability point of view, it's insane. But I gotta work with what
I've been given.
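
For the record, the check described above boils down to something like the following. This is only a minimal sketch, not the actual dyk-tools code: is_approved() and APPROVAL_MARKER are names made up for illustration, and it assumes the expanded wikitext contains the file name verbatim. The only library call it relies on is expand_text(), which pywikibot's Page provides.

```python
# Marker whose presence is treated as "approved" (the simple version of the rule).
APPROVAL_MARKER = "File:Symbol confirmed.svg"


def is_approved(page):
    """Return True if the server-expanded nomination contains the approval icon.

    `page` is anything with an expand_text() method that returns a string;
    pywikibot.Page is one such object. The expansion runs on the server, so
    the output of the {{DYK checklist}} Lua code is included.
    """
    return APPROVAL_MARKER in page.expand_text()


# Hypothetical usage with pywikibot (needs network access and a configured site):
#   import pywikibot
#   site = pywikibot.Site("en", "wikipedia")
#   page = pywikibot.Page(site, "Template:Did you know nominations/Bismarck Kuyon")
#   print(is_approved(page))
```

Passing in the page object rather than a title keeps the function easy to exercise with a stand-in object, without touching the network.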
Ultimately, this is going to run as a bot. The fact that it takes a couple of minutes to
evaluate all the nominations of interest isn't critical. I was doing an interactive
web-based version for review purposes, and for that, waiting 2 minutes for the page to
load sucked. But, I don't really need to do that, so I'll probably just go back
to the serialized version and leave it at that.
One optimization I can see is that I only really need to do the expand_text() on the
subset of nominations which use {{DYK checklist}}, and not even all of those (sometimes
it's possible to determine the approval state entirely from the text following the
{{DYK checklist}}). That will add a bit more complexity, which I was trying to avoid.
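
A rough sketch of the first half of that optimization, under some assumptions: the regular expression is my guess at how the template call appears in raw wikitext (first letter in either case, space or underscore in the name), is_approved_lazy() is a made-up name, and `expand` stands for whatever does the server-side expansion. It does not attempt the second half (deciding from the text that follows the checklist).

```python
import re

APPROVAL_MARKER = "File:Symbol confirmed.svg"

# Assumed spelling of the template call: {{DYK checklist ...}} with an optional
# lowercase first letter, space or underscore in the name, then "|" or "}}".
CHECKLIST_RE = re.compile(r"\{\{\s*[Dd]YK[ _]+checklist\s*[|}]")


def is_approved_lazy(raw_wikitext, expand):
    """Check for the approval icon, expanding only when a checklist is present.

    `expand` is a callable that takes raw wikitext and returns the expanded
    text (for example, a thin wrapper around a server-side expansion call).
    """
    if not CHECKLIST_RE.search(raw_wikitext):
        # No checklist template, so nothing is generated by Lua;
        # the raw wikitext already tells the whole story.
        return APPROVAL_MARKER in raw_wikitext
    # A checklist may generate the icon, so pay for the expansion.
    return APPROVAL_MARKER in expand(raw_wikitext)
```

The extra branch is the added complexity mentioned above; in exchange, nominations without a checklist never trigger a round trip to the server.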
Even deeper down the complexity rathole, I could re-implement the Lua logic on the client
side and avoid the expand_text() completely. I believe that's what some existing
bots, such as WugBot, do. But I really didn't want to go there.
I did a little reading about your mwbot-rs project. At one point, I was actually kind of
excited about Rust and might have joined you just for the excuse to learn it. Maybe some
day. I am totally on board with your goal of "sustainable development of bots and
tools". We've got so many tools (some of which important processes like DYK are
totally dependent on) which are, frankly, a mess of single-purpose code which can't be
easily reused for anything else. What I've been trying to do with dyk-tools is create
a toolkit of reusable components which other people can build upon. But I seem to be
spending most of my time working around silly things like the {{DYK checklist}} stuff.
Anyway, I hope that answers your question :-)
BTW, I've mentioned this before, but I really can't recommend viztracer
<https://github.com/gaogaotiantian/viztracer> highly enough as a performance
analysis tool. At one level, it's just cProfile on steroids, but with a snazzy
graphical front end. It's what let me figure out that it was expand(), not get(),
which was the most expensive. I uploaded a screenshot to Commons:
<https://commons.wikimedia.org/wiki/File:Screen_Shot_of_viztracer_output.png>