Hey everyone,

As part of T195491, Erik has been looking into the details of our regex processing and ways to handle ridiculously long-running regex queries. He pulled all the regex queries over the last 90 days to get a sense of what features people are using and what impact certain changes he was considering would have on users. Turns out there are a lot more users than I would have thought—which is good news! And a lot of them look like bots.

He also made the mistake of pointing me to the data and highlighting a common pattern: searches for interwiki links. I couldn't help myself. I started digging around and found that the majority of the searches are looking for those interwiki links, and that the vast majority of regex searches fall into three types: interwiki links, URLs, and Library of Congress collection IDs.

Overall, there were 5,613,506 regex queries across all projects and all languages over the 90-day period. That comes out to ~62K/day, which is a lot more than I'd expected, though I hadn't thought about bots using regexes.

Read more on MediaWiki.

—Trey

Trey Jones
Sr. Software Engineer, Search Platform
Wikimedia Foundation