Robert Rohde wrote:
Which, after substituting "display:none;" I think translates directly to
the regex search:
insource:/style[ ]*=[ ]*\"display:[ ]*none;[ ]*\"/i
That gives me 487 articles.
Almost, but not quite. You actually want this:
insource:/style[ ]*=[ ]*\"display:[ ]*none;?[ ]*\"/i
With the semicolon made optional, the search results currently increase from
487 to 2,487 on the English Wikipedia. The normalization script
(<https://phabricator.wikimedia.org/P2229>) made the trailing semicolon
consistent, in addition to lowercasing and trying to account for strange
spacing. For whatever reason, "display: none;" is often written without
the trailing semicolon in main namespace pages on the English Wikipedia.
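To illustrate why the optional semicolon matters, here is a minimal Python
sketch. The sample strings are made up, and the two compiled patterns are only
approximate Python translations of the insource: searches above (the search
engine's regex dialect is not identical to Python's re module, so treat the
equivalence as an assumption).

```python
import re

# Approximate Python equivalents of the two insource: searches (assumption).
strict = re.compile(r'style[ ]*=[ ]*"display:[ ]*none;[ ]*"', re.IGNORECASE)
relaxed = re.compile(r'style[ ]*=[ ]*"display:[ ]*none;?[ ]*"', re.IGNORECASE)

# Made-up wikitext fragments, not taken from any real article.
samples = [
    '<span style="display:none;">a</span>',    # trailing semicolon present
    '<span style="display: none">b</span>',    # no trailing semicolon
    '<span STYLE = "display:none" >c</span>',  # odd case and spacing, no semicolon
]

strict_hits = [s for s in samples if strict.search(s)]
relaxed_hits = [s for s in samples if relaxed.search(s)]
print(len(strict_hits), "strict matches;", len(relaxed_hits), "relaxed matches")
```

Only the first sample satisfies the strict pattern, while all three satisfy
the relaxed one, which is the same effect seen in the jump in search results.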
I was worried that I may have made a major coding mistake, so I re-ran my
script using this pattern:
pattern = r'style[ ]*=[ ]*"[ ]*display[ ]*:[ ]*none[ ]*;?[ ]*"'
The results are available here: <https://phabricator.wikimedia.org/P2255>.
Sixteen articles have over 1,000 instances of "display: none;" each! The
total is 142,176 instances of "display: none;" (normalized) in 2,507 main
namespace pages on the English Wikipedia, as of about 2015-10-02.
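For readers who want to reproduce this kind of tally, a rough sketch of the
counting step might look like the following. This is not the script from the
pastes linked above; count_instances, the sample pages, and the use of
re.IGNORECASE are all assumptions made for illustration. Only the pattern
string is taken from this message.

```python
import re
from collections import Counter

# Pattern string as given in this message.
pattern = r'style[ ]*=[ ]*"[ ]*display[ ]*:[ ]*none[ ]*;?[ ]*"'
# Case-insensitive matching is an assumption, standing in for lowercasing.
regex = re.compile(pattern, re.IGNORECASE)


def count_instances(pages):
    """Map each page title to its number of matches, skipping pages with none."""
    counts = Counter()
    for title, text in pages:
        n = len(regex.findall(text))
        if n:
            counts[title] = n
    return counts


# Hypothetical (title, wikitext) pairs; a real run would read an XML dump.
pages = [
    ("Example A", '<td style="display: none">1</td><td style = "DISPLAY:none;">2</td>'),
    ("Example B", "no hidden elements here"),
]

counts = count_instances(pages)
print(sum(counts.values()), "instances in", len(counts), "pages")
```

Summing the per-page counts gives the instance total, and the number of keys
gives the page total, which are the two kinds of figures reported above.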
I am happy to agree that searching the XML should be better than the local
search tool, but I still find these numbers hard to reconcile.
After re-reviewing the code and re-running the script to focus on
"display: none;" specifically, there's strong evidence to suggest that the
numbers are accurate, if a bit surprising in some cases. :-)
MZMcBride