Sometimes, researchers ask how to collect samples of wiki spam, e.g. to
train anti-vandalism tools like
https://www.mediawiki.org/wiki/Extension:BayesianFilter
As a reminder, WikiTeam downloads most MediaWiki wikis on the web. Some
of them consist predominantly of spam. This dump, for instance, contains
a 2 GB XML file (380 MB compressed) that is 100% spam. (Snowolf thinks
the wiki has never been used, other than by spambots.)
https://archive.org/details/wiki-server0net
Such dumps are usually easy to identify: 7z compresses them by a factor
of 10 or 100, despite a low revisions/pages ratio. I can make lists if
there's interest. A rough sketch of the check follows below.
Nemo