Scott:

A good place to start reading about "bot spam" and its impact on the data is this page: https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/BotDetection

We recently released a new classification for traffic. Besides classifying traffic as "user" or "spider", we now also have "automated", which tags traffic from a number of entities (though not all) that can be described as "high-volume spammers". You will probably have some questions after reading the doc, and for those we can set up a meeting.
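To make the three-way classification concrete, here is a minimal sketch of how a consumer of the request data might split records on that label. The field name "agent_type" and the record layout are assumptions for illustration, not the actual schema; see the linked doc for the real field definitions.

```python
# Hypothetical sketch: bucketing request records by the three-way agent
# classification described above ("user", "spider", "automated").
# The "agent_type" field name is an assumption for illustration only.

def split_by_agent_type(records):
    """Group request records into user, spider, and automated buckets."""
    buckets = {"user": [], "spider": [], "automated": []}
    for rec in records:
        label = rec.get("agent_type")
        if label in buckets:
            buckets[label].append(rec)
    return buckets

sample = [
    {"uri": "/wiki/Main_Page", "agent_type": "user"},
    {"uri": "/wiki/Special:Random", "agent_type": "automated"},
    {"uri": "/robots.txt", "agent_type": "spider"},
]
buckets = split_by_agent_type(sample)

# To approximate human traffic, exclude BOTH "spider" and "automated" --
# filtering only on "spider" would leave the high-volume spammers in.
human_requests = buckets["user"]
```

The point of the third label is exactly the last comment: analyses that previously excluded only "spider" traffic would still have counted the high-volume automated traffic as human.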

Thanks,

Nuria

On Tue, Jun 16, 2020 at 9:55 AM Scott Bassett <sbassett@wikimedia.org> wrote:
Hello Analytics Team-

The Security Team has recently spent some cycles investigating improved anti-automation (bad bots, high-volume spammers, etc.) solutions, particularly around a better Wikimedia captcha. We were curious whether your team has any methods or advice regarding the analysis of nefarious automated traffic within the context of raw web requests or any other relevant analytics data. If the answer is "not really", that's fine. But if there are relevant tools, methods, research, etc. your team has produced that you would like to share with us, that would be much appreciated. If it makes sense to discuss this further during a quick call, I can try to find some time for a few of us over the next couple of weeks. We also have an extremely barebones task where we are attempting to document various methods of measurement, which might be helpful: https://phabricator.wikimedia.org/T255208.

Thanks,

--
_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics