Hey,

This is the 23rd weekly update that the Revision Scoring team has sent to this mailing list.

New development
  • We implemented and demonstrated a linguistic/stylometric processing strategy that should give us more signal for finding vandalism and spam[1].  See the discussion on the AI list[2]. 
  • As part of our support for the Collaboration Team, we've been producing tables of model statistics that correspond to a set of thresholds[3].  This helps their designers work on strategies for reporting prediction confidence in an intuitive way.

Maintenance and robustness
  • We had a major downtime event that was caused by our logs being too verbose.  We've recovered and turned down the log level[4].

Datasets
  • We created a publicly accessible database on Wikimedia Labs that contains a complete set of article quality predictions for English Wikipedia[6].  See our announcements[7,8,9].

1. https://phabricator.wikimedia.org/T146335 -- Implement a basic scoring strategy for PCFGs
3. https://phabricator.wikimedia.org/T146280 -- Produce tables of stats for damaging and goodfaith models
4. https://phabricator.wikimedia.org/T146581 -- celery log level is INFO causing disruption on ORES service
5. https://phabricator.wikimedia.org/T146720 -- Ensure that halfak gets emails when ores.wikimedia.org goes down
6. https://phabricator.wikimedia.org/T106278 -- Setup a db on labsdb for article quality that is publicly accessible
7. https://phabricator.wikimedia.org/T146156 -- Announce article quality database in labsdb

Sincerely, 
Aaron from the Revision Scoring team