Hello,
Here are this past week's updates from the Discovery department.

== Highlights ==
* Finalized the second BM25 testing analysis; the PDF is linked from the past analyses page. [0]

== Search ==
* Migrated Phan for CirrusSearch to Jenkins. (technical debt) [1] [2] 
* Finished writing up, summarizing, and recommending extensive changes to TextCat for language identification. [3] The mean improvement in F0.5 score [4] was just under 5% across the corpora from nine Wikipedias. The two worst-performing corpora, from enwiki and nlwiki, each went up around 10%! All nine now have an F0.5 score above 90%. The next step is to deploy the recommended changes.
* Completed a round of refactoring and cleanup of the Special:Search code. [5] [6]

[0] https://www.mediawiki.org/wiki/Discovery_Analysis#Past_analyses
[1] https://www.mediawiki.org/wiki/Continuous_integration/Phan
[2] https://phabricator.wikimedia.org/T153040
[3] https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/TextCat_Improvements#Final_Summary_.26_Recommendations
[4] https://en.wikipedia.org/wiki/F1_score
[5] https://phabricator.wikimedia.org/T150217
[6] https://phabricator.wikimedia.org/T150390

----

The archive of all past updates can be found on MediaWiki.org:

https://www.mediawiki.org/wiki/Discovery/Status_updates

Interested in getting involved? See tasks marked as "Easy" [7] or "Volunteer needed" [8] in Phabricator.

[7] https://phabricator.wikimedia.org/maniphest/query/qW51XhCCd8.7/#R
[8] https://phabricator.wikimedia.org/maniphest/query/5KEPuEJh9TPS/#R


Yours,
Chris Koerner
Community Liaison - Discovery
Wikimedia Foundation