I'd be up for it if we can fit it into an upcoming sprint and it's worth it.

We could run a controlled test against production with a large batch of articles, measure median and percentile response times over repeated runs, and flag the articles whose results differ so a human can inspect them for quality.
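A minimal sketch of that harness in Python, assuming each setup (current vs. opening_text) is wrapped in a `query(title)` callable that calls the API and returns the list of related titles. `summarize_latencies`, `time_queries` and `differing_results` are hypothetical helpers for illustration, not an existing tool:

```python
import statistics
import time
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical signature: takes an article title, returns related article titles.
Query = Callable[[str], List[str]]


def summarize_latencies(latencies_ms: Sequence[float]) -> Dict[str, float]:
    """Median, p95 and p99 of a list of latencies (needs at least two samples)."""
    cuts = statistics.quantiles(latencies_ms, n=100)  # 99 cut points: p1..p99
    return {
        "median": statistics.median(latencies_ms),
        "p95": cuts[94],
        "p99": cuts[98],
    }


def time_queries(query: Query, titles: Sequence[str], runs: int = 5) -> Dict[str, float]:
    """Run `query` over every title `runs` times and summarize wall-clock latency in ms."""
    latencies_ms: List[float] = []
    for _ in range(runs):
        for title in titles:
            start = time.perf_counter()
            query(title)
            latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return summarize_latencies(latencies_ms)


def differing_results(
    baseline: Query, candidate: Query, titles: Sequence[str]
) -> Dict[str, Tuple[List[str], List[str]]]:
    """Titles whose related-article lists differ between two setups, for human review."""
    diffs: Dict[str, Tuple[List[str], List[str]]] = {}
    for title in titles:
        before, after = baseline(title), candidate(title)
        if before != after:
            diffs[title] = (before, after)
    return diffs
```

Running `time_queries` once per setup gives directly comparable median/p95/p99 figures, and `differing_results` produces the list of articles to review by hand.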

It's been noted before that the results are far from ideal (which they are, because it's just morelike). I think it would be a great idea to switch to a dedicated endpoint that is smarter and has some caching. We could do much more than text similarity to get relevant results: take links into account, or "See also" links where they exist, etc.

As a note, on mobile web the RelatedArticles extension lets editors specify which articles to show in the section, which would avoid queries to CirrusSearch if it were used more widely (once it's rolled out to stable, I guess).

I remember that the performance-related task was closed as resolved (https://phabricator.wikimedia.org/T121254#1907192). Should we reopen it or create a new one?

I'm not sure whether we ended up adding the smaxage parameter (I think we didn't). Should we? It seems like a no-brainer to cache these results in Varnish, since they don't need to be completely up to date for this use case.
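For reference, a sketch of what a morelike request with `smaxage` set might look like. `smaxage` is a standard api.php parameter that sets the `s-maxage` Cache-Control header; the enwiki endpoint, the `list=search` query shape and the one-day value are assumptions for illustration, and `morelike_url` is a hypothetical helper:

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"  # assumption: enwiki endpoint


def morelike_url(title: str, smaxage: int = 86400) -> str:
    """Build a morelike: search URL that asks shared caches to keep the response."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": f"morelike:{title}",
        "srlimit": 3,
        "format": "json",
        # Seconds Varnish may cache the response; one day is an illustrative guess.
        "smaxage": smaxage,
    }
    return API + "?" + urlencode(params)
```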

On Tue, Jan 19, 2016 at 11:54 PM, Erik Bernhardson <ebernhardson@wikimedia.org> wrote:

Both the mobile apps and the web use CirrusSearch's morelike: feature, which is causing some performance issues on our end. We would like to optimize it, but first we would prefer to run an A/B test to check whether the results are still "about as good" as they are now.

The optimization, in short: more like this currently takes the entire article into account, and we would like to change it to consider only the opening text of the article. This should reduce the amount of work we do on the backend, saving both server load and the latency users see when running the query.

This can be triggered by adding these two query parameters to the search API request:

cirrusMltUseFields=yes&cirrusMltFields=opening_text
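A minimal sketch of how a client might split users between the two setups and attach those parameters. The 50/50 hash-based bucketing, the session ID, the enwiki endpoint and the `list=search` request shape are all assumptions (the apps' real requests may differ); `bucket_for` and `related_url` are hypothetical helpers:

```python
import hashlib
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"  # assumption: enwiki endpoint

# The two extra parameters from this thread; the API warns they are unrecognized.
OPENING_TEXT_PARAMS = {"cirrusMltUseFields": "yes", "cirrusMltFields": "opening_text"}


def bucket_for(session_id: str) -> str:
    """Deterministically assign a session to 'control' or 'opening_text' (about 50/50)."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    return "opening_text" if digest[0] % 2 else "control"


def related_url(title: str, bucket: str) -> str:
    """Build the morelike: request, restricted to opening_text for the test bucket."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": f"morelike:{title}",
        "srlimit": 3,
        "format": "json",
    }
    if bucket == "opening_text":
        params.update(OPENING_TEXT_PARAMS)
    return API + "?" + urlencode(params)
```

Hashing a stable session ID keeps each user in the same bucket across page views, so latency and click-through numbers are not mixed between setups.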

The API will give a warning that these parameters do not exist, but it is safe to ignore. Would any of you be willing to run this test? We would mainly want to compare user-perceived latency and click-through rates between the current default setup and the restricted setup that uses only opening_text.
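A sketch of the per-bucket comparison, assuming each time related articles are shown the client logs one record with its bucket, the user-perceived latency and whether a related article was clicked. The `Impression` record and `summarize` function are hypothetical:

```python
import statistics
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Impression:
    bucket: str        # "control" or "opening_text"
    latency_ms: float  # time until related articles were shown to the user
    clicked: bool      # whether the user clicked one of the related articles


def summarize(impressions: Iterable[Impression]) -> Dict[str, Dict[str, float]]:
    """Impression count, median latency and click-through rate per bucket."""
    latencies: Dict[str, List[float]] = defaultdict(list)
    clicks: Dict[str, int] = defaultdict(int)
    for imp in impressions:
        latencies[imp.bucket].append(imp.latency_ms)
        clicks[imp.bucket] += int(imp.clicked)
    return {
        bucket: {
            "impressions": len(values),
            "median_latency_ms": statistics.median(values),
            "ctr": clicks[bucket] / len(values),
        }
        for bucket, values in latencies.items()
    }
```

This only gives point estimates; deciding whether a CTR difference is real would also need confidence intervals or a significance test.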

Erik B.

_______________________________________________
Mobile-l mailing list
Mobile-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mobile-l