The short summary: the implementation they used seems to be based on a very influential paper from 2002, Optimizing Search Engines using Clickthrough Data (Joachims). They do a pairwise transform on clickstream data and feed it into an SVM. Interestingly, this model is not widely used in industry anymore; the paper was influential, but the exact method has fallen out of favor, replaced by ensembles of decision trees trained on explicit relevance labels rather than pairwise click data. Their reasoning for sticking with it seems sound though: their content is highly siloed, and it's quite rare for queries against the same content to repeat enough times to allow labeling via click models or human judges.
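
For anyone curious what the pairwise transform actually looks like, here's a minimal sketch in Python (assuming scikit-learn and made-up toy data; the function and variable names are mine, not anything from their codebase or the paper):

    # Minimal sketch of the Joachims-style pairwise transform: turn
    # per-document preference grades into difference vectors and fit a
    # linear SVM on them. Toy data only, not their actual pipeline.
    from itertools import combinations
    import numpy as np
    from sklearn.svm import LinearSVC

    def pairwise_transform(X, y, query_ids):
        """Build difference vectors X[i] - X[j] for document pairs from
        the same query where one document is preferred over the other."""
        X_new, y_new = [], []
        for i, j in combinations(range(len(y)), 2):
            if query_ids[i] != query_ids[j] or y[i] == y[j]:
                continue  # skip cross-query pairs and ties
            diff = X[i] - X[j]
            sign = np.sign(y[i] - y[j])
            # flip every other pair so the SVM sees both classes
            if len(y_new) % 2:
                diff, sign = -diff, -sign
            X_new.append(diff)
            y_new.append(sign)
        return np.array(X_new), np.array(y_new)

    # toy example: 2 queries, 3 docs each; y is a click-derived
    # preference grade (higher = preferred), qids group docs by query
    X = np.random.rand(6, 4)
    y = np.array([2, 1, 0, 1, 0, 0])
    qids = np.array([1, 1, 1, 2, 2, 2])

    Xp, yp = pairwise_transform(X, y, qids)
    ranker = LinearSVC(C=1.0).fit(Xp, yp)  # linear SVM on differences
    # the learned weight vector scores individual documents; sorting
    # by score gives the ranking
    scores = X @ ranker.coef_.ravel()

Because the model is linear, classifying difference vectors is equivalent to learning a single scoring function over documents, which is what makes this usable as a ranker at query time.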


On Wed, Feb 8, 2017 at 10:08 AM, Wes Moran <wmoran@wikimedia.org> wrote:
"On average, 20% of a knowledge worker’s day is spent looking for the information they need to get their work done. If you think about a typical work week, that means an entire day is dedicated to this task!"

Interesting way to look at it.  Also interesting takes on recent v relevant.  Thanks for sharing.

On Wed, Feb 8, 2017 at 1:01 PM, Erik Bernhardson <ebernhardson@wikimedia.org> wrote:

_______________________________________________
discovery mailing list
discovery@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/discovery


