I understand that we are shifting to a "minimum 2-week" cadence, but I'm
not sure exactly what that means. Reading Mikhail's email, it sounds like
we plan to run each test for one week, and then have one week "off" to
analyze those results and to prepare for the following test. Is that true?
Regardless of those details, would it be helpful to have a "recipe" for
each test? To know that on Day T-7, we would be thinking about X, and by
Day T-4, we had better have Y in place. And then to expect Z by Day T+8.
Basically, to document all the little steps that might be necessary or
optional before, during, and after a test.
If that seems helpful, I can create a phab task to create and populate a
wiki page with that kind of information. Obviously the population of that
page would have to be a group effort, with input from product, engineering,
analysis, and possibly others.
Kevin Smith
Agile Coach, Wikimedia Foundation
Cross-posting from wikitech-l. Please reply there.
---------- Forwarded message ----------
From: Dan Garry <dgarry(a)wikimedia.org>
Date: 1 September 2015 at 20:43
Subject: Discovery Department A/B testing an alternative to prefix search
next week
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
Hi everyone,
*tl;dr: Discovery Department to run A/B test
<https://phabricator.wikimedia.org/T111078> comparing new search suggester
to prefix search, to see if it can reduce zero results rate.*
As I'm sure you're all aware, the search box at the top right of every page
on desktop uses prefix search to generate its results. The main reason for
this is that prefix search is extremely fast; that search box sees a lot of
traffic, and it's important to keep it scalable.
However, we know that there are numerous problems with prefix search.
Prefix searches are prone to return no results; make even a slight typo,
and you won't get the result you want. And thus a complex system of
manually curated redirects was born to try to alleviate this navigation
issue. Wouldn't it be nice if we could work towards a solution that doesn't
require the manual curation of redirects, thus freeing up Wikimedians to do
other more meaningful tasks? And make search a bit better in the process,
too? That's a long term goal of mine... emphasis on the long. ;-)
The Q1 2015-16 (Jul - Sep 2015) goal of the Search Team in the Discovery
Department is to reduce the zero results rate
<https://www.mediawiki.org/wiki/Wikimedia_Engineering/2015-16_Q1_Goals#Search>.
Amongst other things, we've been working to build an alternative to prefix
search <https://phabricator.wikimedia.org/T105746>. Documentation on the
API is pretty light right now because we're scrambling to get it up and
running (but there's a task for that!
<https://phabricator.wikimedia.org/T111139>).
An initial version of the suggestion API is now in production on enwiki and
dewiki [1], but is currently not being used for anything. Our initial tests
<https://phabricator.wikimedia.org/T109729> of the API show that it's
incredibly promising for reducing the zero results rate. But we need more
data!
We're planning to run an A/B test to see whether this API is better at
reducing the zero results rate. We're targeting a start date of Tuesday 8th September,
for two weeks. This is documented in T111078
<https://phabricator.wikimedia.org/T111078>.
A very important note here is that we currently have no way of
quantitatively measuring result relevance (although we're working on it
<https://phabricator.wikimedia.org/T109482>), so this test will be highly
limited in scope, measuring only the zero results rate. Given these limits,
even a massive success in this test would not be enough to deploy this
API as a full replacement for prefix search; we'd need additional data. But
that's not stopping us from gathering initial data from this test.
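To make the metric concrete, here is a minimal sketch of how a zero results rate could be computed per test bucket. The record format and field names ("hits", "bucket") are invented for illustration; they are not the actual Discovery event schema.

```python
# Hypothetical search-log records; field names are illustrative only,
# not the real Discovery logging schema.
events = [
    {"query": "barak obama",  "hits": 3,  "bucket": "suggester"},
    {"query": "wikipedia",    "hits": 12, "bucket": "suggester"},
    {"query": "barack obama", "hits": 25, "bucket": "prefix"},
    {"query": "wikipdia",     "hits": 0,  "bucket": "prefix"},
]

def zero_results_rate(events, bucket):
    """Fraction of searches in one test bucket that returned no results."""
    rows = [e for e in events if e["bucket"] == bucket]
    return sum(e["hits"] == 0 for e in rows) / len(rows) if rows else 0.0

print(zero_results_rate(events, "prefix"))     # → 0.5
print(zero_results_rate(events, "suggester"))  # → 0.0
```

In the toy data above, the suggester bucket rescues the typo "barak obama" that prefix search would miss, which is exactly the kind of difference the A/B test is designed to measure.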
As always, if you have any questions, let me know.
Thanks,
Dan
[1]: The API is actually live on all wikis, but we only built the search
indices for enwiki and dewiki since they're our biggest content wikis and
this is an early test. Attempting to use the API on any other wiki will get
you a cirrus backend error.
--
Dan Garry
Lead Product Manager, Discovery
Wikimedia Foundation
Just as an FYI, next Thursday Discovery's UX sub-team will start having
weekly meetings, to groom the backlog and plan work for the week.
For now, these will include Moiz and Dan, with Tomasz and Wes optional. As
additional UX folks are hired, we'll add them, and we will also consider
bringing in other people as needed. This is reflected on the process
page[1].
As a reminder, the UX sub-team has its own phabricator sprint board[2].
It's not being used heavily yet, but that may change over the next few
weeks.
[1]
https://www.mediawiki.org/wiki/Search_and_Discovery/Process#Recurring_Meeti…
[2] https://phabricator.wikimedia.org/tag/discovery-ux-sprint/
Kevin Smith
Agile Coach, Wikimedia Foundation
Hi all,
Last week we discussed our approach to A/B testing and we've decided to
have a week (at least) between tests.
A two-week-minimum cadence will give the analysis team enough time to
thoroughly think through the experimental design of each test, and give
the engineers enough time to implement it. That matters because some of
the changes we are planning to test are not trivial, and we don't want to
rush a test out and realize halfway through that we should have been
tracking something we're not.
We are also going to move away from doing initial analyses (analysis of the
data from the morning of a launch) for practical and scientific reasons.
Practical in the sense that we've been putting time and effort into
producing preliminary results that are not at all representative of the
final results, while putting other work on the back burner. Scientific in
the sense that
peeking at the data mid-experiment is bad science:
*Repeated significance testing always increases the rate of false
positives, that is, you’ll think many insignificant results are significant
(but not the other way around). The problem will be present if you ever
find yourself “peeking” at the data and stopping an experiment that seems
to be giving a significant result. The more you peek, the more your
significance levels will be off. For example, if you peek at an ongoing
experiment ten times, then what you think is 1% significance is actually
just 5% significance.* – Evan Miller, How Not To Run An A/B Test
<http://www.evanmiller.org/how-not-to-run-an-ab-test.html>
In science, it's a problem called multiple comparisons. The more tests you
perform, the more likely you are to see something where there is nothing.
Going forward, we are going to wait until we have collected all the data
before analyzing it.
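Miller's point can be demonstrated with a quick simulation (a sketch, not our actual analysis code): run many A/A experiments, where both arms share the same true rate so every "significant" result is a false positive, and peek after every batch, stopping at the first interim z-test that crosses the threshold. With ten peeks, the realized false positive rate lands well above the nominal 5%.

```python
import random

def peeking_false_positive_rate(n_experiments=1000, n_peeks=10,
                                batch_size=100, z_crit=1.96, p_true=0.1):
    """Simulate A/A tests (no real effect exists) and stop each experiment
    early the first time an interim two-proportion z-test looks significant.
    Returns the fraction of experiments wrongly declared significant."""
    false_positives = 0
    for _ in range(n_experiments):
        a = b = n = 0
        for _ in range(n_peeks):
            for _ in range(batch_size):
                n += 1
                a += random.random() < p_true  # arm A conversion
                b += random.random() < p_true  # arm B, identical rate
            # Pooled two-proportion z-test at this interim "peek"
            pooled = (a + b) / (2 * n)
            se = (2 * pooled * (1 - pooled) / n) ** 0.5
            if se > 0 and abs(a - b) / n / se > z_crit:
                false_positives += 1
                break  # peek, declare a winner, and stop early
    return false_positives / n_experiments

random.seed(42)
print(peeking_false_positive_rate())  # far above the nominal 0.05
```

Setting n_peeks=1 (a single analysis on the full sample, as proposed above) brings the rate back down to roughly the nominal 5%, which is exactly the wait-for-all-the-data policy we're adopting.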
Cheers,
Mikhail, Junior Swifty
Discovery // The Swifties