Hey all,
I'm new and trying to wrap my head around how we present all the work
that's going on in Discovery. I have a thought I wanted to run by folks
that (I think) will help us provide a more consistent introduction to the
various projects being worked on.
Right now Wikimedia Discovery
<https://www.mediawiki.org/wiki/Wikimedia_Discovery> is a wealth of
information, but there isn't much consistency on what each project page
contains. Some are labeled as "Wikimedia engineering activity", some don't
have a page about the project (Search links to the extension), etc.
My thought is to bring the minimum for each project to a consistent level.
For example, the page for Project A would cover what it is, what we are
trying to do, the members involved, the roadmap, current progress, where to
give feedback, etc. Each project would have its own page with this minimum
and a consistent layout.
Related, I also have the sub-hub for the Portal marked for translation
<https://meta.wikimedia.org/wiki/Special:PageTranslation> and I think it
would be wise to do the same elsewhere.
If there's any dissent, let me know; otherwise, full steam ahead.
--
Yours,
Chris Koerner
Community Liaison - Discovery
Wikimedia Foundation
Hi!
I'd like to describe the refactoring we are doing on SearchEngine/prefix
search. The goal is to bring prefix search into the SearchEngine API
and use that unified API for prefix searches, which will also allow us to
use the new Elasticsearch completion suggester in many places where prefix
search is done.
Please comment if you see any problem in this or have any suggestions.
The current plan is as follows:
1. SearchEngine gets the following new API functions:
public function defaultPrefixSearch( $search );
public function completionSearch( $search );
public function completionSearchWithVariants( $search );
defaultPrefixSearch is for simple prefix searches (namespace lists,
special pages, etc.).
completionSearch* is for completions that need scoring, fuzzy matching,
and so on.
2. There's also an internal function:
protected function completionSearchBackend( $search )
This is what SearchEngine implementations (like CirrusSearch) will
override. The SearchEngine base class deals with namespace handling,
result ordering, etc.
3. TitlePrefixSearch and StringPrefixSearch will be deprecated (but will
stay in the code almost unchanged for now); using the SearchEngine APIs is
recommended instead: SearchEngine::defaultPrefixSearch() or
completionSearch(), depending on whether simple or advanced handling is
desired.
4. The PrefixSearchBackend and PrefixSearchExtractNamespace hooks will be
deprecated; overriding completionSearchBackend() and
normalizeNamespaces() in SearchEngine is recommended instead. For now,
these hooks will be supported by the base SearchEngine implementation, but
not by CirrusSearch.
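The division of labor in steps 1-4 resembles a template-method pattern: the
base class owns namespace handling and ordering, while backends override one
protected hook. A rough Python analogy (the method names mirror the proposed
PHP API, but the bodies are invented for illustration and are not MediaWiki
code; completionSearchWithVariants is omitted for brevity):

```python
# Illustrative analogy of the proposed SearchEngine API. Names mirror
# the PHP plan; the logic is a simplified sketch, not MediaWiki's.

class SearchEngine:
    def default_prefix_search(self, search):
        # Simple prefix matching (namespace lists, special pages, etc.).
        return self._completion_search_backend(search)

    def completion_search(self, search):
        # Scored/fuzzy completions: the base class normalizes namespaces
        # and orders results, then delegates matching to the backend.
        _namespaces, term = self._normalize_namespaces(search)
        return sorted(self._completion_search_backend(term))

    def _completion_search_backend(self, search):
        # Implementations (a CirrusSearch equivalent) override this.
        raise NotImplementedError

    def _normalize_namespaces(self, search):
        # Strip a leading "Namespace:" prefix, if any (simplified).
        if ':' in search:
            ns, _, rest = search.partition(':')
            return [ns], rest
        return [], search

class CirrusLikeEngine(SearchEngine):
    def _completion_search_backend(self, search):
        # A real backend would query Elasticsearch's completion
        # suggester; here we filter a tiny hard-coded title list.
        titles = ['Paris', 'Park', 'Parsing', 'Poland']
        return [t for t in titles if t.lower().startswith(search.lower())]

engine = CirrusLikeEngine()
print(engine.completion_search('par'))  # matching titles, sorted
```

The point of the split is that hooks like PrefixSearchBackend become
unnecessary: extensions override the backend method instead.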
The task for this is https://phabricator.wikimedia.org/T121430; it also
links the patches as they stand now (still work in progress).
--
Stas Malyshev
smalyshev(a)wikimedia.org
Hey all,
A couple of weeks ago we ran an A/B test on the Wikipedia portal
(www.wikipedia.org) to test whether a more prominent search box,
optionally combined with additional metadata such as small images in
the search results, would increase the rate at which people clicked
through from the portal to one of our projects.
We are delighted to say that the test showed a 1-5% increase in the
clickthrough rate where both a prominent search box and metadata are
used. Accordingly, once we've resolved concerns about the design's
non-JavaScript usability, we hope to deploy it for all users.
The report can be seen at
https://commons.wikimedia.org/wiki/File:First_Portal_Test.pdf - please
let me know if you have any questions.
For Discovery Analytics,
--
Oliver Keyes
Count Logula
Wikimedia Foundation
These results are really promising. I'm excited to share them with an
audience larger than discovery-l if that's appropriate. Any suggestions on
a good audience to start with? I took a stab at updating the Portal
Improvements page
<https://meta.wikimedia.org/w/index.php?title=Wikipedia.org_Portal_Improveme…>
and would appreciate any feedback/corrections.
--
Yours,
Chris Koerner
Community Liaison
Wikimedia Foundation
Hello,
I have a few questions about the Discovery portal and API:
- Is it possible to be included in the 0.05% who tried the A/B test?
- Do you have, or plan to have, a feature for discovery between topics? E.g.
suggestions connecting two topics. I've been working on this.
- What is the difference in API:Search between list=search and
generator=search? The first seems faster to me.
- I need to fetch the pageIDs of the results, with redirects already
resolved and the results decorated with images and a snippet/excerpt.
With generator=search I can do it; I can't with list=search.
Could you help me grasp the query parameters?
#generator=search
params = {'action': 'query', 'generator': 'search', 'gsrnamespace': 0,
          'gsrsearch': keywords, 'gsrlimit': 20, 'prop': 'pageimages|extracts',
          'pilimit': 'max', 'exintro': '', 'explaintext': '', 'exsentences': 3,
          'exlimit': 'max', 'redirects': ''}
#list=search
params = {'action': 'query', 'list': 'search', 'srsearch': keywords,
          'srlimit': 20, 'srprop': 'size', 'indexpageids': 1}  # ??
I'm trying it in the API sandbox with no success; only search results come
back, no matter what:
https://en.wikipedia.org/wiki/Special:ApiSandbox#action=query&list=search&f…
- I also found that formatversion=2 is another parameter used with
list=search. Does it have a special meaning?
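A hedged sketch of the difference the question asks about (the dictionaries
below are hand-made samples shaped like documented action=query responses,
with invented titles and page IDs, not live API data): with list=search,
each hit in query['search'] already carries a pageid; with generator=search,
pages arrive in query['pages'] keyed by page ID, decorated by prop modules
such as extracts and pageimages.

```python
# Hand-made sample responses (invented titles/IDs), shaped like the
# documented action=query output; not real API data.

# list=search: hits live in query['search'], each with its own pageid.
list_response = {
    'query': {
        'search': [
            {'pageid': 12345, 'title': 'Example page', 'snippet': '...'},
        ]
    }
}
pageids = [hit['pageid'] for hit in list_response['query']['search']]

# generator=search: the same matches come back as query['pages'], keyed
# by page ID, and prop modules (extracts, pageimages) attach their
# fields to each page entry.
generator_response = {
    'query': {
        'pages': {
            '12345': {'pageid': 12345, 'title': 'Example page',
                      'extract': 'First sentences of the article...'},
        }
    }
}
gen_pageids = [p['pageid']
               for p in generator_response['query']['pages'].values()]

print(pageids, gen_pageids)
```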
I just added a section to the Discovery process page[1] documenting the
guidelines that the Portal team uses for story point estimation. I would
like to include a corresponding section for the Analysis team. It can be
entirely different, since story point values never cross team boundaries.
[1] https://www.mediawiki.org/wiki/Wikimedia_Discovery/Process#Story_Points
Kevin Smith
Agile Coach, Wikimedia Foundation
About fetching pageIDs with a generator module on list=search,
could you please debrief?
I'll try to be more specific.
I understood that generators take a list of titles, and I would like to
return the pageIDs of the titles in that list.
But I cannot get it to work; it looks like the generator gives a different
output, although the documentation says it doesn't.
As an example, I query "DJ Tiesto" with list=search,
then use generator=allpages with indexpageids selected.
The results in query['search'] are relevant, but the pageIDs in
query['pageids'] are not related to the listed titles:
https://en.wikipedia.org/wiki/Special:ApiSandbox#action=query&list=search&f…
How can I have the items in query['search'] completed with pageIDs along
their snippets, or get a dictionary of matching titles?
Searching with generator=search (rather than list=search) does work:
https://en.wikipedia.org/wiki/Special:ApiSandbox#action=query&prop=extracts…
I don't understand how the two (generator=search and list=search) differ.
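One way to read the behavior described above (hedged: this is my reading of
the API's documented design, not an authoritative answer): within one
action=query request, a list module and a generator run independently. The
generator feeds prop modules, while list=search results are a separate
array, so generator=allpages cannot "index" the search hits. Since every
list=search hit already includes a pageid, a title-to-pageID dictionary can
be built without any generator:

```python
# Within one action=query request, list modules and generators run
# independently: generator=allpages feeds prop modules, and its pages
# are unrelated to the list=search hits. Every list=search hit already
# carries a pageid, so a title -> pageID mapping needs no generator.

sample = {  # invented sample shaped like an action=query response
    'query': {
        'search': [
            {'pageid': 1001, 'title': 'Example A', 'snippet': '...'},
            {'pageid': 1002, 'title': 'Example B', 'snippet': '...'},
        ],
        # Pages produced by a simultaneous generator=allpages would
        # appear here, unrelated to the search hits above:
        'pages': {'7': {'pageid': 7, 'title': 'Unrelated page'}},
    }
}

title_to_pageid = {hit['title']: hit['pageid']
                   for hit in sample['query']['search']}
print(title_to_pageid)
```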
Hi,
http://en-suggesty.wmflabs.org/suggest.html is updated with a score that
integrates pageviews.
Pageviews solve most of the problems we encountered with the previous
formula; unfortunately, we now see some porn-related suggestions:
- 'x' will suggest 'xxx'
- 'po' will suggest 'pornhub' just below 'poland', in 2nd position; it is
also ranked #6 for the query 'p'
I just wanted to let you know about this and would like to know if it's
something we should address.
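The scoring formula isn't described in this message, so purely as a hedged
illustration: a sketch assuming a score that gates on the prefix match and
then weights by log-scaled pageviews, which shows why one or two typed
letters can surface very high-traffic titles. The candidate titles and
pageview counts are invented, and this is not the actual Suggesty formula.

```python
import math

# Illustrative only: assumes prefix-gated, log-pageview-weighted
# scoring. The titles and pageview counts below are invented.

def score(title, prefix, pageviews):
    if not title.lower().startswith(prefix.lower()):
        return 0.0
    return math.log1p(pageviews)

candidates = {'Poland': 9_000_000, 'Pornhub': 5_000_000, 'Pottery': 90_000}
ranked = sorted(candidates,
                key=lambda t: score(t, 'po', candidates[t]),
                reverse=True)
print(ranked)  # highest-pageview prefix matches first
```

Under such a scheme, any mitigation would have to come from outside the
pageview term (e.g. filtering), since the traffic signal itself is what
promotes those titles.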
Thanks for your feedback.
David.
As we prepare to bring our new Discovery ops person on board in a couple
weeks, we have been talking about how to integrate Guillaume's workflow
into our process. The following strawdog proposals are based on
conversations that involved me, Dan, Guillaume, and others.
1. Meetings
We propose having Guillaume attend the search team standups for now, since
90+% of his initial work will relate to search. He would probably not attend
any of the sprint planning or backlog grooming meetings. It's not clear whether
we should have a weekly Ops planning meeting, which would resemble the
Analysis planning meetings we already have.
2. Phabricator
We would create a Discovery-ops-sprint project/board, which will represent
the work of the Discovery ops team. This aligns with how we have handled
the Maps and WDQS sub-teams, which are also very small teams.
3. Learn and iterate
Whatever we end up trying as a starting point, we'll inspect and adapt it
as we go.
Any questions, comments, or concerns?
Kevin Smith
Agile Coach, Wikimedia Foundation
The Discovery Portal team has been thinking about a Portal Labs page.
The idea is that we can implement some revolutionary ideas for our portal
page, things completely different from what the current portal page looks
like, and deploy them on this site for real users to try, without imposing
a disruptive experience on our users. We could put a link to this page on
the production portal page (at the bottom?), and users would have the
option to bookmark the page, and maybe make one of the experiments their
default.
We would have two trains:
- Slow train: running regular A/B tests (like the one we just ran) on the
official portal page and deploying small improvements as we learn.
- Faster train: "revolutionary" prototypes in Labs, where we also collect
traffic and clickthrough rates to measure user satisfaction. We could also
implement a "Send Feedback" feature (or have a link on the prototype page
that points to a Phab ticket where the community can add comments/feedback).
To give you an example of what we mean by revolutionary ideas, I uploaded
some of my research time work:
https://people.wikimedia.org/~jgirault/
Pay closer attention to:
Trending, showing top 9 articles (grid)
<https://people.wikimedia.org/~jgirault/react-top9-cards/>
Trending, showing top 10 articles (full screen)
<https://people.wikimedia.org/~jgirault/react-top10-fs/>
This would allow us to think outside the box and test different
layouts/features with real users who choose to participate.
This is kind of a crazy idea, and we want to know what you all think about
it. Also we would need some naming ideas for it. Portal labs, or beta
portal, or something else.
Please let us know what you think about this, how you think we can go
towards making this happen and what we should name it.
Thanks!
Julien