Hi,
maybe not the best place to talk about that but...
I'd like to categorize some phab tasks so that I can access them quickly
in the future. At first I thought that creating my own custom tags
would be a perfect fit, but as far as I understand, tags are projects
and I'm not allowed to create them.
I suppose this feature is protected behind permissions because
Phabricator admins don't want people polluting the system with random
tags.
My use case is:
Sometimes users report queries that are not performing very well.
Usually by reading the query I can identify and classify the cause. This
cause can be something like:
- bad weighting of words in the title
- text analysis issue
- index/db discrepancies
- ...
This list is quite vague...
While it's rarely worth fixing a particular issue that mentions a
specific query, it's sometimes helpful to retrieve such tickets (where
I may have added a comment) while I'm working on that class of
problems:
- just to have more examples to test
- because my initial classification may have been wrong and the
problem lies elsewhere
Retrieving such tickets is painful today because I have to rely on
search. Not to blame the Phabricator developers: search is hard, as we
all know :)
So far I have used parent/child relationships, e.g.
https://phabricator.wikimedia.org/T128073, but I don't think that's
the proper approach, because when I classify tickets I don't
necessarily have a parent task ready.
Thanks for your suggestions.
Hi Discovery,
Is there a particular term for search engine sidebars of Wikipedia content?
For example, do we call them "search engine previews" or "Wikipedia
sidebars on search pages"? I imagine that Google and Microsoft have certain
terminology, and I'd like to be consistent when I'm referring to them in
the LearnWiki videos, provided that the term is something that the average
user would understand.
Thanks,
Pine
Hey everyone,
Mikhail has written up and should soon release his report on our recent
TextCat A/B tests; the results look good, and language identification and
cross-wiki searching definitely improve the results (in terms of results
shown and results clicked) for otherwise poorly performing queries (those
that get fewer than 3 results).
Mikhail's report also suggests looking at some measure of confidence for
the language identification to see if that has any effect on the quality
(in terms of number of results, but more importantly clicks) of the
crosswiki (also "interwiki") results. This sounds like a good idea, but
TextCat doesn't make it super easy to do. I have some ideas, though,
and would love suggestions from anyone else.
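For what it's worth, here is one simple confidence measure as a purely hypothetical sketch (the function name and approach are mine, not TextCat's API): since TextCat ranks candidate languages by an n-gram distance where lower is better, the relative gap between the best and second-best scores can serve as a rough confidence signal.

```python
def confidence(scores):
    """scores: dict mapping language code -> TextCat-style distance
    (lower is better). Returns the top language and a relative margin
    in [0, 1): 0.0 means a tie with the runner-up; larger means the
    top language is more clearly separated from the rest."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1])
    (best_lang, best), (_, second) = ranked[0], ranked[1]
    margin = (second - best) / second if second else 0.0
    return best_lang, margin
```

A threshold on that margin could then gate whether the cross-wiki search is attempted at all.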
The details are kind of technical, so if that kind of thing makes your eyes
glaze over, you should avert your gaze now.
Otherwise, check out my write up on TextCat and confidence
<https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/TextCat_and_Confiden…>
and share your ideas here, or on the talk page.
Thanks!
—Trey
Trey Jones
Software Engineer, Discovery
Wikimedia Foundation
Hello!
While looking at the elasticsearch dashboard on Grafana [1] I see that
we have weekly spikes in response times from codfw. My guess is that
this is related to the weekly update of page rank.
More details:
We see fairly large spikes on the overall 95%-ile for codfw (from a
usual ~300[ms] to ~1-1.5[s]). Those spikes are more visible on codfw
than on eqiad as we have less overall traffic on codfw compared to
eqiad, which makes indexing more visible relative to reads. So far, no
problem: the graph looks bad, but this can be explained and does not
show user impact.
We also see weekly spikes on the 75%-ile of more-like queries (from a
usual ~200-300[ms] to 300-400[ms]). More-like queries are the only
queries sent to codfw. This is not yet worrisome, but it's probably
something we should keep an eye on and improve before it becomes an
issue.
I have mostly no idea how those page rank updates work. Would it be
possible to throttle the index updates from those jobs? Or to increase
the frequency of those updates to reduce the impact of each run?
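To illustrate the throttling idea, here is a minimal sketch (hypothetical only; the real fix would presumably live in the job queue or the CirrusSearch updater, and the function name is made up):

```python
import time

def throttled_batches(docs, batch_size=500, delay=1.0):
    """Yield fixed-size batches of documents, pausing between them so a
    large weekly update is spread out over time instead of hitting the
    cluster all at once."""
    for start in range(0, len(docs), batch_size):
        if start:
            time.sleep(delay)  # pause between consecutive bulk requests
        yield docs[start:start + batch_size]
```

Tuning batch_size and delay trades total update duration against peak load on the cluster.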
Ideas welcome...
Guillaume
[1] https://grafana-admin.wikimedia.org/dashboard/db/elasticsearch-percentiles
--
Guillaume Lederrey
Operations Engineer, Discovery
Wikimedia Foundation
Hello,
The Wikimedia Foundation Discovery Search team
<https://www.mediawiki.org/wiki/Wikimedia_Discovery/Search> has recently
discovered
<https://commons.wikimedia.org/wiki/File:From_Zero_to_Hero_-_Anticipating_Ze…>
that
search queries that end with a question mark (e.g. "*how old is Tom Cruise?*")
can sometimes lead to zero (or unusable) results being returned. This zero
result rate is one of the primary ways that the Search team measures how
satisfied our users are with their query results
<http://discovery.wmflabs.org/metrics/#kpi_zero_results>.
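For readers unfamiliar with the metric: the zero result rate is simply the fraction of queries that return no results. A minimal sketch (a hypothetical helper for illustration, not the team's actual pipeline):

```python
def zero_result_rate(result_counts):
    """result_counts: one entry per query, giving the number of results
    that query returned. Returns the fraction of queries that returned
    zero results."""
    if not result_counts:
        return 0.0
    return sum(1 for n in result_counts if n == 0) / len(result_counts)
```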
In order to improve the results for queries containing a question mark,
we'd like to change the behavior of the search backend. However, we
would love feedback from the community to make sure that this is a
smart change to make.
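As a rough illustration, such a change could amount to stripping trailing question marks before the query reaches the backend (a hypothetical sketch only; the actual proposal may be more nuanced, e.g. leaving quoted phrases untouched):

```python
import re

def normalize_query(query):
    """Drop trailing question marks (and trailing whitespace) so that
    'how old is Tom Cruise?' is searched as 'how old is Tom Cruise'."""
    return re.sub(r'[?\s]+$', '', query)
```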
If you are interested in how search works, or see this change as a possible
disruption for your work, please learn more about this potential change
<https://meta.wikimedia.org/wiki/Discovery/Handling_question_marks_in_search…>
and
let us know your thoughts.
Cheers from the Discovery Search team!
--
Deb Tankersley
Product Manager, Discovery
IRC: debt
Wikimedia Foundation