[Labs-l] Empty Cat query

Jaime Crespo jcrespo at wikimedia.org
Fri Sep 23 16:56:07 UTC 2016


So I cannot give you very specific advise, but based on the EXPLAIN, it
seems that you may be trying to read too many rows, up to
100K * 300K * 300K (10 000 M) and then sort them, which is probably going
to be very slow.

If they used to work in the past there could be 2 probable explanations: a
lot of rows have been inserted recently on one or several of the tables
(e.g. a new template or category with many members) or the indexes have
changed. Note that redoing the queries is something that we have to do
constantly in production because rows quantities change. I also can think
of this change that happened recently on production, but cannot say for
sure if it is related or could affect you negatively:

https://phabricator.wikimedia.org/T139090

Maybe that affects your query and it is as easy to fix as reordering your
columns/changing slightly the filters or its order. If someone has a
suggestion to make it faster, that requires a change to labs, remember that
labs replicas are not static, and new indexes can be added if needed by the
community, and I will gladly apply them myself.

On Fri, Sep 23, 2016 at 6:02 PM, Huji Lee <huji.huji at gmail.com> wrote:

> With two connections to the DB via Terminal, it is possible! Here is what
> I got; it matches what you sent above too:
>
>
>
> MariaDB [fawiki_p]> show processlist;
> +---------+--------+-------------------+----------+---------
> +------+----------------------------+-----------------------
> ------------------------------------------------------------
> -------------------+----------+
> | Id      | User   | Host              | db       | Command | Time |
> State                      | Info
>                                                                        |
> Progress |
> +---------+--------+-------------------+----------+---------
> +------+----------------------------+-----------------------
> ------------------------------------------------------------
> -------------------+----------+
> | 3556437 | s51403 | 10.68.23.58:53391 | fawiki_p | Query   |   19 |
> Queried about 5450000 rows | select page_title, count(ll_lang) from page
> join category on page_title = cat_title left join catego |    0.000 |
> | 3556865 | s51403 | 10.68.23.58:54083 | fawiki_p | Query   |    0 |
> init                       | show processlist
>                                                                   |
> 0.000 |
> +---------+--------+-------------------+----------+---------
> +------+----------------------------+-----------------------
> ------------------------------------------------------------
> -------------------+----------+
> 2 rows in set (0.00 sec)
>
> MariaDB [fawiki_p]> show explain for 3556437;
> +------+-------------+---------------+--------+-------------
> ---------------------------+-----------------------------+--
> -------+------------------------+--------+------------------
> ----------------------------+
> | id   | select_type | table         | type   |
> possible_keys                          | key                         |
> key_len | ref                    | rows   | Extra
> |
> +------+-------------+---------------+--------+-------------
> ---------------------------+-----------------------------+--
> -------+------------------------+--------+------------------
> ----------------------------+
> |    1 | SIMPLE      | page          | ref    | name_title,page_redirect_namespace_len
> | page_redirect_namespace_len | 5       | const,const            | 364177 |
> Using where; Using temporary; Using filesort |
> |    1 | SIMPLE      | category      | eq_ref |
> cat_title                              | cat_title                   |
> 257     | fawiki.page.page_title |      1 | Using
> index                                  |
> |    1 | SIMPLE      | categorylinks | ref    |
> cl_timestamp,cl_sortkey                | cl_timestamp                |
> 257     | fawiki.page.page_title |     12 | Using where; Using
> index                     |
> |    1 | SIMPLE      | templatelinks | ref    |
> tl_from,tl_namespace                   | tl_namespace                |
> 4       | const                  | 390622 | Using where; Using
> index                     |
> |    1 | SIMPLE      | langlinks     | ref    |
> ll_from                                | ll_from                     |
> 4       | fawiki.page.page_id    | 104911 | Using
> index                                  |
> +------+-------------+---------------+--------+-------------
> ---------------------------+-----------------------------+--
> -------+------------------------+--------+------------------
> ----------------------------+
>
>
> On Fri, Sep 23, 2016 at 11:58 AM, Huji Lee <huji.huji at gmail.com> wrote:
>
>> Oh, I misread your comment.
>>
>> I doubt SHOW EXPLAIN would work via Quarry, so I am trying to get it to
>> work via terminal access to Labs. I wonder though if it works for a query
>> that fails to complete.
>>
>> On Fri, Sep 23, 2016 at 11:53 AM, Jaime Crespo <jcrespo at wikimedia.org>
>> wrote:
>>
>>> Huji, note I suggested SHOW EXPLAIN, not EXPLAIN; it is a different
>>> command, check the link I provided and report if that doesn't work for you.
>>>
>>> On Fri, Sep 23, 2016 at 5:31 PM, Huji Lee <huji.huji at gmail.com> wrote:
>>>
>>>> See https://quarry.wmflabs.org/query/12718 and the error message
>>>> returned.
>>>>
>>>> EXPLAIN only works if you have access to the underlying tables. I only
>>>> have access to the VIEWs built on top of actual wiki tables (for good
>>>> reason, same applies to most of us) so I get an error when I try EXPLAIN.
>>>>
>>>> On Fri, Sep 23, 2016 at 11:24 AM, Jaime Crespo <jcrespo at wikimedia.org>
>>>> wrote:
>>>>
>>>>> Explain (you should be able to run SHOW EXPLAIN on your own queries:
>>>>> http://s.petrunia.net/blog/?p=89):
>>>>>
>>>>> EXPLAIN select page_title, count(ll_lang) from page join category on
>>>>> page_title = cat_title left join categorylinks on page_title = cl_to left
>>>>> join templatelinks on tl_from = page_id and tl_namespace = 10 and tl_title
>>>>> in (   'رده_خالی' , 'رده_بهتر', 'رده_ابهام‌زدایی', 'رده_ردیابی‌کردن' ) left
>>>>> join langlinks on page_id = ll_from where page_namespace = 14 and
>>>>> page_is_redirect = 0 and cl_to is null and tl_title is null group by
>>>>> page_title order by 2, 1 limit 5000\G
>>>>> *************************** 1. row ***************************
>>>>>            id: 1
>>>>>   select_type: SIMPLE
>>>>>         table: page
>>>>>          type: ref
>>>>> possible_keys: name_title,page_redirect_namespace_len
>>>>>           key: page_redirect_namespace_len
>>>>>       key_len: 5
>>>>>           ref: const,const
>>>>>          rows: 364273
>>>>>         Extra: Using where; Using temporary; Using filesort
>>>>> *************************** 2. row ***************************
>>>>>            id: 1
>>>>>   select_type: SIMPLE
>>>>>         table: category
>>>>>          type: eq_ref
>>>>> possible_keys: cat_title
>>>>>           key: cat_title
>>>>>       key_len: 257
>>>>>           ref: fawiki.page.page_title
>>>>>          rows: 1
>>>>>         Extra: Using index
>>>>> *************************** 3. row ***************************
>>>>>            id: 1
>>>>>   select_type: SIMPLE
>>>>>         table: categorylinks
>>>>>          type: ref
>>>>> possible_keys: cl_timestamp,cl_sortkey
>>>>>           key: cl_timestamp
>>>>>       key_len: 257
>>>>>           ref: fawiki.page.page_title
>>>>>          rows: 12
>>>>>         Extra: Using where; Using index
>>>>> *************************** 4. row ***************************
>>>>>            id: 1
>>>>>   select_type: SIMPLE
>>>>>         table: templatelinks
>>>>>          type: ref
>>>>> possible_keys: tl_from,tl_namespace
>>>>>           key: tl_namespace
>>>>>       key_len: 4
>>>>>           ref: const
>>>>>          rows: 390610
>>>>>         Extra: Using where; Using index
>>>>> *************************** 5. row ***************************
>>>>>            id: 1
>>>>>   select_type: SIMPLE
>>>>>         table: langlinks
>>>>>          type: ref
>>>>> possible_keys: ll_from
>>>>>           key: ll_from
>>>>>       key_len: 4
>>>>>           ref: fawiki.page.page_id
>>>>>          rows: 104910
>>>>>         Extra: Using index
>>>>>
>>>>> On Fri, Sep 23, 2016 at 5:16 PM, Huji Lee <huji.huji at gmail.com> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I have a query [1] which I run weekly to identify empty categories;
>>>>>> those that don't have interwiki links and stay empty for a while are then
>>>>>> deleted by a sysop.
>>>>>>
>>>>>> The query ran just fine every week until Sep 2nd. [2] Since then, the
>>>>>> query times out and the page doesn't get updated.
>>>>>>
>>>>>> The query is no that complex (a SELECT with five JOINs) and used to
>>>>>> finish in about 3-5 minutes. I don't have EXPLAIN access on Labs so I
>>>>>> cannot tell what is slowing it down. Can someone kindly take a look and
>>>>>> advise why this suddenly stopped working?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Huji
>>>>>>
>>>>>> [1] https://quarry.wmflabs.org/query/3760
>>>>>> [2] https://fa.wikipedia.org/w/index.php?title=%D9%88%DB%8C%DA%A
>>>>>> 9%DB%8C%E2%80%8C%D9%BE%D8%AF%DB%8C%D8%A7:%DA%AF%D8%B2%D8%A7%
>>>>>> D8%B1%D8%B4_%D8%AF%DB%8C%D8%AA%D8%A7%D8%A8%DB%8C%D8%B3/%D8%B
>>>>>> 1%D8%AF%D9%87%E2%80%8C%D9%87%D8%A7%DB%8C_%D8%AE%D8%A7%D9%84%
>>>>>> DB%8C&action=history&uselang=en
>>>>>>
>>>>>> _______________________________________________
>>>>>> Labs-l mailing list
>>>>>> Labs-l at lists.wikimedia.org
>>>>>> https://lists.wikimedia.org/mailman/listinfo/labs-l
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Jaime Crespo
>>>>> <http://wikimedia.org>
>>>>>
>>>>> _______________________________________________
>>>>> Labs-l mailing list
>>>>> Labs-l at lists.wikimedia.org
>>>>> https://lists.wikimedia.org/mailman/listinfo/labs-l
>>>>>
>>>>>
>>>>
>>>> _______________________________________________
>>>> Labs-l mailing list
>>>> Labs-l at lists.wikimedia.org
>>>> https://lists.wikimedia.org/mailman/listinfo/labs-l
>>>>
>>>>
>>>
>>>
>>> --
>>> Jaime Crespo
>>> <http://wikimedia.org>
>>>
>>> _______________________________________________
>>> Labs-l mailing list
>>> Labs-l at lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/labs-l
>>>
>>>
>>
>
> _______________________________________________
> Labs-l mailing list
> Labs-l at lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/labs-l
>
>


-- 
Jaime Crespo
<http://wikimedia.org>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.wikimedia.org/pipermail/labs-l/attachments/20160923/76fafc9f/attachment-0001.html>


More information about the Labs-l mailing list