[Labs-l] Empty Cat query

Huji Lee huji.huji at gmail.com
Fri Sep 23 17:39:54 UTC 2016


Jaime, your hunch seems to be correct. I ran SHOW EXPLAIN on that query
every few minutes until it timed out, and the number rows it was querying
increased drastically. It very quickly reached 2399220000, with the last
value I go before timeout being 3693530000. That is just way too many.

T139090 could be the cause; it went into effect the day after last
successful query, and it does affect indexes used by the query.

One thing that can obviously make my query faster is to force it to run the
joins in a particular order. I tried to enforce it through creating
temporary tables in memory, but it failed:
https://quarry.wmflabs.org/query/12719

Another approach would be to create indexes. However, my permissions don't
allow me to see what indexes currently exist (is there a way around that?)
so I cannot decide what new indexes can be added.

On Fri, Sep 23, 2016 at 12:56 PM, Jaime Crespo <jcrespo at wikimedia.org>
wrote:

> So I cannot give you very specific advise, but based on the EXPLAIN, it
> seems that you may be trying to read too many rows, up to
> 100K * 300K * 300K (10 000 M) and then sort them, which is probably going
> to be very slow.
>
> If they used to work in the past there could be 2 probable explanations: a
> lot of rows have been inserted recently on one or several of the tables
> (e.g. a new template or category with many members) or the indexes have
> changed. Note that redoing the queries is something that we have to do
> constantly in production because rows quantities change. I also can think
> of this change that happened recently on production, but cannot say for
> sure if it is related or could affect you negatively:
>
> https://phabricator.wikimedia.org/T139090
>
> Maybe that affects your query and it is as easy to fix as reordering your
> columns/changing slightly the filters or its order. If someone has a
> suggestion to make it faster, that requires a change to labs, remember that
> labs replicas are not static, and new indexes can be added if needed by the
> community, and I will gladly apply them myself.
>
> On Fri, Sep 23, 2016 at 6:02 PM, Huji Lee <huji.huji at gmail.com> wrote:
>
>> With two connections to the DB via Terminal, it is possible! Here is what
>> I got; it matches what you sent above too:
>>
>>
>>
>> MariaDB [fawiki_p]> show processlist;
>> +---------+--------+-------------------+----------+---------
>> +------+----------------------------+-----------------------
>> ------------------------------------------------------------
>> -------------------+----------+
>> | Id      | User   | Host              | db       | Command | Time |
>> State                      | Info
>>                                                                        |
>> Progress |
>> +---------+--------+-------------------+----------+---------
>> +------+----------------------------+-----------------------
>> ------------------------------------------------------------
>> -------------------+----------+
>> | 3556437 | s51403 | 10.68.23.58:53391 | fawiki_p | Query   |   19 |
>> Queried about 5450000 rows | select page_title, count(ll_lang) from page
>> join category on page_title = cat_title left join catego |    0.000 |
>> | 3556865 | s51403 | 10.68.23.58:54083 | fawiki_p | Query   |    0 |
>> init                       | show processlist
>>                                                                   |
>> 0.000 |
>> +---------+--------+-------------------+----------+---------
>> +------+----------------------------+-----------------------
>> ------------------------------------------------------------
>> -------------------+----------+
>> 2 rows in set (0.00 sec)
>>
>> MariaDB [fawiki_p]> show explain for 3556437;
>> +------+-------------+---------------+--------+-------------
>> ---------------------------+-----------------------------+--
>> -------+------------------------+--------+------------------
>> ----------------------------+
>> | id   | select_type | table         | type   |
>> possible_keys                          | key                         |
>> key_len | ref                    | rows   | Extra
>> |
>> +------+-------------+---------------+--------+-------------
>> ---------------------------+-----------------------------+--
>> -------+------------------------+--------+------------------
>> ----------------------------+
>> |    1 | SIMPLE      | page          | ref    |
>> name_title,page_redirect_namespace_len | page_redirect_namespace_len |
>> 5       | const,const            | 364177 | Using where; Using temporary;
>> Using filesort |
>> |    1 | SIMPLE      | category      | eq_ref |
>> cat_title                              | cat_title                   |
>> 257     | fawiki.page.page_title |      1 | Using
>> index                                  |
>> |    1 | SIMPLE      | categorylinks | ref    |
>> cl_timestamp,cl_sortkey                | cl_timestamp                |
>> 257     | fawiki.page.page_title |     12 | Using where; Using
>> index                     |
>> |    1 | SIMPLE      | templatelinks | ref    |
>> tl_from,tl_namespace                   | tl_namespace                |
>> 4       | const                  | 390622 | Using where; Using
>> index                     |
>> |    1 | SIMPLE      | langlinks     | ref    |
>> ll_from                                | ll_from                     |
>> 4       | fawiki.page.page_id    | 104911 | Using
>> index                                  |
>> +------+-------------+---------------+--------+-------------
>> ---------------------------+-----------------------------+--
>> -------+------------------------+--------+------------------
>> ----------------------------+
>>
>>
>> On Fri, Sep 23, 2016 at 11:58 AM, Huji Lee <huji.huji at gmail.com> wrote:
>>
>>> Oh, I misread your comment.
>>>
>>> I doubt SHOW EXPLAIN would work via Quarry, so I am trying to get it to
>>> work via terminal access to Labs. I wonder though if it works for a query
>>> that fails to complete.
>>>
>>> On Fri, Sep 23, 2016 at 11:53 AM, Jaime Crespo <jcrespo at wikimedia.org>
>>> wrote:
>>>
>>>> Huji, note I suggested SHOW EXPLAIN, not EXPLAIN; it is a different
>>>> command, check the link I provided and report if that doesn't work for you.
>>>>
>>>> On Fri, Sep 23, 2016 at 5:31 PM, Huji Lee <huji.huji at gmail.com> wrote:
>>>>
>>>>> See https://quarry.wmflabs.org/query/12718 and the error message
>>>>> returned.
>>>>>
>>>>> EXPLAIN only works if you have access to the underlying tables. I only
>>>>> have access to the VIEWs built on top of actual wiki tables (for good
>>>>> reason, same applies to most of us) so I get an error when I try EXPLAIN.
>>>>>
>>>>> On Fri, Sep 23, 2016 at 11:24 AM, Jaime Crespo <jcrespo at wikimedia.org>
>>>>> wrote:
>>>>>
>>>>>> Explain (you should be able to run SHOW EXPLAIN on your own queries:
>>>>>> http://s.petrunia.net/blog/?p=89):
>>>>>>
>>>>>> EXPLAIN select page_title, count(ll_lang) from page join category on
>>>>>> page_title = cat_title left join categorylinks on page_title = cl_to left
>>>>>> join templatelinks on tl_from = page_id and tl_namespace = 10 and tl_title
>>>>>> in (   'رده_خالی' , 'رده_بهتر', 'رده_ابهام‌زدایی', 'رده_ردیابی‌کردن' ) left
>>>>>> join langlinks on page_id = ll_from where page_namespace = 14 and
>>>>>> page_is_redirect = 0 and cl_to is null and tl_title is null group by
>>>>>> page_title order by 2, 1 limit 5000\G
>>>>>> *************************** 1. row ***************************
>>>>>>            id: 1
>>>>>>   select_type: SIMPLE
>>>>>>         table: page
>>>>>>          type: ref
>>>>>> possible_keys: name_title,page_redirect_namespace_len
>>>>>>           key: page_redirect_namespace_len
>>>>>>       key_len: 5
>>>>>>           ref: const,const
>>>>>>          rows: 364273
>>>>>>         Extra: Using where; Using temporary; Using filesort
>>>>>> *************************** 2. row ***************************
>>>>>>            id: 1
>>>>>>   select_type: SIMPLE
>>>>>>         table: category
>>>>>>          type: eq_ref
>>>>>> possible_keys: cat_title
>>>>>>           key: cat_title
>>>>>>       key_len: 257
>>>>>>           ref: fawiki.page.page_title
>>>>>>          rows: 1
>>>>>>         Extra: Using index
>>>>>> *************************** 3. row ***************************
>>>>>>            id: 1
>>>>>>   select_type: SIMPLE
>>>>>>         table: categorylinks
>>>>>>          type: ref
>>>>>> possible_keys: cl_timestamp,cl_sortkey
>>>>>>           key: cl_timestamp
>>>>>>       key_len: 257
>>>>>>           ref: fawiki.page.page_title
>>>>>>          rows: 12
>>>>>>         Extra: Using where; Using index
>>>>>> *************************** 4. row ***************************
>>>>>>            id: 1
>>>>>>   select_type: SIMPLE
>>>>>>         table: templatelinks
>>>>>>          type: ref
>>>>>> possible_keys: tl_from,tl_namespace
>>>>>>           key: tl_namespace
>>>>>>       key_len: 4
>>>>>>           ref: const
>>>>>>          rows: 390610
>>>>>>         Extra: Using where; Using index
>>>>>> *************************** 5. row ***************************
>>>>>>            id: 1
>>>>>>   select_type: SIMPLE
>>>>>>         table: langlinks
>>>>>>          type: ref
>>>>>> possible_keys: ll_from
>>>>>>           key: ll_from
>>>>>>       key_len: 4
>>>>>>           ref: fawiki.page.page_id
>>>>>>          rows: 104910
>>>>>>         Extra: Using index
>>>>>>
>>>>>> On Fri, Sep 23, 2016 at 5:16 PM, Huji Lee <huji.huji at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I have a query [1] which I run weekly to identify empty categories;
>>>>>>> those that don't have interwiki links and stay empty for a while are then
>>>>>>> deleted by a sysop.
>>>>>>>
>>>>>>> The query ran just fine every week until Sep 2nd. [2] Since then,
>>>>>>> the query times out and the page doesn't get updated.
>>>>>>>
>>>>>>> The query is no that complex (a SELECT with five JOINs) and used to
>>>>>>> finish in about 3-5 minutes. I don't have EXPLAIN access on Labs so I
>>>>>>> cannot tell what is slowing it down. Can someone kindly take a look and
>>>>>>> advise why this suddenly stopped working?
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Huji
>>>>>>>
>>>>>>> [1] https://quarry.wmflabs.org/query/3760
>>>>>>> [2] https://fa.wikipedia.org/w/index.php?title=%D9%88%DB%8C%DA%A
>>>>>>> 9%DB%8C%E2%80%8C%D9%BE%D8%AF%DB%8C%D8%A7:%DA%AF%D8%B2%D8%A7%
>>>>>>> D8%B1%D8%B4_%D8%AF%DB%8C%D8%AA%D8%A7%D8%A8%DB%8C%D8%B3/%D8%B
>>>>>>> 1%D8%AF%D9%87%E2%80%8C%D9%87%D8%A7%DB%8C_%D8%AE%D8%A7%D9%84%
>>>>>>> DB%8C&action=history&uselang=en
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Labs-l mailing list
>>>>>>> Labs-l at lists.wikimedia.org
>>>>>>> https://lists.wikimedia.org/mailman/listinfo/labs-l
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Jaime Crespo
>>>>>> <http://wikimedia.org>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Labs-l mailing list
>>>>>> Labs-l at lists.wikimedia.org
>>>>>> https://lists.wikimedia.org/mailman/listinfo/labs-l
>>>>>>
>>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Labs-l mailing list
>>>>> Labs-l at lists.wikimedia.org
>>>>> https://lists.wikimedia.org/mailman/listinfo/labs-l
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Jaime Crespo
>>>> <http://wikimedia.org>
>>>>
>>>> _______________________________________________
>>>> Labs-l mailing list
>>>> Labs-l at lists.wikimedia.org
>>>> https://lists.wikimedia.org/mailman/listinfo/labs-l
>>>>
>>>>
>>>
>>
>> _______________________________________________
>> Labs-l mailing list
>> Labs-l at lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/labs-l
>>
>>
>
>
> --
> Jaime Crespo
> <http://wikimedia.org>
>
> _______________________________________________
> Labs-l mailing list
> Labs-l at lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/labs-l
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.wikimedia.org/pipermail/labs-l/attachments/20160923/b72a237c/attachment-0001.html>


More information about the Labs-l mailing list