[Labs-l] Empty Cat query

Bruce Myers bruce at brucemyers.com
Thu Sep 29 20:26:51 UTC 2016


The query is using the tl_namespace index (slow) instead of the tl_from 
index (fast). Since you can't force an index in a view, I tried removing 
the 'and tl_namespace = 10' qualifier and it ran in 40 seconds. It used 
the tl_from index. Since templatelinks usually point to the 10 namespace 
anyway, there shouldn't be to many, if any, false positives by leaving 
the qualifier out.

On 09/28/2016 03:03 PM, Huji Lee wrote:
> I wanted to revive this thread: can someone please help me find a way 
> to boost that query?
>
> On Fri, Sep 23, 2016 at 12:39 PM, Huji Lee <huji.huji at gmail.com 
> <mailto:huji.huji at gmail.com>> wrote:
>
>     Jaime, your hunch seems to be correct. I ran SHOW EXPLAIN on that
>     query every few minutes until it timed out, and the number rows it
>     was querying increased drastically. It very quickly reached
>     2399220000 <tel:2399220000>, with the last value I go before
>     timeout being 3693530000. That is just way too many.
>
>     T139090 could be the cause; it went into effect the day after last
>     successful query, and it does affect indexes used by the query.
>
>     One thing that can obviously make my query faster is to force it
>     to run the joins in a particular order. I tried to enforce it
>     through creating temporary tables in memory, but it failed:
>     https://quarry.wmflabs.org/query/12719
>     <https://quarry.wmflabs.org/query/12719>
>
>     Another approach would be to create indexes. However, my
>     permissions don't allow me to see what indexes currently exist (is
>     there a way around that?) so I cannot decide what new indexes can
>     be added.
>
>     On Fri, Sep 23, 2016 at 12:56 PM, Jaime Crespo
>     <jcrespo at wikimedia.org <mailto:jcrespo at wikimedia.org>> wrote:
>
>         So I cannot give you very specific advise, but based on the
>         EXPLAIN, it seems that you may be trying to read too many
>         rows, up to
>         100K * 300K * 300K (10 000 M) and then sort them, which is
>         probably going to be very slow.
>
>         If they used to work in the past there could be 2 probable
>         explanations: a lot of rows have been inserted recently on one
>         or several of the tables (e.g. a new template or category with
>         many members) or the indexes have changed. Note that redoing
>         the queries is something that we have to do constantly in
>         production because rows quantities change. I also can think of
>         this change that happened recently on production, but cannot
>         say for sure if it is related or could affect you negatively:
>
>         https://phabricator.wikimedia.org/T139090
>         <https://phabricator.wikimedia.org/T139090>
>
>         Maybe that affects your query and it is as easy to fix as
>         reordering your columns/changing slightly the filters or its
>         order. If someone has a suggestion to make it faster, that
>         requires a change to labs, remember that labs replicas are not
>         static, and new indexes can be added if needed by the
>         community, and I will gladly apply them myself.
>
>         On Fri, Sep 23, 2016 at 6:02 PM, Huji Lee <huji.huji at gmail.com
>         <mailto:huji.huji at gmail.com>> wrote:
>
>             With two connections to the DB via Terminal, it is
>             possible! Here is what I got; it matches what you sent
>             above too:
>
>
>
>             MariaDB [fawiki_p]> show processlist;
>             +---------+--------+-------------------+----------+---------+------+----------------------------+------------------------------------------------------------------------------------------------------+----------+
>             | Id      | User   | Host              | db       |
>             Command | Time | State                      | Info |
>             Progress |
>             +---------+--------+-------------------+----------+---------+------+----------------------------+------------------------------------------------------------------------------------------------------+----------+
>             | 3556437 | s51403 | 10.68.23.58:53391
>             <http://10.68.23.58:53391> | fawiki_p | Query   |   19 |
>             Queried about 5450000 rows | select page_title,
>             count(ll_lang) from page join category on page_title =
>             cat_title left join catego | 0.000 |
>             | 3556865 | s51403 | 10.68.23.58:54083
>             <http://10.68.23.58:54083> | fawiki_p | Query   |    0 |
>             init                       | show processlist |    0.000 |
>             +---------+--------+-------------------+----------+---------+------+----------------------------+------------------------------------------------------------------------------------------------------+----------+
>             2 rows in set (0.00 sec)
>
>             MariaDB [fawiki_p]> show explain for 3556437;
>             +------+-------------+---------------+--------+----------------------------------------+-----------------------------+---------+------------------------+--------+----------------------------------------------+
>             | id   | select_type | table         | type   |
>             possible_keys | key                         | key_len |
>             ref | rows   | Extra |
>             +------+-------------+---------------+--------+----------------------------------------+-----------------------------+---------+------------------------+--------+----------------------------------------------+
>             |    1 | SIMPLE      | page          | ref    |
>             name_title,page_redirect_namespace_len |
>             page_redirect_namespace_len | 5       | const,const |
>             364177 | Using where; Using temporary; Using filesort |
>             |    1 | SIMPLE      | category      | eq_ref | cat_title
>             | cat_title                   | 257     |
>             fawiki.page.page_title |      1 | Using index |
>             |    1 | SIMPLE      | categorylinks | ref    |
>             cl_timestamp,cl_sortkey | cl_timestamp                |
>             257     | fawiki.page.page_title |     12 | Using where;
>             Using index                     |
>             |    1 | SIMPLE      | templatelinks | ref    |
>             tl_from,tl_namespace | tl_namespace                |
>             4       | const | 390622 | Using where; Using
>             index                     |
>             |    1 | SIMPLE      | langlinks     | ref    | ll_from |
>             ll_from                     | 4       |
>             fawiki.page.page_id | 104911 | Using index |
>             +------+-------------+---------------+--------+----------------------------------------+-----------------------------+---------+------------------------+--------+----------------------------------------------+
>
>
>             On Fri, Sep 23, 2016 at 11:58 AM, Huji Lee
>             <huji.huji at gmail.com <mailto:huji.huji at gmail.com>> wrote:
>
>                 Oh, I misread your comment.
>
>                 I doubt SHOW EXPLAIN would work via Quarry, so I am
>                 trying to get it to work via terminal access to Labs.
>                 I wonder though if it works for a query that fails to
>                 complete.
>
>                 On Fri, Sep 23, 2016 at 11:53 AM, Jaime Crespo
>                 <jcrespo at wikimedia.org <mailto:jcrespo at wikimedia.org>>
>                 wrote:
>
>                     Huji, note I suggested SHOW EXPLAIN, not EXPLAIN;
>                     it is a different command, check the link I
>                     provided and report if that doesn't work for you.
>
>                     On Fri, Sep 23, 2016 at 5:31 PM, Huji Lee
>                     <huji.huji at gmail.com <mailto:huji.huji at gmail.com>>
>                     wrote:
>
>                         See https://quarry.wmflabs.org/query/12718
>                         <https://quarry.wmflabs.org/query/12718> and
>                         the error message returned.
>
>                         EXPLAIN only works if you have access to the
>                         underlying tables. I only have access to the
>                         VIEWs built on top of actual wiki tables (for
>                         good reason, same applies to most of us) so I
>                         get an error when I try EXPLAIN.
>
>                         On Fri, Sep 23, 2016 at 11:24 AM, Jaime Crespo
>                         <jcrespo at wikimedia.org
>                         <mailto:jcrespo at wikimedia.org>> wrote:
>
>                             Explain (you should be able to run SHOW
>                             EXPLAIN on your own queries:
>                             http://s.petrunia.net/blog/?p=89
>                             <http://s.petrunia.net/blog/?p=89>):
>
>                             EXPLAIN select page_title, count(ll_lang)
>                             from page join category on page_title =
>                             cat_title left join categorylinks on
>                             page_title = cl_to left join templatelinks
>                             on tl_from = page_id and tl_namespace = 10
>                             and tl_title in (   'رده_خالی' ,
>                             'رده_بهتر', 'رده_ابهام‌زدایی',
>                             'رده_ردیابی‌کردن' ) left join langlinks on
>                             page_id = ll_from where page_namespace =
>                             14 and page_is_redirect = 0 and cl_to is
>                             null and tl_title is null group by
>                             page_title order by 2, 1 limit 5000\G
>                             *************************** 1. row
>                             ***************************
>                                        id: 1
>                               select_type: SIMPLE
>                                     table: page
>                                      type: ref
>                             possible_keys:
>                             name_title,page_redirect_namespace_len
>                                       key: page_redirect_namespace_len
>                                   key_len: 5
>                                       ref: const,const
>                                      rows: 364273
>                                     Extra: Using where; Using
>                             temporary; Using filesort
>                             *************************** 2. row
>                             ***************************
>                                        id: 1
>                               select_type: SIMPLE
>                                     table: category
>                                      type: eq_ref
>                             possible_keys: cat_title
>                                       key: cat_title
>                                   key_len: 257
>                                       ref: fawiki.page.page_title
>                                      rows: 1
>                                     Extra: Using index
>                             *************************** 3. row
>                             ***************************
>                                        id: 1
>                               select_type: SIMPLE
>                                     table: categorylinks
>                                      type: ref
>                             possible_keys: cl_timestamp,cl_sortkey
>                                       key: cl_timestamp
>                                   key_len: 257
>                                       ref: fawiki.page.page_title
>                                      rows: 12
>                                     Extra: Using where; Using index
>                             *************************** 4. row
>                             ***************************
>                                        id: 1
>                               select_type: SIMPLE
>                                     table: templatelinks
>                                      type: ref
>                             possible_keys: tl_from,tl_namespace
>                                       key: tl_namespace
>                                   key_len: 4
>                                       ref: const
>                                      rows: 390610
>                                     Extra: Using where; Using index
>                             *************************** 5. row
>                             ***************************
>                                        id: 1
>                               select_type: SIMPLE
>                                     table: langlinks
>                                      type: ref
>                             possible_keys: ll_from
>                                       key: ll_from
>                                   key_len: 4
>                                       ref: fawiki.page.page_id
>                                      rows: 104910
>                                     Extra: Using index
>
>                             On Fri, Sep 23, 2016 at 5:16 PM, Huji Lee
>                             <huji.huji at gmail.com
>                             <mailto:huji.huji at gmail.com>> wrote:
>
>                                 Hi all,
>
>                                 I have a query [1] which I run weekly
>                                 to identify empty categories; those
>                                 that don't have interwiki links and
>                                 stay empty for a while are then
>                                 deleted by a sysop.
>
>                                 The query ran just fine every week
>                                 until Sep 2nd. [2] Since then, the
>                                 query times out and the page doesn't
>                                 get updated.
>
>                                 The query is no that complex (a SELECT
>                                 with five JOINs) and used to finish in
>                                 about 3-5 minutes. I don't have
>                                 EXPLAIN access on Labs so I cannot
>                                 tell what is slowing it down. Can
>                                 someone kindly take a look and advise
>                                 why this suddenly stopped working?
>
>                                 Thanks,
>
>                                 Huji
>
>                                 [1]
>                                 https://quarry.wmflabs.org/query/3760
>                                 <https://quarry.wmflabs.org/query/3760>
>                                 [2]
>                                 https://fa.wikipedia.org/w/index.php?title=%D9%88%DB%8C%DA%A9%DB%8C%E2%80%8C%D9%BE%D8%AF%DB%8C%D8%A7:%DA%AF%D8%B2%D8%A7%D8%B1%D8%B4_%D8%AF%DB%8C%D8%AA%D8%A7%D8%A8%DB%8C%D8%B3/%D8%B1%D8%AF%D9%87%E2%80%8C%D9%87%D8%A7%DB%8C_%D8%AE%D8%A7%D9%84%DB%8C&action=history&uselang=en
>                                 <https://fa.wikipedia.org/w/index.php?title=%D9%88%DB%8C%DA%A9%DB%8C%E2%80%8C%D9%BE%D8%AF%DB%8C%D8%A7:%DA%AF%D8%B2%D8%A7%D8%B1%D8%B4_%D8%AF%DB%8C%D8%AA%D8%A7%D8%A8%DB%8C%D8%B3/%D8%B1%D8%AF%D9%87%E2%80%8C%D9%87%D8%A7%DB%8C_%D8%AE%D8%A7%D9%84%DB%8C&action=history&uselang=en>
>
>                                 _______________________________________________
>                                 Labs-l mailing list
>                                 Labs-l at lists.wikimedia.org
>                                 <mailto:Labs-l at lists.wikimedia.org>
>                                 https://lists.wikimedia.org/mailman/listinfo/labs-l
>                                 <https://lists.wikimedia.org/mailman/listinfo/labs-l>
>
>
>
>
>                             -- 
>                             Jaime Crespo
>                             <http://wikimedia.org>
>
>                             _______________________________________________
>                             Labs-l mailing list
>                             Labs-l at lists.wikimedia.org
>                             <mailto:Labs-l at lists.wikimedia.org>
>                             https://lists.wikimedia.org/mailman/listinfo/labs-l
>                             <https://lists.wikimedia.org/mailman/listinfo/labs-l>
>
>
>
>                         _______________________________________________
>                         Labs-l mailing list
>                         Labs-l at lists.wikimedia.org
>                         <mailto:Labs-l at lists.wikimedia.org>
>                         https://lists.wikimedia.org/mailman/listinfo/labs-l
>                         <https://lists.wikimedia.org/mailman/listinfo/labs-l>
>
>
>
>
>                     -- 
>                     Jaime Crespo
>                     <http://wikimedia.org>
>
>                     _______________________________________________
>                     Labs-l mailing list
>                     Labs-l at lists.wikimedia.org
>                     <mailto:Labs-l at lists.wikimedia.org>
>                     https://lists.wikimedia.org/mailman/listinfo/labs-l
>                     <https://lists.wikimedia.org/mailman/listinfo/labs-l>
>
>
>
>
>             _______________________________________________
>             Labs-l mailing list
>             Labs-l at lists.wikimedia.org <mailto:Labs-l at lists.wikimedia.org>
>             https://lists.wikimedia.org/mailman/listinfo/labs-l
>             <https://lists.wikimedia.org/mailman/listinfo/labs-l>
>
>
>
>
>         -- 
>         Jaime Crespo
>         <http://wikimedia.org>
>
>         _______________________________________________
>         Labs-l mailing list
>         Labs-l at lists.wikimedia.org <mailto:Labs-l at lists.wikimedia.org>
>         https://lists.wikimedia.org/mailman/listinfo/labs-l
>         <https://lists.wikimedia.org/mailman/listinfo/labs-l>
>
>
>
>
>
> _______________________________________________
> Labs-l mailing list
> Labs-l at lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/labs-l


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.wikimedia.org/pipermail/labs-l/attachments/20160929/08814746/attachment-0001.html>


More information about the Labs-l mailing list