[Labs-l] Empty Cat query
Bruce Myers
bruce at brucemyers.com
Thu Sep 29 20:26:51 UTC 2016
The query is using the tl_namespace index (slow) instead of the tl_from
index (fast). Since you can't force an index in a view, I tried removing
the 'and tl_namespace = 10' qualifier and it ran in 40 seconds. It used
the tl_from index. Since templatelinks usually point to the 10 namespace
anyway, there shouldn't be to many, if any, false positives by leaving
the qualifier out.
On 09/28/2016 03:03 PM, Huji Lee wrote:
> I wanted to revive this thread: can someone please help me find a way
> to boost that query?
>
> On Fri, Sep 23, 2016 at 12:39 PM, Huji Lee <huji.huji at gmail.com
> <mailto:huji.huji at gmail.com>> wrote:
>
> Jaime, your hunch seems to be correct. I ran SHOW EXPLAIN on that
> query every few minutes until it timed out, and the number rows it
> was querying increased drastically. It very quickly reached
> 2399220000 <tel:2399220000>, with the last value I go before
> timeout being 3693530000. That is just way too many.
>
> T139090 could be the cause; it went into effect the day after last
> successful query, and it does affect indexes used by the query.
>
> One thing that can obviously make my query faster is to force it
> to run the joins in a particular order. I tried to enforce it
> through creating temporary tables in memory, but it failed:
> https://quarry.wmflabs.org/query/12719
> <https://quarry.wmflabs.org/query/12719>
>
> Another approach would be to create indexes. However, my
> permissions don't allow me to see what indexes currently exist (is
> there a way around that?) so I cannot decide what new indexes can
> be added.
>
> On Fri, Sep 23, 2016 at 12:56 PM, Jaime Crespo
> <jcrespo at wikimedia.org <mailto:jcrespo at wikimedia.org>> wrote:
>
> So I cannot give you very specific advise, but based on the
> EXPLAIN, it seems that you may be trying to read too many
> rows, up to
> 100K * 300K * 300K (10 000 M) and then sort them, which is
> probably going to be very slow.
>
> If they used to work in the past there could be 2 probable
> explanations: a lot of rows have been inserted recently on one
> or several of the tables (e.g. a new template or category with
> many members) or the indexes have changed. Note that redoing
> the queries is something that we have to do constantly in
> production because rows quantities change. I also can think of
> this change that happened recently on production, but cannot
> say for sure if it is related or could affect you negatively:
>
> https://phabricator.wikimedia.org/T139090
> <https://phabricator.wikimedia.org/T139090>
>
> Maybe that affects your query and it is as easy to fix as
> reordering your columns/changing slightly the filters or its
> order. If someone has a suggestion to make it faster, that
> requires a change to labs, remember that labs replicas are not
> static, and new indexes can be added if needed by the
> community, and I will gladly apply them myself.
>
> On Fri, Sep 23, 2016 at 6:02 PM, Huji Lee <huji.huji at gmail.com
> <mailto:huji.huji at gmail.com>> wrote:
>
> With two connections to the DB via Terminal, it is
> possible! Here is what I got; it matches what you sent
> above too:
>
>
>
> MariaDB [fawiki_p]> show processlist;
> +---------+--------+-------------------+----------+---------+------+----------------------------+------------------------------------------------------------------------------------------------------+----------+
> | Id | User | Host | db |
> Command | Time | State | Info |
> Progress |
> +---------+--------+-------------------+----------+---------+------+----------------------------+------------------------------------------------------------------------------------------------------+----------+
> | 3556437 | s51403 | 10.68.23.58:53391
> <http://10.68.23.58:53391> | fawiki_p | Query | 19 |
> Queried about 5450000 rows | select page_title,
> count(ll_lang) from page join category on page_title =
> cat_title left join catego | 0.000 |
> | 3556865 | s51403 | 10.68.23.58:54083
> <http://10.68.23.58:54083> | fawiki_p | Query | 0 |
> init | show processlist | 0.000 |
> +---------+--------+-------------------+----------+---------+------+----------------------------+------------------------------------------------------------------------------------------------------+----------+
> 2 rows in set (0.00 sec)
>
> MariaDB [fawiki_p]> show explain for 3556437;
> +------+-------------+---------------+--------+----------------------------------------+-----------------------------+---------+------------------------+--------+----------------------------------------------+
> | id | select_type | table | type |
> possible_keys | key | key_len |
> ref | rows | Extra |
> +------+-------------+---------------+--------+----------------------------------------+-----------------------------+---------+------------------------+--------+----------------------------------------------+
> | 1 | SIMPLE | page | ref |
> name_title,page_redirect_namespace_len |
> page_redirect_namespace_len | 5 | const,const |
> 364177 | Using where; Using temporary; Using filesort |
> | 1 | SIMPLE | category | eq_ref | cat_title
> | cat_title | 257 |
> fawiki.page.page_title | 1 | Using index |
> | 1 | SIMPLE | categorylinks | ref |
> cl_timestamp,cl_sortkey | cl_timestamp |
> 257 | fawiki.page.page_title | 12 | Using where;
> Using index |
> | 1 | SIMPLE | templatelinks | ref |
> tl_from,tl_namespace | tl_namespace |
> 4 | const | 390622 | Using where; Using
> index |
> | 1 | SIMPLE | langlinks | ref | ll_from |
> ll_from | 4 |
> fawiki.page.page_id | 104911 | Using index |
> +------+-------------+---------------+--------+----------------------------------------+-----------------------------+---------+------------------------+--------+----------------------------------------------+
>
>
> On Fri, Sep 23, 2016 at 11:58 AM, Huji Lee
> <huji.huji at gmail.com <mailto:huji.huji at gmail.com>> wrote:
>
> Oh, I misread your comment.
>
> I doubt SHOW EXPLAIN would work via Quarry, so I am
> trying to get it to work via terminal access to Labs.
> I wonder though if it works for a query that fails to
> complete.
>
> On Fri, Sep 23, 2016 at 11:53 AM, Jaime Crespo
> <jcrespo at wikimedia.org <mailto:jcrespo at wikimedia.org>>
> wrote:
>
> Huji, note I suggested SHOW EXPLAIN, not EXPLAIN;
> it is a different command, check the link I
> provided and report if that doesn't work for you.
>
> On Fri, Sep 23, 2016 at 5:31 PM, Huji Lee
> <huji.huji at gmail.com <mailto:huji.huji at gmail.com>>
> wrote:
>
> See https://quarry.wmflabs.org/query/12718
> <https://quarry.wmflabs.org/query/12718> and
> the error message returned.
>
> EXPLAIN only works if you have access to the
> underlying tables. I only have access to the
> VIEWs built on top of actual wiki tables (for
> good reason, same applies to most of us) so I
> get an error when I try EXPLAIN.
>
> On Fri, Sep 23, 2016 at 11:24 AM, Jaime Crespo
> <jcrespo at wikimedia.org
> <mailto:jcrespo at wikimedia.org>> wrote:
>
> Explain (you should be able to run SHOW
> EXPLAIN on your own queries:
> http://s.petrunia.net/blog/?p=89
> <http://s.petrunia.net/blog/?p=89>):
>
> EXPLAIN select page_title, count(ll_lang)
> from page join category on page_title =
> cat_title left join categorylinks on
> page_title = cl_to left join templatelinks
> on tl_from = page_id and tl_namespace = 10
> and tl_title in ( 'رده_خالی' ,
> 'رده_بهتر', 'رده_ابهامزدایی',
> 'رده_ردیابیکردن' ) left join langlinks on
> page_id = ll_from where page_namespace =
> 14 and page_is_redirect = 0 and cl_to is
> null and tl_title is null group by
> page_title order by 2, 1 limit 5000\G
> *************************** 1. row
> ***************************
> id: 1
> select_type: SIMPLE
> table: page
> type: ref
> possible_keys:
> name_title,page_redirect_namespace_len
> key: page_redirect_namespace_len
> key_len: 5
> ref: const,const
> rows: 364273
> Extra: Using where; Using
> temporary; Using filesort
> *************************** 2. row
> ***************************
> id: 1
> select_type: SIMPLE
> table: category
> type: eq_ref
> possible_keys: cat_title
> key: cat_title
> key_len: 257
> ref: fawiki.page.page_title
> rows: 1
> Extra: Using index
> *************************** 3. row
> ***************************
> id: 1
> select_type: SIMPLE
> table: categorylinks
> type: ref
> possible_keys: cl_timestamp,cl_sortkey
> key: cl_timestamp
> key_len: 257
> ref: fawiki.page.page_title
> rows: 12
> Extra: Using where; Using index
> *************************** 4. row
> ***************************
> id: 1
> select_type: SIMPLE
> table: templatelinks
> type: ref
> possible_keys: tl_from,tl_namespace
> key: tl_namespace
> key_len: 4
> ref: const
> rows: 390610
> Extra: Using where; Using index
> *************************** 5. row
> ***************************
> id: 1
> select_type: SIMPLE
> table: langlinks
> type: ref
> possible_keys: ll_from
> key: ll_from
> key_len: 4
> ref: fawiki.page.page_id
> rows: 104910
> Extra: Using index
>
> On Fri, Sep 23, 2016 at 5:16 PM, Huji Lee
> <huji.huji at gmail.com
> <mailto:huji.huji at gmail.com>> wrote:
>
> Hi all,
>
> I have a query [1] which I run weekly
> to identify empty categories; those
> that don't have interwiki links and
> stay empty for a while are then
> deleted by a sysop.
>
> The query ran just fine every week
> until Sep 2nd. [2] Since then, the
> query times out and the page doesn't
> get updated.
>
> The query is no that complex (a SELECT
> with five JOINs) and used to finish in
> about 3-5 minutes. I don't have
> EXPLAIN access on Labs so I cannot
> tell what is slowing it down. Can
> someone kindly take a look and advise
> why this suddenly stopped working?
>
> Thanks,
>
> Huji
>
> [1]
> https://quarry.wmflabs.org/query/3760
> <https://quarry.wmflabs.org/query/3760>
> [2]
> https://fa.wikipedia.org/w/index.php?title=%D9%88%DB%8C%DA%A9%DB%8C%E2%80%8C%D9%BE%D8%AF%DB%8C%D8%A7:%DA%AF%D8%B2%D8%A7%D8%B1%D8%B4_%D8%AF%DB%8C%D8%AA%D8%A7%D8%A8%DB%8C%D8%B3/%D8%B1%D8%AF%D9%87%E2%80%8C%D9%87%D8%A7%DB%8C_%D8%AE%D8%A7%D9%84%DB%8C&action=history&uselang=en
> <https://fa.wikipedia.org/w/index.php?title=%D9%88%DB%8C%DA%A9%DB%8C%E2%80%8C%D9%BE%D8%AF%DB%8C%D8%A7:%DA%AF%D8%B2%D8%A7%D8%B1%D8%B4_%D8%AF%DB%8C%D8%AA%D8%A7%D8%A8%DB%8C%D8%B3/%D8%B1%D8%AF%D9%87%E2%80%8C%D9%87%D8%A7%DB%8C_%D8%AE%D8%A7%D9%84%DB%8C&action=history&uselang=en>
>
> _______________________________________________
> Labs-l mailing list
> Labs-l at lists.wikimedia.org
> <mailto:Labs-l at lists.wikimedia.org>
> https://lists.wikimedia.org/mailman/listinfo/labs-l
> <https://lists.wikimedia.org/mailman/listinfo/labs-l>
>
>
>
>
> --
> Jaime Crespo
> <http://wikimedia.org>
>
> _______________________________________________
> Labs-l mailing list
> Labs-l at lists.wikimedia.org
> <mailto:Labs-l at lists.wikimedia.org>
> https://lists.wikimedia.org/mailman/listinfo/labs-l
> <https://lists.wikimedia.org/mailman/listinfo/labs-l>
>
>
>
> _______________________________________________
> Labs-l mailing list
> Labs-l at lists.wikimedia.org
> <mailto:Labs-l at lists.wikimedia.org>
> https://lists.wikimedia.org/mailman/listinfo/labs-l
> <https://lists.wikimedia.org/mailman/listinfo/labs-l>
>
>
>
>
> --
> Jaime Crespo
> <http://wikimedia.org>
>
> _______________________________________________
> Labs-l mailing list
> Labs-l at lists.wikimedia.org
> <mailto:Labs-l at lists.wikimedia.org>
> https://lists.wikimedia.org/mailman/listinfo/labs-l
> <https://lists.wikimedia.org/mailman/listinfo/labs-l>
>
>
>
>
> _______________________________________________
> Labs-l mailing list
> Labs-l at lists.wikimedia.org <mailto:Labs-l at lists.wikimedia.org>
> https://lists.wikimedia.org/mailman/listinfo/labs-l
> <https://lists.wikimedia.org/mailman/listinfo/labs-l>
>
>
>
>
> --
> Jaime Crespo
> <http://wikimedia.org>
>
> _______________________________________________
> Labs-l mailing list
> Labs-l at lists.wikimedia.org <mailto:Labs-l at lists.wikimedia.org>
> https://lists.wikimedia.org/mailman/listinfo/labs-l
> <https://lists.wikimedia.org/mailman/listinfo/labs-l>
>
>
>
>
>
> _______________________________________________
> Labs-l mailing list
> Labs-l at lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/labs-l
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.wikimedia.org/pipermail/labs-l/attachments/20160929/08814746/attachment-0001.html>
More information about the Labs-l
mailing list