[Labs-l] long running queries

Jaime Crespo jcrespo at wikimedia.org
Sat Sep 23 17:16:48 UTC 2017


John,

Despite your query returning only 50 results, it is unlikely to finish- ever :-)

Your query is reading (almost) every single row of pagelinks (1
billion rows), -which is also by itself not very optimized-, and
joining it with page (35 million rows), and then sorting all results,
all without using a proper index, then discarding all of them except
the top 50.

I would wonder if you have considered using the cached functionality
found at https://en.wikipedia.org/wiki/Special:WantedPages but if you
absolutely have to run your query? Despite the wrong overriden
mediawiki messages show there, those are updated twice a month: This
is the same list as shown there on namespace 0:

root at dbstore1001[enwiki]> SELECT qc_title as page, qc_value as
numlinks  FROM querycache WHERE qc_type =
'Wantedpages' and qc_namespace=0 ORDER BY qc_value DESC LIMIT 50;
+--------------------------------------------+----------+
| page                                       | numlinks |
+--------------------------------------------+----------+
| HaRav_Moshe_Nehemia_Cohenov                |    19729 |
| HaRav_Meir_Shehenboim_(Etz_Hadar)          |    19728 |
| Wrestling_at_the_2010_South_American_Games |    12659 |
| Post-structuralist_ballet                  |    12025 |
| Robert_MacDonnell                          |    10541 |
| Charles_Michael_Lavery_QC                  |    10540 |
| Alison_Campbell                            |     9747 |
| Shauna_Gunn                                |     9744 |
| Sophie_Hoopman                             |     9742 |
| District_Policing_Partnerships             |     9741 |
| Stanley_Leisure                            |     9740 |
| Robert_Hall_(doctor)                       |     9739 |
| Hunt_Committee                             |     9738 |
| Brian_Baird_(newsreader)                   |     9738 |
| Choo_Chiat_Goh                             |     7483 |
| Michael_Cook_(ballet)                      |     7480 |
| Zoulikha_Naciri                            |     7127 |
| Mounir_Charïbi                             |     7127 |
| Mohamed_Moatassim                          |     7127 |
| Khalil_Hachimi_Idrissi                     |     7127 |
| Hassan_Aourid                              |     7127 |
| Faycal_Laraichi                            |     7127 |
| Campocraspedon                             |     6982 |
| Al-Masdar                                  |     6811 |
| Mackay_Radio                               |     6809 |
| Lebanese_literature                        |     6042 |
| Al_Nabad                                   |     6007 |
| Badran_Roy_Badran                          |     6006 |
| Mohammad_Abdul_Jabbar_Khairy               |     6001 |
| Toufic_Awwad                               |     5999 |
| Maurice_Chehab                             |     5999 |
| Ghazi_Aad                                  |     5999 |
| Kawasaki_Vulcan_500_LTD                    |     5724 |
| Buddhism_in_Cuba                           |     5706 |
| Thomas_steam_bicycle                       |     5698 |
| Cementerio_de_Santa_Ifigenia               |     5696 |
| Cuban_Liberation_Army                      |     5695 |
| Fidel_Castro_Díaz-Balart                   |     5694 |
| Consejo_popular_(Cuba)                     |     5694 |
| Moto_Guzzi_V7                              |     5687 |
| Dirt_Rider                                 |     5676 |
| Dick_O'Brien                               |     5673 |
| Motus_Motorcycles                          |     5671 |
| Greenhorn_Enduro                           |     5671 |
| Werner_Motocyclette                        |     5670 |
| Vietnam_Veterans_Motorcycle_Club           |     5670 |
| Moto_Guzzi_Eldorado                        |     5670 |
| Moto_Guzzi_Ambassador                      |     5670 |
| Bimota_Tesi                                |     5670 |
| Sara_Liberte                               |     5669 |
+--------------------------------------------+----------+
50 rows in set (0.00 sec)

Last updated 11 days ago:
root at dbstore1001[enwiki]> SELECT * FROM querycache_info WHERE
qci_type='Wantedpages';
+-------------+----------------+
| qci_type    | qci_timestamp  |
+-------------+----------------+
| Wantedpages | 20170912074914 |
+-------------+----------------+
1 row in set (0.04 sec)

This table, querycache, could be replicated to the wiki replicas, but
we need someone to help with the sanitization (remove the unwatched
pages info).

On Sat, Sep 23, 2017 at 2:40 AM, John <phoenixoverride at gmail.com> wrote:
> The query I am using in this case is:
>
> select pl_title, count(*) from pagelinks  left join page on pl_title =
> page_title  where pl_namespace = 0  and pl_from_namespace  = 0 and page_id
> IS NULL  group by pl_title having COUNT(*) > 100 LIMIT 50;
>
> I am also running this on enwiki where the table is probably the largest. I
> am attempting to locate red linked pages via number of incoming links. I
> would expect it to take an hour or more to run probably, given the size of
> the table. But I am getting the error message in the phab ticket: ERROR 2006
> (HY000): MySQL server has gone away
>
> On Fri, Sep 22, 2017 at 7:53 PM, Madhumitha Viswanathan
> <mviswanathan at wikimedia.org> wrote:
>>
>> Hi John,
>>
>> Please provide us with a bit more information so we can help - which host
>> are you connecting to, what long running queries are you trying to run, etc.
>> I believe you may be running into some timeout/mechanisms that we have set
>> up to prevent abuse, see - https://phabricator.wikimedia.org/T76956#948591.
>>
>> As always, feel free to reach out to us on #wikimedia-cloud for any
>> questions.
>>
>> Thanks,
>>
>> On Fri, Sep 22, 2017 at 1:29 PM, John <phoenixoverride at gmail.com> wrote:
>>>
>>> Do we have a way of letting SQL queries run for a day or so? Im using the
>>> interactive sql <host> and running them that way but the DB host is going
>>> away in the middle of the process.
>>>
>>> _______________________________________________
>>> Labs-l mailing list
>>> Labs-l at lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/labs-l
>>>
>>
>>
>>
>> --
>> --Madhu :)
>>
>> _______________________________________________
>> Labs-l mailing list
>> Labs-l at lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/labs-l
>>
>
>
> _______________________________________________
> Labs-l mailing list
> Labs-l at lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/labs-l
>



-- 
Jaime Crespo
<http://wikimedia.org>



More information about the Labs-l mailing list