[Labs-l] db replica query weirdness

Golden Ring goldenring.wp at gmail.com
Sun Aug 2 17:12:44 UTC 2015


I'm hoping that someone here can explain some query weirdness to me.

I'm trying to get a list of the n most-recently-registered users who have
done page moves, and how many page moves they've done.  To do this, I'm
using this query:

select
       user_id,
       user_name,
       (now() - str_to_date(user_registration, '%Y%m%d%H%i%s'))/86400 as
registered_time,
       count(log_id) as moves
from
       user,
       logging_userindex
where
       user_id = log_user
  and (log_action = 'move' or log_action = 'move_redir')
group by user_id
order by user_id desc
limit {};

For a limit of about 100 rows, this gives plausible-looking results.  But
increasing the limit to 1000 gives some pretty bizarre results.  There are
big gaps in the registered_time variable.  The range of the registered_time
variable is about 0 - 4,000 days, but there are no results with registered
times between approx 350 - 1160 days, 1500 - 2300 days, or 2670 to 3480
days.

July 2013 is about 700 days ago, but a query to find users with page moves
who registered on July 15, 2013 shows that such users do exist:

select count(*) from user, logging_userindex where user_id = log_user and
log_action='move' and user_registration like '20130715%';

Result: 63 move actions from users registered on 20130715.

What's going on here?  I assume I've simply got the first query wrong in
some way, but I can't see how.

Note that all of the above is against the enwiki replica.

Thanks for any help,
GoldenRing
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.wikimedia.org/pipermail/labs-l/attachments/20150802/164c7d3d/attachment.html>


More information about the Labs-l mailing list