[Labs-l] db replica query weirdness

Golden Ring goldenring.wp at gmail.com
Mon Aug 3 07:34:24 UTC 2015


Silly me, assuming that the subtraction operator would do the right thing.
Er, any idea what the subtraction operator _does_ do?

On 3 August 2015 at 05:33, Huji Lee <huji.huji at gmail.com> wrote:

> You are seeing those gaps because the method you use to calculate account
> age is not correct. Run this query:
>
> select user_id, user_registration,
> (now() - str_to_date(user_registration, '%Y%m%d%H%i%s'))/86400 as
> registered_time
> from user
> where user_id >= 25650026
> and user_id <= 25650027;
>
> And you will see that those two accounts which are created back to back,
> one at the end of 6/30/2015 and one at the beginning of 7/1/2015 will
> return an account age of ~2000 and ~1181 respectively (two numbers that are
> not that close).
>
> Instead, run this:
>
> select user_id, user_registration,
> datediff(now(), str_to_date(user_registration, '%Y%m%d%H%i%s')) as acct_age
> from user
> where user_id >= 25650026
> and user_id <= 25650027;
>
> And you will get the account age in days (33 and 34 days in this example).
>
>
> On Sun, Aug 2, 2015 at 1:12 PM, Golden Ring <goldenring.wp at gmail.com>
> wrote:
>
>> I'm hoping that someone here can explain some query weirdness to me.
>>
>> I'm trying to get a list of the n most-recently-registered users who have
>> done page moves, and how many page moves they've done.  To do this, I'm
>> using this query:
>>
>> select
>>        user_id,
>>        user_name,
>>        (now() - str_to_date(user_registration, '%Y%m%d%H%i%s'))/86400 as
>> registered_time,
>>        count(log_id) as moves
>> from
>>        user,
>>        logging_userindex
>> where
>>        user_id = log_user
>>   and (log_action = 'move' or log_action = 'move_redir')
>> group by user_id
>> order by user_id desc
>> limit {};
>>
>> For a limit of about 100 rows, this gives plausible-looking results.  But
>> increasing the limit to 1000 gives some pretty bizarre results.  There are
>> big gaps in the registered_time variable.  The range of the registered_time
>> variable is about 0 - 4,000 days, but there are no results with registered
>> times between approx 350 - 1160 days, 1500 - 2300 days, or 2670 to 3480
>> days.
>>
>> July 2013 is about 700 days ago, but a query to find users with page
>> moves who registered on July 15, 2013 shows that such users do exist:
>>
>> select count(*) from user, logging_userindex where user_id = log_user and
>> log_action='move' and user_registration like '20130715%';
>>
>> Result: 63 move actions from users registered on 20130715.
>>
>> What's going on here?  I assume I've simply got the first query wrong in
>> some way, but I can't see how.
>>
>> Note that all of the above is against the enwiki replica.
>>
>> Thanks for any help,
>> GoldenRing
>>
>>
>> _______________________________________________
>> Labs-l mailing list
>> Labs-l at lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/labs-l
>>
>>
>
> _______________________________________________
> Labs-l mailing list
> Labs-l at lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/labs-l
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.wikimedia.org/pipermail/labs-l/attachments/20150803/dcd4d91e/attachment.html>


More information about the Labs-l mailing list