The signature is very consistent. I only have to search for ^\"10\. to find them, and they all look more or less like this:

"10.####/<ID>" OR "http://<publisher_website>/.../10.####/<ID>"

If they are consistently cranking out 45K of these searches every 2 hours or so, they should be easy to find once we have a place to look.
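
A rough sketch of how such a filter might look once there is a log to run it against. This assumes each query is available as a plain string; the function names are hypothetical, and the "four or more digits" prefix rule is a guess based on the pattern above, not something confirmed in the logs.

```python
import re

# Assumption: a matching query starts with a double quote followed by a
# DOI-like token, i.e. "10.", at least four digits, a slash, and an ID,
# then a closing quote. Anything after that (such as the OR clause) is ignored.
DOI_QUERY = re.compile(r'^"10\.\d{4,}/\S+"')


def is_doi_query(query: str) -> bool:
    """Return True if the query starts with a quoted DOI-like token."""
    return DOI_QUERY.match(query) is not None


def doi_share(queries) -> float:
    """Return the fraction of queries that match the signature."""
    queries = list(queries)
    if not queries:
        return 0.0
    return sum(is_doi_query(q) for q in queries) / len(queries)
```

Anchoring the pattern at the start of the string keeps ordinary queries that merely mention a number like 10.5 from being counted.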

I'm trying to make sense of it. Could it be referral spam or something?

Trey Jones
Software Engineer, Discovery
Wikimedia Foundation


On Mon, Jul 27, 2015 at 6:49 PM, Tomasz Finc <tfinc@wikimedia.org> wrote:
If the signature is as specific as we're seeing here, then I'm sure
we'll see them again and can easily identify them.

--tomasz

On Mon, Jul 27, 2015 at 3:48 PM, Erik Bernhardson
<ebernhardson@wikimedia.org> wrote:
> On Mon, Jul 27, 2015 at 3:39 PM, Tomasz Finc <tfinc@wikimedia.org> wrote:
>>
>> On Mon, Jul 27, 2015 at 2:04 PM, Trey Jones <tjones@wikimedia.org> wrote:
>> > and it's 9% of the wiki zero-results queries
>>
>> That's a huge discovery to better understand our traffic.
>>
>> What do we know about who this is? proxy, bot, app, other, etc?
>>
>> I'm eager to have a talk with them :)
>>
>
> The current firehose of logs doesn't contain any PII, so we basically have
> no idea where these come from. I've been discussing with Oliver whether, and
> what, PII should be stored (the data is under NDA anyway, but we've always
> erred on the side of caution).
>
>
> _______________________________________________
> Wikimedia-search mailing list
> Wikimedia-search@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikimedia-search
>
