What happened to our user agent requirements?


Oliver Keyes

1 Sep 2015

4:24 p.m.

According to https://meta.wikimedia.org/wiki/User-Agent_policy and the associated mailing list threads, user agent headers are now required (and have been for some time), but on the request log side we see a lot of requests with the user agent "-" - IOW, an empty field.

Is the blocking of requests absent a user agent simply happening at a 'higher' stage (in mediawiki itself?) and so not registering with the varnishes, or is sending an /empty/ header simply A-OK?

--
Oliver Keyes
Count Logula
Wikimedia Foundation


Chad

1 Sep

4:41 p.m.

On Tue, Sep 1, 2015 at 9:24 AM Oliver Keyes <okeyes(a)wikimedia.org> wrote:

> Is the blocking of requests absent a user agent simply happening at a 'higher' stage (in mediawiki itself?) and so not registering with the varnishes,

No, it's not done at the application level.

> or is sending an /empty/ header simply A-OK?

Shouldn't be, unless the policy changed...

-Chad

John

4:42 p.m.

Could they be sending a non-standard header of "-"?

On Tuesday, September 1, 2015, Chad <innocentkiller(a)gmail.com> wrote:

> On Tue, Sep 1, 2015 at 9:24 AM Oliver Keyes <okeyes(a)wikimedia.org> wrote:
>
> > Is the blocking of requests absent a user agent simply happening at a 'higher' stage (in mediawiki itself?) and so not registering with the varnishes,
>
> No, it's not done at the application level.
>
> > or is sending an /empty/ header simply A-OK?
>
> Shouldn't be, unless the policy changed...
>
> -Chad
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Oliver Keyes

4:46 p.m.

On 1 September 2015 at 12:42, John <phoenixoverride(a)gmail.com> wrote:

> Could they be sending a non-standard header of "-"

Perfectly possible although also impossible to detect :(

--
Oliver Keyes
Count Logula
Wikimedia Foundation

Tomasz Finc

4:58 p.m.

Let's get a task in phab for this so that we can triage next steps. I'm curious about this as well.

--tomasz

Oliver Keyes

4:59 p.m.

Specifically, the hypothesis that people are sending "-"?

--
Oliver Keyes
Count Logula
Wikimedia Foundation

Tomasz Finc

5:01 p.m.

Tracking the overall issue.

Christian Aistleitner

8:57 p.m.

On Tue, Sep 01, 2015 at 12:42:35PM -0400, John wrote:

> Could they be sending a non-standard header of "-"

They could. But if a request comes in without a User-Agent header, the logging pipeline silently translates it into "-".

Have fun,
Christian

P.S.: The relevant configuration (for webrequests) is at

  https://github.com/wikimedia/operations-puppet/blob/production/modules/role…

That long line contains '%{User-Agent@user_agent}i', which means: log the request's User-Agent header, with no default value provided. As no default value is provided, varnishkafka uses the pre-set default value, which is "-":

  https://github.com/wikimedia/varnishkafka/blob/master/varnishkafka.c#L246

This conversion from the empty string to "-" does not kill relevant information and is useful for some researchers when manually inspecting TSVs, or manually browsing Hive output.

--
---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
Companies' registry: 360296y in Linz
Christian Aistleitner
Kefermarkterstrasze 6a/3     Email: christian(a)quelltextlich.at
4293 Gutau, Austria          Phone: +43 7946 / 20 5 81
Fax: +43 7946 / 20 5 81      Homepage: http://quelltextlich.at/
---------------------------------------------------------------
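To illustrate why the three cases become indistinguishable in the logs, here is a minimal Python sketch of the defaulting behaviour described above. It is not varnishkafka's actual code; the function name and the dict-based header lookup are assumptions made for illustration, as is treating an empty header the same way as a missing one.

```python
# Hypothetical stand-in for the logger's pre-set default value.
DEFAULT_VALUE = "-"

def logged_user_agent(headers):
    """Return the string that would be logged for the User-Agent field.

    A missing header and an empty header both fall back to the default,
    so they end up identical to a client that literally sent "-".
    """
    value = headers.get("User-Agent", "")
    return value if value else DEFAULT_VALUE

# Three different client behaviours, one log value.
for headers in ({}, {"User-Agent": ""}, {"User-Agent": "-"}):
    print(repr(headers), "->", logged_user_agent(headers))
```

All three requests are logged as "-", which is why the log side alone cannot tell a literal "-" apart from an absent header.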

Oliver Keyes

4:45 p.m.

On 1 September 2015 at 12:41, Chad <innocentkiller(a)gmail.com> wrote:

> On Tue, Sep 1, 2015 at 9:24 AM Oliver Keyes <okeyes(a)wikimedia.org> wrote:
>
> > Is the blocking of requests absent a user agent simply happening at a 'higher' stage (in mediawiki itself?) and so not registering with the varnishes,
>
> No, it's not done at the application level.
>
> > or is sending an /empty/ header simply A-OK?
>
> Shouldn't be, unless the policy changed...

Well I'm looking at millions of requests from API-users-who-I-am-not-big-fans-of[0] with a blank UA sooo..

[0] actual term far more expletive-laden than this

--
Oliver Keyes
Count Logula
Wikimedia Foundation

Krinkle

5:18 p.m.

I've confirmed just now that whatever requirement there was, it doesn't seem to be in effect.

Omitting the header entirely, sending it with an empty string, and sending it with "-": all three result in a response from the MediaWiki API.

    $ curl -A '' --include -v 'https://en.wikipedia.org/w/api.php?action=query&format=json'
    > GET /w/api.php?action=query&format=json HTTP/1.1
    > Host: en.wikipedia.org
    > Accept: */*
    < HTTP/1.1 200 OK
    ..
    {"batchcomplete":""}

    $ curl -A '-' --include -v 'https://en.wikipedia.org/w/api.php?action=query&format=json'
    > GET /w/api.php?action=query&format=json HTTP/1.1
    > User-Agent: -
    > Host: en.wikipedia.org
    > Accept: */*
    < HTTP/1.1 200 OK
    ..
    {"batchcomplete":""}

In the past (2012?) these were definitely being blocked. (Ran into it from time to time on Toolserver.)

It seems PHP's file_get_contents('http://...api..') is also working fine now, without having to ini_set a user_agent value first.

-- Krinkle
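For readers who want to reproduce the header variants from a script, here is a small Python sketch using the standard library's `urllib.request.Request`. It only builds the requests and inspects their headers; nothing is sent. The helper name `build_request` and the bot name and contact address are placeholders, not real tools.

```python
from urllib.request import Request

API_URL = "https://en.wikipedia.org/w/api.php?action=query&format=json"

def build_request(user_agent=None):
    """Build (but do not send) an API request, optionally with a User-Agent."""
    headers = {}
    if user_agent is not None:
        headers["User-Agent"] = user_agent
    return Request(API_URL, headers=headers)

# The cases discussed in this thread: header omitted, empty string, and "-".
# Request stores header names capitalized as "User-agent".
for label, ua in (("omitted", None), ("empty", ""), ("dash", "-")):
    req = build_request(ua)
    print(label, repr(req.get_header("User-agent")))

# A descriptive value with contact details (placeholder name and address).
polite = build_request("ExampleBot/0.1 (https://example.org/bot; bot@example.org)")
print(polite.get_header("User-agent"))
```

Note that when a request without a User-Agent is actually sent through `urlopen`, urllib adds its own default `Python-urllib/x.y` value, so the "omitted" case only stays omitted as long as the request is not sent that way.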

Oliver Keyes

5:23 p.m.

Awesome; thanks for the analysis, Krinkle.

Do we want to change this behaviour? From my point of view the answer is 'yes, not setting any kind of user agent is a violation of our API etiquette and we should be taking steps to alert people that it is', but if other people have different perspectives on this I'd love to hear them.

--
Oliver Keyes
Count Logula
Wikimedia Foundation

Brion Vibber

5:37 p.m.

I'm not 100% convinced that the UA requirement is helpful, for two reasons:

1) Lots of requests will have a default like "PHP" or "Python/urllib" or whatever from the tool they used to build their bot. These aren't helpful either, as they contain no indication of how to get in touch.

2) It's trivial to work around the requirement for a non-blank UA by setting one of the above, or worse, cut-n-pasting the UA string from a browser. If someone hacks this up real quick while testing, they may never bother putting in contact information when their bot moves from a handful of requests to gazillions.

Auto-throttling super-high-rate API clients (by IP/IP group) and giving them an explicit "You really should contact us and, better yet, make it possible for us to contact you" message might be nice.

We may want to seriously think about some sort of API key system... not necessarily as mandatory for access (we love freedom and convenience!) but perhaps as the way you get around being throttled for too many accesses. This would give us a structured way of storing their contact information, which might be better than unstructured names or addresses in the UA.

Does it make sense to tell people "log in to your bot's account with OAuth" or is that too much of a pain in the ass versus "add this one parameter to your requests with your key"? :)

-- brion
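The two objections above can be made concrete with a rough Python sketch that sorts user agent strings into the categories mentioned. This is only an illustrative heuristic: the prefix list, the regular expression, and the category names are all assumptions, not anything Wikimedia runs.

```python
import re

# Hypothetical prefixes of stock library user agents; illustrative, not exhaustive.
GENERIC_PREFIXES = ("php", "python-urllib", "python-requests", "curl/", "java/")

# Very loose pattern for "some way to get in touch": a URL or an email address.
CONTACT_PATTERN = re.compile(r"https?://\S+|[\w.+-]+@[\w-]+(\.[\w-]+)+")

def classify_user_agent(ua):
    """Classify a User-Agent string as blank, contactable, library-default or unidentified."""
    ua = (ua or "").strip()
    if ua in ("", "-"):
        return "blank"
    if CONTACT_PATTERN.search(ua):
        return "contactable"
    if ua.lower().startswith(GENERIC_PREFIXES):
        return "library-default"
    return "unidentified"

for sample in ("-", "PHP", "Python-urllib/3.4",
               "Mozilla/5.0 (X11; Linux x86_64)",
               "ExampleBot/0.1 (https://example.org/bot; bot@example.org)"):
    print(repr(sample), "->", classify_user_agent(sample))
```

A pasted browser string lands in "unidentified" here, and nothing in the header distinguishes it from a real browser, which is the second objection in a nutshell.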

Oliver Keyes

5:44 p.m.

If people aren't capable of following UA guidelines I doubt they're going to follow voluntary login.

For what it's worth I absolutely support both rate-limiting and login to get around this. In fact, I would argue that from an analytics point of view the lack of rate limiting is probably the most high-profile problem we have with incoming data at the moment. It's far, far too common for random pieces of automata to set themselves up and massively skew our datasets; identifying this in advance is impossible (we don't always have IP data) and identifying them post-hoc on an individual basis is massively time consuming.

Why don't we have rate limiting + login? Who would work on this? Why /should/ we not have rate limiting?

--
Oliver Keyes
Count Logula
Wikimedia Foundation

Trey Jones

6:05 p.m.

I agree with rate-limiting those without some sort of ID (login or API key). As Oliver said, big (ab)users can massively skew our stats, often by themselves. But hordes of upper middle volume bots (way too high for a human, nowhere near the max for a superstar bot) can have a large cumulative effect, too. We can't track them down individually, or even detect that they are there, because they are "only" involved in a fraction of a percent of traffic. But a hundred such bots add up to a significant skew, and reasonable rate limits could knock them down to manageable levels.

While enforcing UA requirements is inherently reasonable, anyone who doesn't know to set up a valid UA string may also not know that just copying one from a browser makes things worse. (I've done that myself in the past when using curl with an uncooperative site. The shame.)

Maybe rate limiting will be the 80 in the 80/20 solution, and enforcing UA reqs won't be necessary to control traffic, leaving them as a silly but effective way of identifying certain kinds of traffic.

The flip-side case would be bajillions of very low volume bots (mimicking roughly human levels of traffic and so sailing under rate limits), all with blank UAs. But we could note that after rate limiting slows down the ridiculously heavy hitters and take action as needed.

Trey Jones
Software Engineer, Discovery
Wikimedia Foundation

Legoktm

2 Sep

3:49 a.m.

On 09/01/2015 10:37 AM, Brion Vibber wrote:

> I'm not 100% convinced that the UA requirement is helpful, for two reasons:

For those of us who looked for the initial rationale on the UA requirement, the announcement and resulting discussion is at [1].

[1] http://www.gossamer-threads.com/lists/wiki/wikitech/189275

-- Legoktm

Brad Jorsch (Anomie)

1 Sep

6:38 p.m.

On Tue, Sep 1, 2015 at 1:18 PM, Krinkle <krinklemail(a)gmail.com> wrote:

> In the past (2012?) these were definitely being blocked. (Ran into it from time to time on Toolserver)
>
> It seems php file_get_contents('http://...api..') is also working fine now, without having to ini_set a user_agent value first.

I wonder if it got lost in the move from Squid to Varnish, or something along those lines.

--
Brad Jorsch (Anomie)
Senior Software Engineer
Wikimedia Foundation

Platonides

10:42 p.m.

Brad Jorsch (Anomie) wrote:

> I wonder if it got lost in the move from Squid to Varnish, or something along those lines.

That's likely, given that it was enforced by squid.

Brandon Black

11:54 p.m.

On Tue, Sep 1, 2015 at 10:42 PM, Platonides <platonides(a)gmail.com> wrote:

> Brad Jorsch (Anomie) wrote:
>
> > I wonder if it got lost in the move from Squid to Varnish, or something along those lines.
>
> That's likely, given that it was enforced by squid.

We could easily add it back in Varnish, too, but I tend to agree with Brion's points that it's not ultimately helpful.

I really do like the idea of moving towards smarter ratelimiting of APIs by default, though (and have brought this up in several contexts recently, but I'm not really aware of whatever past work we've done in that direction).

From that relatively-ignorant perspective, I tend to envision an architecture where the front edge ratelimits API requests (or even possibly, all requests, but we'd probably have to exclude a lot of common spiders...) via a simple token-bucket-filter if they're anonymous, but lets them run free if they superficially appear to have a legitimate cookie or API access token. Then it's up to the app layer to enforce limits for the seemingly-identifiable traffic and be configurable to raise them for legitimate remote clients we've had contact with, and to reject legitimate-looking tokens/logins that the edge chooses not to ratelimit which aren't actually legitimate.

-- Brandon
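As a rough illustration of the edge behaviour sketched above, here is a minimal token-bucket limiter in Python. It is not Varnish configuration and not an existing Wikimedia component; the class, the `edge_allows` helper, the per-IP keying, and the rate and capacity values are all assumptions chosen for the example.

```python
import time

class TokenBucket:
    """Token bucket: refills at `rate` tokens per second, holds at most `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity  # start full so short bursts are allowed
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill for the time elapsed since the last check, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# One bucket per anonymous client, keyed by IP (hypothetical keying choice).
buckets = {}

def edge_allows(client_ip, has_token, rate=1.0, capacity=5):
    """Edge check: traffic that looks identified passes; anonymous traffic is bucketed."""
    if has_token:
        return True  # validation and finer limits are left to the app layer
    if client_ip not in buckets:
        buckets[client_ip] = TokenBucket(rate, capacity)
    return buckets[client_ip].allow()
```

The edge only checks that a cookie or token is superficially present; whether it is genuine, and what limit it deserves, stays an app-layer decision, as described above.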

Gabriel Wicke

2 Sep

12:22 a.m.

We recently revisited rate limiting in https://phabricator.wikimedia.org/T107934, but came to similar conclusions as reached in this thread:

- Limits for weak identifiers like IPs or user agents would (at least initially) need to be high enough to render the limiting borderline useless against DDOS attacks.
- Stronger authentication requirements have significant costs to users, and will require non-trivial backend work to keep things efficient on our end. I believe we should tackle this backend work in any case, but it will take some time.
- In our benchmarks, most off-the-shelf rate limiting libraries use per-request network requests to a central service like Redis, which costs latency and throughput, and has some scaling challenges. There are algorithms [1] that trade some precision for performance, but we aren't aware of any open source implementations we could use.

The dual of rate limiting is making each API request cheaper. We have recently made some progress towards limiting the cost of individual API requests, and are working towards making most API end points cacheable & backed by storage.

Gabriel

[1]: http://yahooeng.tumblr.com/post/111288877956/cloud-bouncer-distributed-rate…

--
Gabriel Wicke
Principal Engineer, Wikimedia Foundation

Gergo Tisza

12:54 a.m.

On Tue, Sep 1, 2015 at 4:54 PM, Brandon Black <bblack(a)wikimedia.org> wrote:

> I really do like the idea of moving towards smarter ratelimiting of APIs by default, though (and have brought this up in several contexts recently, but I'm not really aware of whatever past work we've done in that direction).
>
> From that relatively-ignorant perspective, I tend to envision an architecture where the front edge ratelimits API requests (or even possibly, all requests, but we'd probably have to exclude a lot of common spiders...) via a simple token-bucket-filter if they're anonymous, but lets them run free if they superficially appear to have a legitimate cookie or API access token. Then it's up to the app layer to enforce limits for the seemingly-identifiable traffic and be configurable to raise them for legitimate remote clients we've had contact with, and to reject legitimate-looking tokens/logins that the edge chooses not to ratelimit which aren't actually legitimate.

Rate limiting / UA policy enforcement has to be done in Varnish, since API responses can be cached there and so the requests don't necessarily reach higher layers (and we wouldn't want to vary on user agent).

Gabriel Wicke

1:21 a.m.

On Tue, Sep 1, 2015 at 5:54 PM, Gergo Tisza <gtisza(a)wikimedia.org> wrote:

...

The cost / benefit trade-offs for Varnish cache hits are fairly different from those of cache misses. Especially for in-memory (frontend) hits it might overall be cheaper to send a regular response, rather than adding rate limit overheads to each cache hit.

Brandon Black

1:23 a.m.

On Wed, Sep 2, 2015 at 1:21 AM, Gabriel Wicke <gwicke(a)wikimedia.org> wrote:

...

On Tue, Sep 1, 2015 at 5:54 PM, Gergo Tisza <gtisza(a)wikimedia.org> wrote:

Yeah I was mostly thinking of uncacheable API accesses. If we can cache it, we don't mind (as much) in terms of load/abuse. By having the simpler outer check in varnish, though, it takes the big load from anonymous spikes away from being handled at the applayer for those uncacheable hits.

wikitech-l@lists.wikimedia.org

participants (14)

Brad Jorsch (Anomie)
Brandon Black
Brion Vibber
Chad
Christian Aistleitner
Gabriel Wicke
Gergo Tisza
John
Krinkle
Legoktm
Oliver Keyes
Platonides
Tomasz Finc
Trey Jones