This is similar to what I proposed to Ori. For multi-page files (PDF, DjVu, TIFF) we'd prerender base thumbnails and use them for downscaling in thumb.php on demand. The base thumbnails would be capped at a reasonable size (e.g. not 10000px wide), since there isn't much of a use case for massive thumbnails versus just viewing the original. This would also apply to single-page TIFFs, where one reference thumbnail of reasonable size would be used for downscaling on demand. The reference files could be created on upload, before the file could even appear in places like Special:NewImages. Resizing the reference files would actually be reasonable to do in thumb.php. File purges could exempt the reference thumbnails themselves (or, if they didn't, regeneration would at least be pool countered, as TMH does). The reference thumbnails should also be in Swift even if we move to CDN "only" thumbnail storage; disk space is cheap enough for this.
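
Roughly, the thumb.php path would then look something like the following Python sketch (the store dict stands in for Swift, PIL stands in for the scaler, and all of the names and sizes are made-up illustrations, not existing MediaWiki APIs):

    from PIL import Image

    REFERENCE_WIDTH = 2048      # assumed cap for pre-rendered reference thumbs
    store = {}                  # stand-in for Swift object storage

    def reference_key(title, page):
        return "thumb/%s/page%d-%dpx.jpg" % (title, page, REFERENCE_WIDTH)

    def render_reference(title, page):
        # In production this would rasterize one page of the PDF/DjVu/TIFF;
        # here we just open a local file of the same name.
        img = Image.open(title)
        img.thumbnail((REFERENCE_WIDTH, REFERENCE_WIDTH))
        return img

    def handle_thumb_request(title, page, width):
        key = reference_key(title, page)
        if key not in store:
            # Reference missing (pre-render hasn't run yet, or a purge raced
            # with the request): regenerate it. In production this would be
            # wrapped in a pool counter so concurrent misses don't pile up.
            store[key] = render_reference(title, page)
        if width >= REFERENCE_WIDTH:
            # Don't serve thumbnails larger than the reference; point
            # oversized requests at the original instead.
            return "redirect-to-original"
        thumb = store[key].copy()
        thumb.thumbnail((width, width))
        return thumb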


On Thu, Apr 24, 2014 at 8:57 AM, Gabriel Wicke <gwicke@wikimedia.org> wrote:
On 04/24/2014 06:00 AM, Gilles Dubuc wrote:
> Instead of each image scaler server generating a thumbnail immediately when
> a new size is requested, the following would happen in the script handling
> the thumbnail generation request:

It might be helpful to consider this as a fairly generic request limiting /
load shedding problem. There are likely simpler and more robust solutions to
this using plain Varnish or Nginx, where you basically limit the number of
backend connections, and let other requests wait.
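
Illustratively, the same "cap backend concurrency, queue or shed the rest" behavior expressed in Python rather than as Varnish/Nginx config (the names and limits are made up):

    import threading

    MAX_CONCURRENT = 8          # assumed per-host scaler concurrency limit
    WAIT_TIMEOUT = 5            # seconds a request may queue before shedding
    scaler_slots = threading.BoundedSemaphore(MAX_CONCURRENT)

    def render_thumbnail(request):
        return 200              # stand-in for the actual scaling work

    def limited_scale(request):
        # Acquire a slot or give up: equivalent to a small backend connection
        # pool with a bounded wait in front of it.
        if not scaler_slots.acquire(timeout=WAIT_TIMEOUT):
            return 503          # shed load instead of letting work pile up
        try:
            return render_thumbnail(request)
        finally:
            scaler_slots.release()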

Rate limiting without client keys is very limited, though. It only works
around, rather than addresses, the root cause: we allow clients to start very
expensive operations in real time.

A possible way to address the root cause might be to generate screen-sized
thumbnails in a standard size ('xxl') in a background process after upload,
and then scale all on-demand thumbnails from those. If the base thumb is not
yet generated, a placeholder can be displayed and no immediate scaling
happens. With the expensive operation of extracting reasonably-sized base
thumbs from large originals now happening in a background job, rate limiting
becomes easier and won't directly affect the generation of thumbnails of
existing images. Creating small thumbs from the smaller base thumb will also
be faster than starting from a larger original, and should still yield good
quality for typical thumb sizes if the 'xxl' thumb size is large enough.
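
A sketch of how the on-demand handler could behave under that scheme (Python
pseudologic; the dicts stand in for real thumbnail storage and the job queue,
and the 'xxl' width is just an assumed value):

    from PIL import Image

    XXL_WIDTH = 2560            # assumed 'xxl' base thumbnail width

    base_thumbs = {}            # stand-in for thumbnail storage
    job_queue = []              # stand-in for the background job queue

    def enqueue_base_render(title):
        # Normally queued once at upload time; this job is the only step
        # that ever touches the original.
        job_queue.append((title, XXL_WIDTH))

    def run_background_job():
        # Background worker: drains the queue and renders the 'xxl' bases.
        while job_queue:
            title, width = job_queue.pop(0)
            img = Image.open(title)
            img.thumbnail((width, width))
            base_thumbs[title] = img

    def handle_thumb_request(title, width):
        base = base_thumbs.get(title)
        if base is None:
            # Base thumb not generated yet: return a placeholder and let the
            # background job catch up; no expensive scaling in the request path.
            enqueue_base_render(title)
            return "placeholder.png"
        # Cheap downscale from the screen-sized base, never from the original.
        thumb = base.copy()
        thumb.thumbnail((width, width))
        return thumb

The key property is that the request path only ever resizes the screen-sized
base, so its cost no longer depends on the size of the original.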

The disadvantage for multi-page documents would be that we'd create a lot of
screen-sized thumbs, some of which might not actually be used. Storage space
is relatively cheap though, at least cheaper than service downtime or
degraded user experience from normal thumb scale requests being slow.

Gabriel


--
-Aaron S