Hi Dan,
wouldn’t it be better to throttle the application/tool that generates
thumbnails so that it doesn’t try to produce too many thumbnails at
once?
The issue is that there is no application generating thumbnails at a given rate. Thumbnails are generated on demand, when someone views a thumbnail that doesn't exist yet. And since Special:NewFiles exists and is visited every few seconds by bots, all new uploads have their thumbnails generated almost on the spot. Thus, we can't slow down that part. We have several long-term tasks to improve this, but they will take months to implement. Our only option at the moment is to avoid having GWToolset make too many massive images appear on Commons' Special:NewFiles in a short period of time.
Over 500 of the tiff images were greater than 50 megapixels and as a consequence Commons fails to render any thumbnails
Indeed, it seems that some thumbnail generation requests timed out due to the size of these images. The image scalers have limits on how long a thumbnailing job can take, and these jobs were going over the limit. To make matters worse, the current retry mechanism means that they were being retried 5 times, and thus using 5 times the resources. I would advise against trying to upload those enormous images for now; we should focus on a solution for the smaller images. It would be great if the next upload attempt left aside the images that are too large.
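To illustrate what leaving the large images aside could look like, here is a minimal sketch in Python. It assumes the pixel dimensions of each file are known before the upload starts. The function name, the (name, width, height) record layout, and the use of 50 megapixels as the cutoff are my own assumptions for illustration, not something GWToolset provides today.

```python
# Assumed cutoff, borrowed from the 50-megapixel figure quoted above.
MAX_MEGAPIXELS = 50


def split_by_size(images, max_megapixels=MAX_MEGAPIXELS):
    """Partition (name, width, height) records into two lists of names:
    files small enough to upload now, and files to set aside for later."""
    upload_now, set_aside = [], []
    for name, width, height in images:
        megapixels = width * height / 1_000_000
        if megapixels > max_megapixels:
            set_aside.append(name)
        else:
            upload_now.append(name)
    return upload_now, set_aside


# Hypothetical batch: one 12-megapixel file and one 80-megapixel file.
batch = [("small.tiff", 4000, 3000), ("huge.tiff", 10000, 8000)]
upload_now, set_aside = split_by_size(batch)
```

The files that end up in set_aside could then be held back until the long-term thumbnailing work is done.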
I think the safest way to proceed is to lower the appropriate GWToolset throttles in production and then schedule a time for Fae to try the upload process again. By scheduling a specific day and time for the next attempt, we can make sure that engineers and ops have eyes on the servers to watch the load. Then, if things go well, we can raise the throttles back to higher values.