Hi Dan,

wouldn’t it be better to throttle the application/tool that generates thumbnails so that it doesn’t try to produce too many thumbnails at once?

The issue is that no application is generating thumbnails at a given rate. Thumbnails are generated on demand when someone views a thumbnail that doesn't exist yet. And since Special:NewFiles exists, and is visited every few seconds by bots, all new uploads have their thumbnails generated almost on the spot. Thus, we can't slow down that part. We have several long-term tasks to improve this, but they will take months to implement. Our only option at the moment is to try to keep GWToolset from making too many massive images appear on Commons' Special:NewFiles in a short period of time.

Over 500 of the tiff images were greater than 50 megapixels and as a consequence Commons fails to render any thumbnails

Indeed, it seems that some thumbnail generation requests timed out due to the size of these images. The image scalers limit how long a thumbnailing job can take, and these jobs were going over that limit. To make matters worse, the current retry mechanism means that they were being retried 5 times, and thus using 5 times the resources. I would advise against trying to upload those enormous images for now; we should focus on a solution for the smaller images. It would be great if the next upload attempt left aside the images that are too large.
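
To illustrate what I mean by leaving the large images aside, here is a minimal sketch of a pre-upload filter. It is only an assumption about how the batch could be prepared: the function names and the metadata layout (a list of dicts with width and height in pixels) are hypothetical and not part of GWToolset, and the 50 megapixel cutoff is simply the figure quoted above.

```python
# Hypothetical pre-upload filter; names and metadata layout are illustrative only.
MAX_MEGAPIXELS = 50  # cutoff taken from the "greater than 50 megapixels" figure above


def megapixels(width, height):
    """Return the image size in megapixels for pixel dimensions."""
    return width * height / 1_000_000


def split_by_size(images, limit=MAX_MEGAPIXELS):
    """Split image records into (uploadable, set_aside) by megapixel count."""
    uploadable, set_aside = [], []
    for image in images:
        if megapixels(image["width"], image["height"]) > limit:
            set_aside.append(image)   # too large to thumbnail reliably for now
        else:
            uploadable.append(image)
    return uploadable, set_aside


batch = [
    {"name": "small.tif", "width": 4000, "height": 3000},    # 12 megapixels
    {"name": "huge.tif", "width": 10000, "height": 8000},    # 80 megapixels
]
uploadable, set_aside = split_by_size(batch)
```

The set-aside list could then be kept for a later attempt once the scalers can handle files of that size.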

I think the safest way forward is to lower the appropriate GWToolset throttles in production and then schedule a time for Fae to try the upload process again. By scheduling a specific day and time for the next attempt, we can make sure that engineers and ops have eyes on the servers to watch the load. Then if things go well, we can tweak the throttles back to higher values.

http://www.mediawiki.org/wiki/Extension:GWToolset/Technical_Design#Throttles.2C_Limits.2C_Delays,

The throttle documentation doesn't specify any units. I understand that the values are "per background job run", but how often do these background jobs run?

I couldn't find configuration values for these throttles on Commons. Dan, can you confirm that Commons is using the default values?



On Mon, Apr 28, 2014 at 11:17 AM, dan entous <d_entous@yahoo.com> wrote:
GWToolset already has several throttles in place, http://www.mediawiki.org/wiki/Extension:GWToolset/Technical_Design#Throttles.2C_Limits.2C_Delays, that limit how many background uploads are picked up with each background job run, and how many total GWToolset background jobs can exist in the entire job queue. on the beta cluster, how often the background job ran for GWToolset seemed to vary, between 7-30. that seems like enough time for additional images to get processed in-between GWToolset images.

wouldn’t it be better to throttle the application/tool that generates thumbnails so that it doesn’t try to produce too many thumbnails at once?

with kind regards,
dan



On Apr 25, 2014, at 20:41 , Gergo Tisza <gtisza@wikimedia.org> wrote:

> On Fri, Apr 25, 2014 at 11:13 AM, Fæ <faewik@gmail.com> wrote:
> With no obvious immediate fix/work-around on the table from WMF ops, I
> have proposed to re-start my uploads for this project with an
> effective throttle by using 2 threads (this is a setting on the first
> screen of the GWToolset). In practice, having tried a run of a couple
> of hundred, this means that the tool is uploading 100MB sized images
> at a rate of 2 every 5 minutes. This seems to not be causing any
> issues.
>
> The issue was not directly with the uploads; there is no thumbnail rendering happening on upload, so GWToolset adding lots of large TIFFs quickly would not cause problems in itself. The upload speed was problematic because that meant GWToolset saturated pages like Special:NewFiles, and when somebody looked at such pages, *that* triggered lots of thumbnail renderings of huge TIFF files at the same time. If GWToolset is slowed down and lots of miscellaneous files are uploaded between the TIFFs, those special pages won't be problematic, but something like a gallery or category of huge TIFF files could still be.
> _______________________________________________
> Glamtools mailing list
> Glamtools@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/glamtools