Hello!
I'm new to the Wikimedia list. I'm interested in developing projects using RDF; if any of you share that goal, I'll be happy to exchange ideas!
Best regards
Christophe
Various unusual values for integer-type parameters to the Action API will
no longer be accepted. Acceptable values will consist of an optional sign
(ASCII `+` or `-`) followed by 1 or more ASCII digits.
Values that were formerly allowed, and will now result in a "badinteger"
error, include:
- Values with extraneous whitespace, such as " 1".
- "1.9", formerly interpreted as "1".
- "1e1", formerly interpreted as either "1" or "10" at various times.
- "1foobar", formerly interpreted as "1".
- "foobar", formerly interpreted as "0".
Most clients should already be using correct formats, unless they are
taking direct user input without validation.
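Clients that do take user input can check it against the stricter format before sending a request. A minimal sketch (the function name is hypothetical; the pattern follows the rule stated above: an optional ASCII sign followed by one or more ASCII digits):

```python
import re

# Matches the Action API's stricter integer format: an optional
# ASCII "+" or "-" followed by one or more ASCII digits.
INTEGER_RE = re.compile(r"[+-]?[0-9]+")

def is_valid_api_integer(value: str) -> bool:
    """Return True if `value` would be accepted by an integer-type
    Action API parameter under the new validation rules."""
    return INTEGER_RE.fullmatch(value) is not None
```

Under this check, "1", "+10", and "-3" pass, while " 1", "1.9", "1e1", "1foobar", and "foobar" are all rejected, matching the examples above.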
This change will most likely go out to Wikimedia wikis with 1.35.0-wmf.19.
See https://www.mediawiki.org/wiki/MediaWiki_1.35/Roadmap for a schedule.
--
Brad Jorsch (Anomie)
Senior Software Engineer
Wikimedia Foundation
_______________________________________________
Mediawiki-api-announce mailing list
Mediawiki-api-announce(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce
The error codes that may be changing are some of those representing invalid
values for API parameters. Notably, the following will change:
- "noX", indicating that a required parameter was not specified, becomes
"missingparam".
- "unknown_X", indicating that an unrecognized value was specified for
an enumerated-value parameter, becomes "badvalue".
- "too-many-X", indicating that too many values were supplied to a
multi-valued parameter, becomes "toomanyvalues".
- "baduser_X", "badtimestamp_X", and so on become "baduser",
"badtimestamp", and so on.
Note that this is not a comprehensive list; other codes may be changing as well.
These changes make the error codes more predictable, at the expense of not
indicating in the code which parameter specifically triggered the error. If
you have a use case where knowing which parameter triggered the error is
needed, please let us know (by replying to this message or by filing a
request in Phabricator) and we'll add the necessary error metadata.
The human-readable text is also changing for some of these errors (with or
without error code changes), and for a few the error metadata may be
changing (e.g. "botMax" changes to "highmax" for limit-type warnings in
non-back-compat error formats).
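During the transition, a client can accept both old and new codes by normalizing them before comparison. A sketch of that idea (the specific "X" parameter names in the mapping are illustrative examples, not a complete list):

```python
# Map legacy Action API error codes to their new equivalents.
# The "X" suffixes/prefixes below are example parameter names only;
# the real set depends on which modules a client uses.
OLD_TO_NEW = {
    "nouser": "missingparam",        # "noX" family
    "unknown_action": "badvalue",    # "unknown_X" family
    "too-many-titles": "toomanyvalues",
    "baduser_user": "baduser",
    "badtimestamp_start": "badtimestamp",
}

def normalize_error_code(code: str) -> str:
    """Return the new-style error code for a legacy code,
    passing through codes that are unchanged."""
    return OLD_TO_NEW.get(code, code)
```

A client can then branch on the normalized code (e.g. `normalize_error_code(err["code"]) == "missingparam"`) and behave identically before and after the deployment.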
This change will most likely go out to Wikimedia wikis with 1.35.0-wmf.19.
See https://www.mediawiki.org/wiki/MediaWiki_1.35/Roadmap for a schedule.
--
Brad Jorsch (Anomie)
Senior Software Engineer
Wikimedia Foundation
On Monday we'll be restarting the database server that supports most
WMCS services. During the restart various things will fail: Wikitech
pages will fail to load, OpenStack API calls will fail, etc.
In all cases, if you encounter an issue, just count to 20 and try again; by then things will most likely be back up. Active VMs, tools, and other things hosted on Toolforge or Cloud VPS should be unaffected.
The restart will happen at 15:00 UTC on Monday -- that's 7AM Pacific Time.
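For scripts that hit the affected services during the window, the "count to 20 and try again" advice amounts to a simple retry loop. A generic sketch (the helper name and defaults are our own, not part of any WMCS tooling):

```python
import time

def retry(call, attempts=3, delay=20):
    """Invoke `call()`, retrying after `delay` seconds on failure.
    The default 20-second wait matches the advice above; the last
    failure is re-raised so callers still see persistent errors."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(delay)
```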
_______________________________________________
Wikimedia Cloud Services announce mailing list
Cloud-announce(a)lists.wikimedia.org (formerly labs-announce(a)lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud-announce
TL;DR: The lighttpd webservice for https://tools.wmflabs.org/dplbot/ fails repeatedly, frequently, and unpredictably, and I have been unable to diagnose any cause.
Currently, tools.dplbot is running a php7.2 webservice on the kubernetes backend; however, the failures started occurring when it was running lighttpd on the job grid, and the move to kubernetes does not seem to have changed anything in this respect. The tool serves a variety of PHP-based pages which generate reports from the Toolforge database replicas.
The symptom of failure is that all requests are rejected with 503 Service Unavailable. The lighttpd process continues to run (which is why I am calling this a "failure" rather than a "crash"), so Kubernetes doesn't detect any problem and doesn't restart the server, yet the server does not respond to any requests. The "webservice status" command claims that the webservice is still running. Every time this happens, I have to restart the webservice manually. The webservice sometimes fails immediately after a restart, while in other cases it runs normally for a highly variable period (minutes to hours) and then fails again.
Even more frustrating than the constant failures is the lack of any information that would allow diagnosing the cause. The error.log file (/data/project/dplbot/error.log) shows no error messages corresponding to the times of the failures. I tried various lighttpd debugging options, and none of them gave me anything useful: they appear to show all requests being handled normally, and no debug information at all at or after the point of failure. I also reactivated access logging (/data/project/dplbot/access.log), and it only shows requests that were handled correctly. In other words, no log indicates a request that came in at or just before a failure without a corresponding response going out.
If these failures were being caused spontaneously by some problem in lighttpd or in the Toolforge infrastructure, I would expect other users to be affected by them, but that doesn't seem to be the case.
This has previously been reported at https://phabricator.wikimedia.org/T115231 (including more detail on the debug options I tried), where frankly I have received absolutely no assistance. I did receive one mildly helpful comment from bd808 on a related issue (https://phabricator.wikimedia.org/T218915), as follows:
> ... [It is] possible to have a Kubernetes powered webservice become unresponsive to client requests due to an internal deadlock or resource exhaustion issue in the application which does not also lead to a crash of the lighttpd process itself.
However, if there is an internal deadlock or resource exhaustion issue in the underlying PHP scripts, I would expect some error message in the logs, which isn't there. Also, during a recent interval when the server was up for a while, I took the time to click every single link on https://tools.wmflabs.org/dplbot/, and the server responded to every one of them, so there does not seem to be a fatal bug in any of the scripts (although this exercise revealed a few minor issues).
I'm not necessarily looking for someone to solve this problem for me (although that would be nice :-) ), but just some ideas about how to identify potential causes. Right now it is basically a black hole; no information whatsoever is coming out of the webserver at the point of failure, so I can make no progress.
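One stopgap, since Kubernetes only watches the process and not HTTP behavior, is an external watchdog that polls the tool over HTTP and restarts it on failure. A minimal sketch of that idea (the URL, poll interval, and the use of a plain `webservice restart` invocation are assumptions to adapt, not a supported Toolforge mechanism):

```python
import subprocess
import time
import urllib.error
import urllib.request

# Hypothetical values: adjust the URL and interval for your tool.
URL = "https://tools.wmflabs.org/dplbot/"

def is_healthy(url: str, timeout: int = 10) -> bool:
    """Return True if the URL answers with a 2xx/3xx status.
    A 503 raises HTTPError (a URLError subclass), so the failure
    mode described above is reported as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        return False

def watchdog(poll_seconds: int = 60) -> None:
    """Poll the tool and restart the webservice when it stops responding."""
    while True:
        if not is_healthy(URL):
            # Log a timestamp so restarts can be correlated with
            # anything that does eventually appear in error.log.
            print(time.strftime("%F %T"), "unhealthy; restarting")
            subprocess.run(["webservice", "restart"], check=False)
        time.sleep(poll_seconds)
```

This doesn't diagnose anything by itself, but timestamped restarts at least bound when each failure occurred, which may help line failures up against access.log entries.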
--
Russell Blau
russblau(a)imapmail.org