Hello, all!
Today we are kicking off the process to shut down Grid Engine, and we want
to share the timeline with you.
== Background ==
WMCS made the Grid Engine available as a backend engine for hosting tools
on Toolforge - our Platform as a Service (PaaS) offering.
An additional backend engine, Kubernetes, was also made available on
Toolforge.
Over time, maintaining and securing the grid has proven difficult, and the
many hours of maintenance work it absorbs make it harder to provide support
to the community in other ways.
This is mainly because there have been no new Grid Engine releases (bug
fixes, security patches, or otherwise) since 2016.[0]
Maintenance work on the grid continued because it was widely popular with
the community and the Kubernetes offering didn't yet have many grid-like
features that contributors came to love.
Once the Kubernetes platform could handle many of the workloads, we started
the grid deprecation process by asking maintainers to migrate off the
grid.[1]
Over the past year, we've been reaching out to our tool maintainers and
working with them to migrate their tools off the grid to Kubernetes.
We have reached out directly to all maintainers with their Phabricator
ticket IDs.
The latest updates to Build Service[2] have addressed many of the issues
that prevented tool maintainers from migrating.
== Initial Timeline ==
The detailed grid shutdown timeline is available on wiki.[3] The important
dates have been copied below.
* 14th December, 2023: Any maintainer who has not responded on Phabricator
will have their tools shut down and their crontabs commented out. Please
plan to migrate, or tell us your plans on Phabricator, before that date
(see the migration example below).
* 14th February, 2024: The grid is completely shut down. All tools are
stopped.
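For example, a grid crontab entry can typically be replaced by a scheduled
job on the Kubernetes jobs framework. A minimal sketch, where the job name,
script, and image are placeholders for your own tool's:

    # Old grid crontab entry (these will be commented out):
    # 0 * * * * jsub -N myjob $HOME/myjob.sh

    # Roughly equivalent scheduled job on Kubernetes:
    toolforge jobs run myjob --command ./myjob.sh \
        --image python3.11 --schedule "0 * * * *"

Running "toolforge jobs list" afterwards confirms the job was created.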
If you need further clarification or help migrating your tool, don't
hesitate to reach out to us on IRC, Telegram, Phabricator[4] or via any of
our support channels.[5]
Thank you.
[0]: https://techblog.wikimedia.org/2022/03/14/toolforge-and-grid-engine/
[1]:
https://wikitech.wikimedia.org/wiki/News/Toolforge_Grid_Engine_deprecation
[2]: https://wikitech.wikimedia.org/wiki/Help:Toolforge/Build_Service
[3]:
https://wikitech.wikimedia.org/wiki/News/Toolforge_Grid_Engine_deprecation#…
[4]: https://phabricator.wikimedia.org/project/profile/6135/
[5]:
https://wikitech.wikimedia.org/wiki/Portal:Toolforge/About_Toolforge#Commun…
--
Seyram Komla Sapaty
Developer Advocate
Wikimedia Cloud Services
Hello!
The results of the 2022 Cloud Services annual survey have been published!
We had 159 participants who responded and provided valuable feedback and
suggestions.
For the first time, we moved from Google Forms to using LimeSurvey.
Some of you have long requested this change, and we will continue to use
LimeSurvey going forward.
The publication of the results was delayed, but it's finally here:
https://meta.wikimedia.org/wiki/Research:Cloud_Services_Annual_Survey/2022
Thanks to everyone who participated and provided input and comments!
We will launch the 2023 Cloud Services survey next month!
Thank you!
--
Seyram Komla Sapaty
Developer Advocate
Wikimedia Cloud Services
Hi,
Toolforge's Harbor instance will briefly be down for a version upgrade from
2.5 to 2.9 this Wednesday at 8:00 UTC.
https://phabricator.wikimedia.org/T346241
This should not affect any tools that are not using the new build service,
nor any tools that are already running.
https://wikitech.wikimedia.org/wiki/Help:Toolforge/Build_Service
If you are using the build service, you will not be able to run any new
builds, or to start a job or a webservice from an image built with the
build service, while Harbor is down.
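Concretely, while Harbor is down you should expect commands along these
lines to fail until the upgrade is finished (the tool and job names below
are placeholders):

    toolforge build start https://gitlab.wikimedia.org/toolforge-repos/mytool
    webservice buildservice restart
    toolforge jobs run myjob --command ./run.sh \
        --image tool-mytool/tool-mytool:latest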
We will send an update before starting maintenance work, and once
everything is back up and running.
--
Slavina Stefanova (she/her)
Software Engineer | Developer Experience
Wikimedia Foundation
Hi!
There will be a small network interruption next Monday at around 13:00 UTC
as we will be doing some cleanup on OpenStack after the network
re-architecture (see https://phabricator.wikimedia.org/T348140).
It will affect Cloud VPS and the other services hosted there (including
Toolforge, PAWS, Quarry, and Superset). VM traffic to the internet will be
cut for a short period, hopefully only a few seconds, while internal
traffic will not be affected. However, if you have any open SSH sessions to
your VMs or to login.toolforge.org, they might time out or get dropped, and
any web access to projects will not work during the downtime.
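If you want open sessions to have a better chance of surviving the blip,
SSH keepalives can help; a sketch for your local ~/.ssh/config (the host
patterns and values are just examples):

    Host login.toolforge.org *.wmcloud.org
        # Probe the server every 30 seconds and tolerate a few
        # missed replies before dropping the connection.
        ServerAliveInterval 30
        ServerAliveCountMax 6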
We will post updates on this email thread when we start and when we have finished.
This will help stabilize the network and avoid bigger outages in the
future, so thanks for your patience!
--
David Caro
SRE - Cloud Services
Wikimedia Foundation <https://wikimediafoundation.org/>
PGP Signature: 7180 83A2 AC8B 314F B4CE 1171 4071 C7E1 D262 69C3
"Imagine a world in which every single human being can freely share in the
sum of all knowledge. That's our commitment."
Hello,
The Toolforge admin team is happy to announce that the Toolforge Build
Service[0] is now available in open beta.
The Build Service is intended to allow more tools to migrate off the Grid
Engine and to make the process of deploying code to Toolforge easier and
more flexible by building container images with the specific dependencies
of each tool.
Here are quick highlights of some of the current key features (a short
example session follows the list):
1. Build your tool from source code, using your language's dependency
management tool: no Dockerfiles, no scripts, no manual steps
2. Use industry-wide standards[1]: no vendor lock-in, thanks to upstream
buildpacks
3. Support for many languages out of the box[2]
4. Envvars - Create and manage environment variables and secrets that are
available at runtime.[3]
5. Ability to install packages from the Ubuntu repositories[4]
6. Improved resiliency and resource usage by allowing NFS-less
webservices[5], if you don't need NFS
7. Test your image locally, or anywhere[6]
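To make the highlights concrete, here is a minimal example session; the
repository URL and variable name are placeholders, and the exact flags are
described in the documentation linked below:

    # 1-3: build an image from a git repository (buildpacks detect the language):
    toolforge build start https://gitlab.wikimedia.org/toolforge-repos/mytool
    toolforge build show

    # 4: create an environment variable/secret available at runtime:
    toolforge envvars create MY_API_TOKEN

    # 6: start a webservice from the built image without mounting NFS:
    webservice buildservice start --mount=none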
Please review the current known limitations.[7]
We also have a growing list of tutorials for various languages.[8]
During this open beta, we invite you to actively participate and share your
feedback by replying to this thread or through IRC. If you find any issues
or have feature suggestions, you can use this task template.[9]
Your insights will help us enhance and tailor the Build Service to meet the
needs of your tools.
The plan is to have this phase run for the next few months and, if no big
issues are found, promote it to global availability phase 1 (GA1) while we
work on adding automatic triggering and deployment, for which we will do a
second round of beta testing.
This unblocks the last step to migrate off the grid, so we ask all grid
users to give it a try and report any issues they find. There are no big
changes expected for the currently implemented features, so any work done
now will help later.
Thank you for being a part of this journey. We look forward to your
invaluable feedback and collaboration as we strive to provide a better
developer experience.
[0]: https://wikitech.wikimedia.org/wiki/Help:Toolforge/Build_Service
[1]: https://buildpacks.io/
[2]:
https://wikitech.wikimedia.org/wiki/Help:Toolforge/Build_Service#Supported_…
[3]: https://wikitech.wikimedia.org/wiki/Help:Toolforge/Envvars_Service
[4]:
https://wikitech.wikimedia.org/wiki/Help:Toolforge/Build_Service#Installing…
[5]:
https://wikitech.wikimedia.org/wiki/Help:Toolforge/Build_Service#Using_NFS_…
[6]:
https://wikitech.wikimedia.org/wiki/Help:Toolforge/Build_Service#Testing_lo…
[7]:
https://wikitech.wikimedia.org/wiki/Help:Toolforge/Build_Service#Known_curr…
[8]:
https://wikitech.wikimedia.org/wiki/Help:Toolforge/Build_Service#Tutorials_…
[9]: https://w.wiki/7kpi
--
Seyram Komla Sapaty
Developer Advocate
Wikimedia Cloud Services
Hi,
We will be upgrading the Toolforge Kubernetes cluster on October 18th
starting at around 11:00 UTC.
The expected impact is that tools running on the Kubernetes cluster will
get restarted a couple of times over the course of the few hours it takes
for us to upgrade the entire cluster. This includes tools that use the jobs
framework and tools that run web services using the default Kubernetes
backend. The ability to manage tools will remain operational.
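If you want to check that your tool came back cleanly after a restart, the
usual commands apply (run as your tool account):

    webservice status      # webservices
    toolforge jobs list    # jobs-framework jobs
    kubectl get pods       # raw view of the tool's pods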
Taavi
--
Taavi Väänänen (he/him)
Site Reliability Engineer, Cloud Services
Wikimedia Foundation
Hi,
There is network maintenance work happening at the moment.
For the next few minutes, expect some brief network connectivity problems.
See also: https://phabricator.wikimedia.org/T347469
Regards.
--
Arturo Borrero Gonzalez
Senior SRE / Wikimedia Cloud Services
Wikimedia Foundation
> First, based on GitHub requirements.txt the library versions used in the
> app are older than the latest ones, and T169452 also hints at growing
> technical debt in terms of updating code. However, are there known blockers
> for updating them? Ie. Does Quarry use something abandoned and blocks
> updating others, or is there something else that would require a rewrite?
> Or is it that updating is expected to work, but it just takes someone's
> time to do it?
That's very much the case. The primary missing piece is time to get things
updated; should anyone want to do so, they are welcome to. There are no
known abandoned or otherwise overtly problematic components in Quarry, and
usually the effort in upgrading it is in fussing with calls whose syntax
changed between versions. There are some weird things in Quarry, namely how
the db is structured. I find it unintuitive that there isn't a unique ID
per query; rather, queries are referenced through several different tables
by different IDs (see the sketch below). This doesn't stop Quarry from
working; it's just one of the things that in my mind needs fixing. In terms
of debt, it's covered on the Quarry board
(https://phabricator.wikimedia.org/project/view/800/): there are reasonable
desires for the UI to be updated to be more intuitive, or to offer a query
builder, and for the database selector to stop offering databases that
don't exist. Many features and bug fixes are requested.
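To illustrate the indirection with a hypothetical sketch (this mirrors the
pattern, not Quarry's literal schema): fetching the latest results for a
query means hopping through a query table, a revision table, and a run
table, each joined on a different ID:

    -- Hypothetical sketch, not the real schema:
    SELECT run.id AS results_id
    FROM query q
    JOIN query_revision rev ON rev.id = q.latest_rev_id
    JOIN query_run run ON run.id = rev.latest_run_id
    WHERE q.id = 12345;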
> Secondly, is there something that would prevent moving it to Toolforge? I
> am unsure if it would be a viable solution, but it would reduce the
> maintenance burden if volunteers would maintain it, so I am asking what
> would be blocking it.
Nothing really, though I think that would be more complicated than leaving
it in place. In concept it could be left largely as is: as long as someone
wants to do the OS and resulting Python/library upgrades, Quarry could
continue on. If you really wanted to move it, it would need to be rewritten
to fit into the Toolforge framework; I'm not quite sure what that would
entail.
> Also, it would be worth creating a Phabricator ticket for moving
> maintenance to the next person, describing what it would require and how it
> could be done. I do not know if there would be anyone, but for example, my
> use case for Quarry is sharing SQL queries and query results with
> wikieditors, and as long results aren't cached and reading the results
> requires OAUTH-login Superset doesn't work.
Opening a ticket for a handoff is welcome. It's mostly as you mentioned:
what's needed is time to manage upgrades, which I've been told to redirect
to other efforts. You're welcome to open one and I can update it if
desired, though I think I'll wait for a maintainer to step forward before I
open such a ticket myself.
Caching of results is one of the features that Superset and the other
services I was investigating do not offer; it follows the idea that old
data is less valuable than fresh data. You can share queries through things
like charts, and those can be re-run as needed to refresh their data,
though the results themselves don't stay around for very long in Superset.
In terms of OAuth, https://phabricator.wikimedia.org/T336522 is about
offering the ability to view results without logging in. When I have some
time for that, I will see how I can allow read/refresh access for anonymous
users. (I say above that I'm working on Superset, when in reality I'm being
redirected to OpenStack.)
Vivian Rook
On Wed, Oct 4, 2023 at 4:26 AM Kimmo Virtanen <kimmo.virtanen(a)gmail.com>
wrote:
> Hi Vivian,
>
> First, thank you for maintaining the Quarry.
>
> A couple of questions from a technical perspective.
>
> First, based on GitHub requirements.txt the library versions used in the
> app are older than the latest ones, and T169452 also hints at growing
> technical debt in terms of updating code. However, are there known blockers
> for updating them? Ie. Does Quarry use something abandoned and blocks
> updating others, or is there something else that would require a rewrite?
> Or is it that updating is expected to work, but it just takes someone's
> time to do it?
>
> Secondly, is there something that would prevent moving it to Toolforge? I
> am unsure if it would be a viable solution, but it would reduce the
> maintenance burden if volunteers would maintain it, so I am asking what
> would be blocking it.
>
> Also, it would be worth creating a Phabricator ticket for moving
> maintenance to the next person, describing what it would require and how it
> could be done. I do not know if there would be anyone, but for example, my
> use case for Quarry is sharing SQL queries and query results with
> wikieditors, and as long results aren't cached and reading the results
> requires OAUTH-login Superset doesn't work.
>
> Br,
> -- Kimmo Virtanen, Zache
>
>
> _______________________________________________
> Cloud mailing list -- cloud(a)lists.wikimedia.org
> List information:
> https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/
>
--
*Vivian Rook (They/Them)*
Site Reliability Engineer
Wikimedia Foundation <https://wikimediafoundation.org/>
Hello, I'm writing to notify any interested party that I am stepping away
from maintaining Quarry to focus on Superset. I'm not sure if anyone here
would be interested in stepping up as a maintainer to keep Quarry running.
If no maintainer steps forward, Quarry is likely to be removed when the
Buster images are removed by SRE; I believe that happens in June.
Should anyone be interested in being a maintainer for Quarry, please let me
know and I will happily add you as one.
Some discussion can be found at https://phabricator.wikimedia.org/T169452
Thank you
--
*Vivian Rook (They/Them)*
Site Reliability Engineer
Wikimedia Foundation <https://wikimediafoundation.org/>