Toolserver-announce September 2012

toolserver-announce@lists.wikimedia.org

Switching of SGE-arch-default at 9. October

by DaB.

Hello all,

as announced in July [1], the default arch of SGE will soon switch from solaris to *. Originally that should have happened on 1 October, but I forgot to re-announce it, and I'm sure some of you forgot about it too. So the switch is hereby announced for 9 October, 20:00 UTC. Jobs that are executed after this timestamp and have no arch option will run on any host (linux or solaris) instead of on a solaris host.

There are 4 ways for you to prepare (sorted):
- Make sure that your program runs on linux AND solaris, then add "arch=*".
- If your program runs only on linux, add "arch=lx".
- If your program runs only on solaris, add "arch=sol".
- Do nothing, pray and see things break.

(Somehow I have the feeling most of you will choose 4…, please prove me wrong.)

Sincerely,
DaB.

[1] http://lists.wikimedia.org/pipermail/toolserver-announce/2012-July/000506.html

--
Userpage: [[:w:de:User:DaB.]]
PGP: 2B255885
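As a rough illustration of the choices in the SGE announcement above, the small Python sketch below maps "where does my program run" to the arch option named there. The function name `arch_option` is hypothetical, and the announcement does not say how the option is handed to SGE, so the sketch only builds the option string.

```python
def arch_option(runs_on_linux: bool, runs_on_solaris: bool) -> str:
    """Return the arch option from the announcement for a given program.

    Hypothetical helper: only the three option strings come from the
    announcement; everything else is an assumption for illustration.
    """
    if runs_on_linux and runs_on_solaris:
        return "arch=*"    # may run on any host
    if runs_on_linux:
        return "arch=lx"   # linux hosts only
    if runs_on_solaris:
        return "arch=sol"  # solaris hosts only
    raise ValueError("the program must run on at least one platform")


if __name__ == "__main__":
    print(arch_option(runs_on_linux=True, runs_on_solaris=False))
```

Setting the option explicitly, even to "arch=*", keeps a job's behaviour independent of whatever the default happens to be.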

11 years, 6 months

Maintenance-window for webservers Monday 20:00-22:00 UTC

by DaB.

Hello all,

as announced last Sunday, I hereby announce a maintenance window for the web servers on Monday, 20:00-22:00 UTC. I will reboot hemlock a few times to try to find out why the web servers stop working when hemlock is away (and if I find the cause, I will fix it). All web tools will fail while hemlock is (re-)booting; other subsystems (like SGE) should work normally.

Sincerely,
DaB.

--
Userpage: [[:w:de:User:DaB.]]
PGP: 2B255885

11 years, 6 months

Maintenance of s2 and s5 Monday, 1st October

by Marlen Caemmerer

Hello,

as tomorrow is a maintenance window anyway, I will add more disk space to s2 and s5. During the work the databases s2 and s5 will not be available. This will take about 1-1.5 hours, and I will do it while DaB checks the hemlock and web server interaction at 20:00-22:00 UTC.

Cheers
nosy

11 years, 6 months

Reboot of the linux-boxes at Monday 19:05 UTC

by DaB.

Hello all,

because of a kernel upgrade I have to reboot our linux boxes (nightshade, yarrow and mayapple). This will happen tomorrow, Monday, at 19:05 UTC. I will reboot the boxes one after the other; each reboot should not take more than 10 minutes. If you use SGE (like you should), your task will either migrate to another box or be restarted automatically. If you have files open (for example in an editor), you should close them. You can follow the progress at [1].

Sincerely,
DaB.

[1] https://jira.toolserver.org/browse/MNT-1268

--
Userpage: [[:w:de:User:DaB.]]
PGP: 2B255885

11 years, 6 months

z-dat-s3-a (sql-s3) was restarted

by DaB.

Hello all,

to have something non-meta: I restarted mysql on z-dat-s3-a to de-swap hyacinth. sql-s3 was away for 1.5h because the shutdown was very slow.

Sincerely,
DaB.

--
Userpage: [[:w:de:User:DaB.]]
PGP: 2B255885

11 years, 7 months

Re: [Toolserver-announce] [Toolserver-l] Web servers unresponsive

by DaB.

Hello,

At Sunday 23 September 2012 21:01:27 DaB. wrote:
> Hello,
>
> At Sunday 23 September 2012 20:30:29 DaB. wrote:
> > For about an hour the web servers have appeared to be unresponsive:
> >
> > * http://ortelius.toolserver.org/~cvn/index.html
> > * http://wolfsbane.toolserver.org/~cvn/index.html
> > * https://toolserver.org/~cvn/index.html
> >
> > All error out with no response and a time-out.
> >
> > I can still SSH into wolfsbane and ortelius from willow, though.
>
> I will now investigate this. Until now the only problem I found is that
> hemlock is down.

I have now restored web access. As far as I can see, hemlock lost its external array and ran out of memory around 2:30 UTC. I have no idea why this affects our web servers. I rebooted hemlock to free the memory and restarted the web server on ortelius and wolfsbane; the web pages are back AFAIS.

What is not working at the moment are the user-store and our backup, because both are on hemlock's external array. Munin, which is handled by hemlock, is also not working. I will try to fix all this, but I guess I need nosy for that (and in the worst case Mark in the colo).

>
> Sincerely,
> DaB.

Sincerely,
DaB.

--
Userpage: [[:w:de:User:DaB.]]
PGP: 2B255885

11 years, 7 months

Fwd: Re: [Toolserver-l] z-dat-s4-a (s4-user) is down (was: Re: Reboot of hyacinth s3/s6/s7)

by DaB.

---------- Forwarded message ----------
Subject: Re: [Toolserver-l] z-dat-s4-a (s4-user) is down (was: Re: Reboot of hyacinth s3/s6/s7)
Date: Wednesday 19 September 2012
From: Marlen Caemmerer <marlen.caemmerer(a)wikimedia.de>
To: Wikimedia Toolserver <toolserver-l(a)lists.wikimedia.org>

Hello,

I had a bad accident while resizing the volume for s4-user. Unfortunately I did not realize that s4-rr does not yet hold the s4-user databases. I installed the backup of the user databases in this instance, so s4-user should be usable again.

Cheers
nosy

_______________________________________________
Toolserver-l mailing list (Toolserver-l(a)lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/toolserver-l
Posting guidelines for this list: https://wiki.toolserver.org/view/Mailing_list_etiquette

-------------------------------------------------------------

--
Userpage: [[:w:de:User:DaB.]]
PGP: 2B255885

11 years, 7 months

z-dat-s4-a (s4-user) is down (was: Re: [Toolserver-l] Reboot of hyacinth s3/s6/s7)

by DaB.

Hello all,

Nosy rebooted hyacinth this morning (see below). AFAIS something went wrong with the sql partition of s4, but I have no details yet. I have to speak with Nosy first; until then sql-s4-user is down. sql-s4-rr is operating normally.

Sincerely,
DaB.

At Tuesday 18 September 2012 16:16:51 DaB. wrote:
> Hello,
>
> I will reboot the database server hyacinth which holds s3, s6 and s7,
> tomorrow at 6:30 UTC.
>
> Cheers
> nosy
>
>
> _______________________________________________
> Toolserver-l mailing list (Toolserver-l(a)lists.wikimedia.org)
> https://lists.wikimedia.org/mailman/listinfo/toolserver-l
> Posting guidelines for this list:
> https://wiki.toolserver.org/view/Mailing_list_etiquette

--
Userpage: [[:w:de:User:DaB.]]
PGP: 2B255885

11 years, 7 months

When to execute cron-tasks

by DaB.

Hello all,

a few users contacted me about cron tasks of theirs that are not running. A frequent problem is that the cron lines of these users look like the following:

  0 0 * * * DoSomething

or

  0 * * * * DoSomething

In an ideal world that would be no problem, but in the real world it CAN be a problem. Why? Because many users have the same idea, and our submit hosts then fail with

  (CRON) CAN'T FORK (child_process): Not enough space

Last night 41 tasks were successfully started at midnight; an unknown number failed. Of course we could just throw new hardware at the problem, but for most of the day these hosts are idle.

So how to solve this problem? It's easy: spread the load. Most of the time a task (like a bot) does not care whether it is started a few minutes earlier or later. So choose a minute that is not 0 and not divisible by 5. If it really does not matter to you when your task starts, then take the position of the first letter of your user name in the alphabet and add 2 ("dab" → "d" → 4 → 6).

To avoid a misunderstanding: if your task REALLY needs to start at minute 0 (or at midnight), do it. And of course cron tasks fail for other reasons too, so contact me (jira bug preferred) if you have a problem.

Sincerely,
DaB.

--
Userpage: [[:w:de:User:DaB.]]
PGP: 2B255885
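The letter rule from the cron message above can be sketched in a few lines of Python. The helper names (`letter_minute`, `is_crowded`, `cron_line`) are hypothetical. Note that the plain rule lands on a multiple of 5 for some letters ("c" gives 5, "h" gives 10); moving such a result one minute later is an assumption of this sketch, not part of the announcement.

```python
import string


def letter_minute(username: str) -> int:
    """Alphabet position of the first letter plus 2 ("dab" -> "d" -> 4 -> 6)."""
    first = username[:1].lower()
    if len(first) != 1 or first not in string.ascii_lowercase:
        raise ValueError("user name must start with a letter a-z")
    return string.ascii_lowercase.index(first) + 1 + 2


def is_crowded(minute: int) -> bool:
    """True for minute 0 and every other minute divisible by 5."""
    return minute % 5 == 0


def cron_line(username: str, command: str, hour: str = "*") -> str:
    """Build a cron line that starts the command at the user's minute."""
    minute = letter_minute(username)
    if is_crowded(minute):
        minute += 1  # assumption of this sketch: step off the crowded minute
    return f"{minute} {hour} * * * {command}"


if __name__ == "__main__":
    print(cron_line("dab", "DoSomething"))            # hourly job
    print(cron_line("dab", "DoSomething", hour="0"))  # daily job, shortly after midnight
```

With this, a user "dab" would run an hourly task at minute 6 instead of minute 0.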

11 years, 7 months

sql.toolserver.org will become deprecated

by DaB.

Hello all,

as most of you know, sql.toolserver.org (short: sql) is the place where you should store your user database if you do not need to join with wmf databases. sql points to adenia at the moment. Adenia is quite busy from time to time, and sooner or later it will become overloaded. My plan for then is not simply to buy a bigger box, but to buy another box and split sql (so some databases will be on the old box and some on the new box). The problem is that this is not possible with our current setup, where we have only sql.toolserver.org: we cannot configure it as round-robin, because 50% of the time you would miss your database.

The solution is simple, but it needs a little help from your side. I created new DNS names of the form "sql-user-X", where X is a letter ("sql-user-a" for example). The idea is that you no longer use "sql", but "sql-user-X", where "X" is the first letter of your user name (so the user "erik" uses "sql-user-e" and the user "snowolf" uses "sql-user-s"). If you all do this, it will be simple for the roots to move user databases away from adenia to another server (for example, we could move the databases from "u_m*" to "u_z*" to the new box and nothing would break).

I know that many of you need some time to update your tools. That is why I am announcing this now, while you have plenty of time to update your stuff. At the moment sql-user-X points to adenia, so nothing will break if you update now.

Please send questions to the mailing list. I will update the wiki pages soon.

Sincerely,
DaB.

--
Userpage: [[:w:de:User:DaB.]]
PGP: 2B255885
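As a minimal sketch of the naming scheme in the message above, a tool could derive its database host from the user name instead of hard-coding "sql". The function name `sql_user_host` is hypothetical, and the sketch returns only the short name given in the announcement; whether a domain suffix is needed is not stated there.

```python
import string


def sql_user_host(username: str) -> str:
    """Return the "sql-user-X" name for a user, X being the first letter."""
    first = username[:1].lower()
    if len(first) != 1 or first not in string.ascii_lowercase:
        raise ValueError("user name must start with a letter a-z")
    return f"sql-user-{first}"


if __name__ == "__main__":
    for user in ("erik", "snowolf"):
        print(user, "->", sql_user_host(user))
```

Computing the name in one place means a later move of some user databases to another box would require no change in the tool itself.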

11 years, 7 months
