Toolserver-l February 2012

toolserver-l@lists.wikimedia.org

29 participants
19 discussions

Unplanned maintenance at daphne /s2+s5-user

by Marlen Caemmerer

Hello, s2+s5-user are currently not available. The reason is that the server ran out of space much faster than I expected. Since more storage is available, I will add it and then rerun the databases. This may take some time, but I hope to have it back online this evening. Cheers, nosy

12 years, 2 months

Extraordinary hardware-maintenance 16 and 17 February

by DaB.

Hello all,

Nosy and I will visit the datacenter at short notice tomorrow and the day after, to install the remaining hardware and do some maintenance [1]. This will cause downtime and reboots during:

Thursday: 15-24 o'clock UTC
Friday: 09-22 o'clock UTC

Normally, reboots will be announced on the shell 5 minutes before they happen, but accidents may happen, so please do not leave files open when you step away from your computer. If you plan to do something that takes some time (like a long download), please do it at night. If possible, use SGE [2] (you should always do that). We will try to be on IRC (at least from time to time), but we cannot guarantee that.

Sincerely,
DaB.

[1] https://wiki.toolserver.org/view/Admin:Datacenter-Tasks
[2] https://wiki.toolserver.org/view/SGE

--
Userpage: [[:w:de:User:DaB.]] — PGP: 2B255885

12 years, 2 months

Where did nightshade go?

by Maarten Dammers

I got some complaints about bots not running. I haven't received any cron mail since 17-2-2012, I can't log in to nightshade.toolserver.org, and the server doesn't even respond to pings. http://status.toolserver.org/ is down, so could somebody please tell me what happened to nightshade? Maarten

12 years, 2 months

by Merlijn van Deen

On 15 February 2012 20:29, DaB. <WP(a)daniel.baur4.info> wrote:

> Nosy and I will visit the datacenter on short-term-base tomorrow and the day
> after, to install the remaining hardware and for some maintenance [1]; this
> will cause downtimes and rebootings between
>
> Thursday: 15-24 o'clock UTC
> Friday: 09-22 o'clock UTC.

Note to anyone noticing the huge warning you get when ssh'ing to login.toolserver.org and wondering why: this is expected; login.toolserver.org has (temporarily?) changed to willow [1], and will therefore use willow's fingerprint - not nightshade's, as before. Please confirm the fingerprint at https://fingerprints.toolserver.org/ before connecting.

Best,
Merlijn

[1] https://jira.toolserver.org/browse/MNT-1197

12 years, 2 months

What happened to pagecounts stats?

by Adam Klimont

Dear all, Pagecounts files from http://dumps.wikimedia.org/other/pagecounts-raw/ used to be downloaded by a script automatically into /mnt/user-store/stats until the end of last year. What happened to that script? The latest pagecounts I can find there are from 20111231. I could write a similar script but I do not think I have permissions to write in /mnt/user-store/stats. Also, I found some pagecounts stored in /mnt/user-store/johang - I think it would be better to store them all in one place so that everyone can use them (unless johang has a good reason to store them in his folder). Best wishes alkamid

12 years, 2 months

Data of s1 (enwiki) probably corrupted

by DaB.

Hello all, because of a wmf-master-change last week (when Nosy and I were in the datacenter) and a mistake on our side, the data of s1 that were inserted after 18 January are probably defective or wrong. I have already requested a new dump and will inform you when there is any progress or news. Sincerely, DaB. -- Userpage: [[:w:de:User:DaB.]] — PGP: 2B255885

12 years, 2 months

Job stuck in error state

by Dr. Trigon

Hello all,

I got some strange messages by mail tonight. First:

> Unable to run job: error: no suitable queues. Exiting.

then later:

> Job 1601224 (subster_ar) Set in error state
> Exit Status = -1
> Signal = unknown signal
> User = drtrigon
> Queue = short@ortelius.toolserver.org
> Host = ortelius.toolserver.org
> Start Time = <unknown>
> End Time = <unknown>
> CPU = NA
> Max vmem = NA
> failed assumedly before job because:
> can't get password entry for user "drtrigon". Either the user does not exist or NIS error!
> Use "qmod -c <jobid>" to clear job error state once the problem is fixed.

(see the attachments)

I think they were reactions to 2 of my cron jobs (the other 2 ran as usual). Now the strange thing is that I have a job in my SGE queue which I can no longer delete with 'qdel':

> job-ID  prior   name       user     state submit/start at     queue                          slots ja-task-ID
> --------------------------------------------------------------------------------------------------------------
> 1601224 0.00250 subster_ar drtrigon dt    02/12/2012 12:35:55 all.q@ortelius.toolserver.org  1

What should I do now? Or what am I doing wrong? I just want to delete this job, since it obviously crashed (though for strange reasons), and then start it again. The cron(ie)tab entries are:

> 30 0 * * * cronsub -s subster_frr $HOME/pywikipedia/bot_control.py -subster -cron -lang:frr
> 0 1 * * * cronsub -s subster_en $HOME/pywikipedia/bot_control.py -subster -cron -lang:en
> 30 1 * * * cronsub -s subster_ar $HOME/pywikipedia/bot_control.py -subster -cron -lang:ar

(the job at 1:00 had no problems...)

So essentially there are 2 questions:

1.) How do I remove this job from the queue (in order to restart it)?
2.) Why did this happen? As you can see, the job was started on 'ortelius'... is this usual behaviour, or was another server down?

Maybe someone can give me a hint? Thanks in advance and greetings!

DrTrigon
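
For readers who hit the same situation, here is a minimal Python sketch of how a `qstat` row like the one quoted above could be inspected. It is not an answer from the list. It assumes that an `E` in the state column marks a job in error state (which the SGE mail suggests clearing with `qmod -c <jobid>`) and that a `d` marks a deletion that is still pending. The forced `qdel -f` it proposes for that case may need to be enabled by the administrators, so treat it as something to ask about rather than a guaranteed fix. The function names `parse_qstat_row` and `suggested_commands` are hypothetical.

```python
import re

# Leading columns of a qstat data row: job-ID, prior, name, user, state.
QSTAT_ROW = re.compile(
    r"^\s*(?P<job_id>\d+)\s+(?P<prior>[\d.]+)\s+(?P<name>\S+)\s+"
    r"(?P<user>\S+)\s+(?P<state>\S+)\s+"
)


def parse_qstat_row(line):
    """Return job id, name, user and state from one qstat row, or None."""
    match = QSTAT_ROW.match(line)
    if match is None:
        return None  # header, separator, or blank line
    return {
        "job_id": int(match.group("job_id")),
        "name": match.group("name"),
        "user": match.group("user"),
        "state": match.group("state"),
    }


def suggested_commands(job):
    """Build candidate commands (as argv lists) for a stuck job.

    Assumption: 'E' in the state means error state, 'd' means the job is
    already marked for deletion but has not gone away yet.
    """
    commands = []
    job_id = str(job["job_id"])
    if "E" in job["state"]:
        # Taken from the SGE mail: clear the error state.
        commands.append(["qmod", "-c", job_id])
    if "d" in job["state"]:
        # Assumption: forced deletion is permitted on this cluster.
        commands.append(["qdel", "-f", job_id])
    return commands
```

Calling `suggested_commands(parse_qstat_row(row))` on the quoted row only builds the command lists; nothing is executed, so the result can be reviewed (or sent to the admins) before anything is run.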

12 years, 2 months

Wikimania scholarship deadline: Feb 16

by aude

Reminder that the deadline to apply for a Wikimania 2012 travel scholarship is February 16 (23:59 UTC). We encourage you to apply! We need developers and tech folks at Wikimania, including for the hackathon, the technical track, workshops and more. :) http://wikimania2012.wikimedia.org/wiki/Scholarships Full travel scholarships, funded by the Wikimedia Foundation and chapters (France, UK, Israel, Austria), will cover transportation, hostel accommodations, and conference registration. Partial scholarships are also offered by WMF (and Wikimedia Hungary). Wikimania 2012 will take place July 12-15 at the George Washington University in Washington, DC. Wikimania hackathon is July 10-11. The call for participation is also open now (deadline: March 18), and registration is open. Cheers, Katie

12 years, 2 months

SLOW_OK

by Lars Aronsson

Trying to follow the instructions here, https://wiki.toolserver.org/view/Database_access#Slow_queries_and_the_query… I do the following in a shell script, run from cronie:

echo "select /* SLOW_OK LIMIT:1800 */ 'pl.wikipedia', count(*), ..." | mysql ...

I run such queries for a large number of languages, and most of them run on time and give good results. The string constant selected as the first column is there to facilitate fault searching. Yet, occasionally I receive an email from Query-Killer that says:

-----
Hello la2, a MySQL-query of yours was killed because you didn't mark it as SLOW_OK and it have run for 645 seconds which was longer than allowed. You can find the query below. Please have also a look at [1] to find information how you can avoid killings of your queries. Maybe you can optimze the query too? The replication lag at kill-time was 39s. Sincerly, Query-Killer. This eMail was sent automaticaly, please don't reply.

select  'pl.wikipedia', count(*), ...
----

As can be seen, the query in the e-mail has a double space after select but no comment. Somebody cut out the comment and my SLOW_OK was not respected. Is it the "mysql" command that does this? Is there a better way?

--
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik - http://aronsson.se
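
One possible explanation, not confirmed in this thread, is that the `mysql` command-line client discards comments by default (`--skip-comments`) and only forwards them to the server when started with `--comments` (`-c`). The Python sketch below shows how the query and the client invocation could be assembled under that assumption. The helper names `slow_ok_query`, `mysql_command` and `run_query` are hypothetical, and connection options are left to the caller.

```python
import subprocess


def slow_ok_query(select_body, limit_seconds):
    """Prefix a SELECT body with the SLOW_OK marker used in the message."""
    return f"select /* SLOW_OK LIMIT:{limit_seconds} */ {select_body}"


def mysql_command(extra_args=()):
    """Build the client argv; --comments asks mysql to keep comments."""
    return ["mysql", "--comments", *extra_args]


def run_query(query, extra_args=()):
    """Pipe the query to the mysql client on stdin and return its output."""
    result = subprocess.run(
        mysql_command(extra_args),
        input=query,
        text=True,
        capture_output=True,
        check=True,  # raise if the client exits with an error
    )
    return result.stdout
```

In the shell script from the message, the equivalent change would be piping the same `echo` into `mysql --comments ...` instead of `mysql ...`. Whether Query-Killer then sees the marker would still need to be checked on the server.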

12 years, 2 months

Toolserver quota

by Lars Aronsson

In December I wrote a cron job on the German toolserver, to collect statistics on external links. It works fine, but to be useful I must collect data over time, so I made a cron job to run each Monday morning. While my attention was elsewhere, believing that this was running, it turns out the 256 Mbyte quota (!) made all my files 0 bytes in length for all of January. I have now requested and gotten an increased quota, but 6 weeks of data have been lost. And I must devote time to check my quota every week or two.

The /home disk is 600 GB, of which 88 GB is free. That's not per user, but for all users together. It should come as a surprise to most people who donate money to the Wikimedia Foundation that all of its volunteer developers have to share a disk the size of what is found in any laptop. According to an IRC discussion, some new disks that were planned to arrive in mid-January have not yet been delivered. I have no idea what amount of disk has been ordered, or whether the quota system will be kept. I get the impression that this doesn't really matter to anybody.

This is the development system for the world's 6th most visited website in 2012. It doesn't quite live up to my expectations. It feels more like some hobby project in 2002. I'm a great fan of hobby projects, but with the current budget of WMDE and WMF, I thought we would have reached a higher ambition level by now.

--
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik - http://aronsson.se

12 years, 2 months
