[Labs-l] Random issues that require an op's attention to fix

Damian Zaremba damian at damianzaremba.co.uk
Sat Oct 6 16:43:25 UTC 2012


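1) DNS is broken/half working/annoying/argh: labs-ns0 returns an empty,
non-authoritative answer for the wmflabs.org NS set, while labs-ns1
answers authoritatively.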
phoenix:~ damian$ dig wmflabs.org NS @labs-ns0.wikimedia.org

; <<>> DiG 9.6-ESV-R4-P3 <<>> wmflabs.org NS @labs-ns0.wikimedia.org
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 17397
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;wmflabs.org.            IN    NS

;; Query time: 150 msec
;; SERVER: 208.80.152.33#53(208.80.152.33)
;; WHEN: Sat Oct  6 17:33:03 2012
;; MSG SIZE  rcvd: 29

phoenix:~ damian$ dig wmflabs.org NS @labs-ns1.wikimedia.org

; <<>> DiG 9.6-ESV-R4-P3 <<>> wmflabs.org NS @labs-ns1.wikimedia.org
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 46082
;; flags: qr aa rd; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;wmflabs.org.            IN    NS

;; ANSWER SECTION:
wmflabs.org.        3600    IN    NS    labs-ns1.wikimedia.org.
wmflabs.org.        3600    IN    NS    labs-ns0.wikimedia.org.

;; Query time: 175 msec
;; SERVER: 208.80.154.19#53(208.80.154.19)
;; WHEN: Sat Oct  6 17:33:09 2012
;; MSG SIZE  rcvd: 85
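
For anyone wanting to script the comparison above, here is a minimal
sketch that reads the header lines of dig's text output and reports
whether a server gave an authoritative, non-empty answer. The function
names are made up for illustration, and it only assumes the header
layout shown in the two outputs above.

```python
import re


def parse_dig_header(output):
    """Pull status, flags and the ANSWER count out of dig's text output."""
    status = re.search(r"status: ([A-Z]+)", output)
    flags = re.search(r";; flags: ([a-z ]+);", output)
    answers = re.search(r"ANSWER: (\d+)", output)
    if not (status and flags and answers):
        raise ValueError("no dig header found in output")
    return {
        "status": status.group(1),
        "flags": flags.group(1).split(),
        "answers": int(answers.group(1)),
    }


def is_authoritative_answer(output):
    """True if the reply is NOERROR, has the aa flag and at least one answer."""
    header = parse_dig_header(output)
    return (
        header["status"] == "NOERROR"
        and "aa" in header["flags"]
        and header["answers"] > 0
    )


# Header lines copied from the two dig runs above.
NS0_HEADER = (
    ";; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 17397\n"
    ";; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0\n"
)
NS1_HEADER = (
    ";; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 46082\n"
    ";; flags: qr aa rd; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0\n"
)

for name, text in (("labs-ns0", NS0_HEADER), ("labs-ns1", NS1_HEADER)):
    print(name, "authoritative answer:", is_authoritative_answer(text))
```

Fed the two headers above, the check fails for labs-ns0 (no aa flag,
zero answers) and passes for labs-ns1.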

Also, the SOA is wrong, as its MNAME still points to virt0:
phoenix:~ damian$ dig wmflabs.org SOA @labs-ns1.wikimedia.org

; <<>> DiG 9.6-ESV-R4-P3 <<>> wmflabs.org SOA @labs-ns1.wikimedia.org
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 46569
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;wmflabs.org.            IN    SOA

;; ANSWER SECTION:
wmflabs.org.        3600    IN    SOA    virt0.wikimedia.org. hostmaster.wikimedia.org. 1349449000 1800 3600 86400 7200

;; Query time: 128 msec
;; SERVER: 208.80.154.19#53(208.80.154.19)
;; WHEN: Sat Oct  6 17:33:39 2012
;; MSG SIZE  rcvd: 92
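
The SOA complaint can be checked mechanically too. The sketch below
splits an SOA answer line into its fields and tests whether the MNAME is
one of the advertised nameservers. The helper names are hypothetical,
and note that in general an MNAME does not have to appear in the NS set
(hidden primaries exist); this only flags the mismatch described above.

```python
from typing import NamedTuple


class Soa(NamedTuple):
    mname: str    # primary nameserver
    rname: str    # responsible mailbox, with '.' in place of '@'
    serial: int
    refresh: int
    retry: int
    expire: int
    minimum: int


def parse_soa_line(line):
    """Parse one 'name ttl class SOA rdata...' line as printed by dig."""
    fields = line.split()
    # 4 leading columns (name, ttl, class, type) + 7 SOA RDATA fields
    if len(fields) != 11 or fields[3] != "SOA":
        raise ValueError("not a complete SOA answer line")
    numbers = [int(f) for f in fields[6:]]
    return Soa(fields[4], fields[5], *numbers)


def mname_in_ns_set(soa, ns_names):
    """Compare names case-insensitively, ignoring the trailing dot."""
    def normalise(name):
        return name.rstrip(".").lower()
    return normalise(soa.mname) in {normalise(n) for n in ns_names}


# Values copied from the dig output above.
SOA_LINE = (
    "wmflabs.org. 3600 IN SOA virt0.wikimedia.org. "
    "hostmaster.wikimedia.org. 1349449000 1800 3600 86400 7200"
)
NS_SET = ["labs-ns0.wikimedia.org.", "labs-ns1.wikimedia.org."]

soa = parse_soa_line(SOA_LINE)
print(soa.mname, "in NS set:", mname_in_ns_set(soa, NS_SET))
```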


2) Instances often never come back after a reboot. Could someone please
fix bots-cb? (Same as sql2: the first reboot took it down, the second
results in 'failed'.)

3) Logins randomly fail because key auth times out (seems to be related
to NFS crapping out)

4) Home dirs sometimes randomly drop their mounts (this also seems to be
related to NFS crapping out; dmesg just shows RPC timeouts)
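
To catch the dropped mounts from 4) without poking the possibly hung
filesystem itself, one option is to compare the contents of /proc/mounts
against the mount points you expect. This is a rough sketch only: the
function name, the server name and the paths are all invented for the
example, not taken from the Labs setup.

```python
def missing_mounts(proc_mounts_text, expected):
    """Return the expected mount points that are absent from /proc/mounts.

    Each line of /proc/mounts is 'device mountpoint fstype options ...';
    only the second column is used. Mount points containing spaces appear
    escaped (\\040) in that file and would need unescaping first.
    """
    mounted = set()
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            mounted.add(fields[1])
    return [path for path in expected if path not in mounted]


# Hypothetical snapshot: one home directory is mounted, the other is not.
SAMPLE = (
    "rootfs / rootfs rw 0 0\n"
    "nfs.example:/export/home/alice /home/alice nfs rw,vers=3 0 0\n"
)

print(missing_mounts(SAMPLE, ["/home/alice", "/home/bob"]))
```

On a real instance the text would come from reading /proc/mounts, which
does not touch the NFS server and so should not block on RPC timeouts.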

(Yes, I know it's a Saturday, but as the guy in Code Rush said: Writing
software is different from selling real estate. Selling real estate, you
sell to people, and people sleep at night. When they go to sleep you
have to stop selling real estate. Computers never sleep.)

Damian


