[Xmldatadumps-admin-l] FR, IO, DE, EN, SV Wiki Failures

Tomasz Finc tfinc at wikimedia.org
Tue May 12 00:54:23 UTC 2009


Tomasz Finc wrote:
> This was a nasty mine to step on. While updating the template file to 
> correctly update the links in the status file, a variable wasn't properly 
> declared. One exception and the whole system spun out of control and died.
> 
> That's just silly. Failing to update a status file on one iteration 
> should not kill the build outright. I've added a fail-safe so that if 
> this happens again, the other portions of the build that are working 
> just fine will continue to run.
> 
> Thus two lessons:
> 
> 1) Rely on the test environment more.
> 2) Add catch blocks to the code so that small failures do not bring 
> the system down.
> 
> --tomasz
> 
> _______________________________________________
> Xmldatadumps-admin-l mailing list
> Xmldatadumps-admin-l at lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-admin-l
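The fail-safe from lesson 2 can be sketched roughly as follows. This is a hypothetical illustration, not the actual dump code: `render_status`, `run_iteration`, and the template strings are invented names, and a missing `str.format` placeholder stands in for the undeclared variable. Only the cosmetic status-file update is wrapped, so a broken template is logged while real dump errors would still surface.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("dump-runner")  # hypothetical logger name


def render_status(wiki: str, template: str) -> str:
    # Hypothetical stand-in for filling the status-page template. A
    # placeholder with no matching value raises KeyError, playing the role
    # of the undeclared variable in the original incident.
    return template.format(wiki=wiki)


def run_iteration(wiki: str, template: str, status: dict[str, str]) -> None:
    """One pass of a dump job; only the status-file update is guarded."""
    # ... the actual dump work for `wiki` would run here, unguarded, so that
    # real dump errors still surface ...
    try:
        status[wiki] = render_status(wiki, template)
    except Exception:
        # Cosmetic failure: log the traceback and let the build carry on.
        log.exception("status update failed for %s; continuing", wiki)


status: dict[str, str] = {}
run_iteration("frwiki", "{wiki}: running", status)
run_iteration("svwiki", "{wiki}: running, ETA {eta}", status)  # KeyError, caught
print(status)  # {'frwiki': 'frwiki: running'}
```

The `try` is kept narrow on purpose: catching everything around the whole iteration would also hide genuine dump failures, which is the opposite problem.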

  |-sh(15466)---php(15467)-+-sh(15468)---bzip2(15469)

Amusingly enough, the Linux kernel did something really helpful here. 
When the Python worker died, the kernel reparented the build manager to 
init, and it kept going.
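This reparenting is standard Unix behavior. When a process exits, its surviving children are adopted by init (PID 1) or, on Linux 3.4 and later, by the nearest ancestor that has registered itself as a "child subreaper". The hypothetical Python demo below (POSIX only, not part of the dump scripts) forks a stand-in "worker" that starts a grandchild and then exits without waiting for it. The grandchild then watches its parent PID change.

```python
import os
import time


def observe_reparenting() -> tuple[int, int]:
    """Return (pid of the exited worker, new parent pid seen by its child)."""
    read_fd, write_fd = os.pipe()
    worker = os.fork()
    if worker == 0:
        # Stand-in for the Python worker: start a long-lived child, then die.
        worker_pid = os.getpid()
        if os.fork() == 0:
            # Grandchild (think sh/php/bzip2): wait until the worker is gone.
            os.close(read_fd)
            while os.getppid() == worker_pid:
                time.sleep(0.01)
            os.write(write_fd, f"{worker_pid} {os.getppid()}".encode())
            os.close(write_fd)
            os._exit(0)
        os._exit(0)  # exit without waiting for the grandchild
    os.close(write_fd)
    os.waitpid(worker, 0)  # reap the worker so it does not linger as a zombie
    dead_pid, new_parent = map(int, os.read(read_fd, 64).decode().split())
    os.close(read_fd)
    return dead_pid, new_parent


if __name__ == "__main__":
    dead, adopter = observe_reparenting()
    print(f"worker {dead} exited; its child was adopted by PID {adopter}")
```

On a 2009-era kernel the adopter would be PID 1. On a modern systemd desktop it is often a per-user manager process instead.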

What this means is that some of these jobs are still running unmanaged. DE 
and ES are both chugging along, and I'll be watching them closely until 
they finish their run. Sadly, we had to kill EN due to high load on the db.

I'll update the status pages to reflect which ones are still going.

So far, it's:

EN: pending db stability
FR: dead, restarted, running
DE: running, ETA 2009-05-14 07:33:40
ES: running, ETA 2009-05-12 21:12:5
IO: finished
SV: dead, restarted, running


Parallel DE and ES jobs also got kicked off after the failures; I'll kill 
those if the leftover processes keep making progress.
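A minimal sketch of that decision in Python, to match the dump workers. Everything here is hypothetical: the function names, the idea of judging progress by whether the orphaned job's output file is still growing, and the 5-second default sampling interval are assumptions, not what was actually done. The liveness check uses the standard signal-0 idiom, which tests for a process without sending it anything.

```python
import os
import tempfile
import time


def is_alive(pid: int) -> bool:
    """True if a process with this PID exists (signal 0 sends nothing)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # it exists but belongs to another user
    return True  # note: an unreaped zombie also counts as alive here


def output_is_growing(path: str, interval: float = 5.0) -> bool:
    """True if the file at `path` grew over `interval` seconds."""
    before = os.path.getsize(path)
    time.sleep(interval)
    return os.path.getsize(path) > before


def should_kill_duplicate(leftover_pid: int, leftover_output: str,
                          interval: float = 5.0) -> bool:
    """Kill the restarted duplicate only if the orphaned run is still working."""
    return is_alive(leftover_pid) and output_is_growing(leftover_output, interval)


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as f:
        f.write(b"partial dump")
        f.flush()
        # Our own PID is alive, but the file is not growing.
        print(should_kill_duplicate(os.getpid(), f.name, interval=0.1))  # False
```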

--tomasz


