[Xmldatadumps-admin-l] FR, IO, DE, EN, SV Wiki Failures
Tomasz Finc
tfinc at wikimedia.org
Tue May 12 00:54:23 UTC 2009
Tomasz Finc wrote:
> This was a nasty mine to step on. While updating the template file to
> correctly update the links in the status file, a variable wasn't properly
> declared. One exception and the whole system spun out of control and died.
>
> That's just silly. Failing to update a status file on one iteration
> should not kill the build outright. I've added a fail-safe so that if
> this happens again, the other portions of the build that are working
> just fine will continue to run.
>
> Thus two lessons:
>
> 1) Depend on the test environment more.
> 2) Add catch blocks to the code so that small failures do not bring
> the system down.
>
> --tomasz
>
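To make lesson 2 concrete, here is a minimal sketch of the kind of fail-safe described above. It is not the actual dump code: run_steps and the step names in the usage note are hypothetical, and it assumes each portion of the build can be treated as an independent callable.

```python
import logging

log = logging.getLogger("dump-sketch")  # hypothetical logger name


def run_steps(steps):
    """Run every (name, callable) pair in order.

    An exception in one step is logged and recorded, but it does not
    stop the steps that follow it.
    """
    failed = []
    for name, step in steps:
        try:
            step()
        except Exception:
            # Log the traceback and move on to the next step.
            log.exception("step %r failed; continuing with the rest", name)
            failed.append(name)
    return failed
```

With something like run_steps([("dump", dump_pages), ("status", update_status)]), a NameError raised inside update_status would end up in the returned list rather than taking the whole run down.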
|-sh(15466)---php(15467)-+-sh(15468)---bzip2(15469)
Amusingly enough, the Linux kernel did something really helpful here.
When the Python worker died, the kernel re-parented the build manager to
init and it kept going.
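The re-parenting described above can be reproduced with a small sketch. It assumes a POSIX system with os.fork; orphan_parent_pids is a hypothetical helper, not part of the dump scripts. The adopting process is normally init (PID 1), though on newer Linux kernels it can be a designated subreaper instead, so the sketch only reports the new parent rather than assuming PID 1.

```python
import os
import time


def orphan_parent_pids(timeout=5.0):
    """Return (worker_pid, new_parent_pid) as observed by an orphaned job."""
    read_fd, write_fd = os.pipe()
    worker = os.fork()
    if worker == 0:
        # Stand-in for the worker: start a job, then die.
        os.close(read_fd)
        worker_pid = os.getpid()
        if os.fork() == 0:
            # Stand-in for the job: wait until the kernel re-parents us.
            deadline = time.monotonic() + timeout
            while os.getppid() == worker_pid and time.monotonic() < deadline:
                time.sleep(0.01)
            os.write(write_fd, f"{worker_pid} {os.getppid()}".encode())
            os._exit(0)
        os._exit(1)  # the worker dies abruptly; the job is now an orphan
    # Original process: reap the worker, then read the job's report.
    os.close(write_fd)
    os.waitpid(worker, 0)
    with os.fdopen(read_fd) as report:
        before, after = report.read().split()
    return int(before), int(after)


if __name__ == "__main__":
    old_parent, new_parent = orphan_parent_pids()
    print(f"job started under PID {old_parent}, now owned by PID {new_parent}")
```

The job keeps running after its parent is gone, which is why a dump can keep chugging along with nothing left managing it.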
What this means is that some of these are still going unmanaged. DE & ES
are both chugging along and I'll be watching them closely until they
finish their run. Sadly we had to kill EN due to high load on the db.
I'll update the status pages to reflect which ones are still going.
So far it's:
EN: pending db stability
FR: dead, restarted, running
DE: running, ETA 2009-05-14 07:33:40
ES: running, ETA 2009-05-12 21:12:5
IO: finished
SV: dead, restarted, running
There are parallel DE & ES jobs that got kicked off after the failures
that I'll kill if the leftover procs continue to get work done.
--tomasz