-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
DaB.:
The whole toolserver cluster was down between (circa) 19:12 and 20:30 UTC.
The main problem was that NFS stopped working, so no logins, no web apps and
no command-line commands could run. It looks like NFS stopped working because
DNS resolution stopped working (and AFAIS DNS stopped working because of low
memory or a network problem). The roots will investigate where exactly the
problem was.
Hi,
This seems to have been some kind of kernel memory leak on turnera (one
of the cluster nodes), although it's not yet clear what caused it.
Since the userland was only very slow (rather than completely down), the
cluster didn't notice the problem and fail over until someone rebooted
the host manually.
I will be investigating exactly which service caused the problem.
- river.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (NetBSD)
iEYEARECAAYFAk5K6DUACgkQIXd7fCuc5vL6tQCgibQtbrM8w69rXSe3IsFwpD1M
GIAAn1fuQyJyVgGzvcDF+/gsNGy3yjli
=/16O
-----END PGP SIGNATURE-----