[Labs-l] second attempt to request alternative login server

Ryan Lane rlane at wikimedia.org
Sun Mar 3 19:43:18 UTC 2013


On Sun, Mar 3, 2013 at 7:51 AM, Petr Bena <benapetr at gmail.com> wrote:

> Hi,
>
> Today is the second time that bastion has been inaccessible:
>
> If you are having access problems, please see:
>
> https://wikitech.wikimedia.org/wiki/Access#Accessing_public_and_private_instances
> debug1: Authentications that can continue: publickey
> debug1: Next authentication method: publickey
> debug1: Offering RSA public key: /home/petanb/.ssh/id_rsa
> debug2: we sent a publickey packet, wait for reply
>
>
> If we can't have a way to authenticate other than public keys,
> WHICH ARE often broken, can we at least have a second stable login
> server?
>
> BTW, I assume that logins didn't work because of gluster, so it
> wouldn't have worked anyway, but if gluster sucks this badly, can we at
> least have password auth until you fix it? Bad authentication is better
> than no working authentication.
>
>
Though I'm usually more than happy to blame gluster, this was not caused by
gluster. It happened because someone OOM'd the instance (ran it out of
memory).

We've actually finally stabilized gluster to the point where we shouldn't be
having complete outages anymore:

https://ganglia.wikimedia.org/latest/?r=month&cs=&ce=&m=cpu_report&s=by+name&c=Glusterfs+cluster+pmtpa&h=&host_regex=&max_graphs=0&tab=m&vn=&sh=1&z=small&hc=4

Note in the above graph that memory usage has been mostly flat for the past
week and a half. There was one spot where memory ballooned, then a spot
where it dropped. That last balloon occurred before the changes we put in
place, and the drop is where I restarted the glusterd processes (which
doesn't affect filesystem access).

There are still some split-brain issues left over from the most recent round
of instability, but the SSH keys are perfectly fine. I will not enable
password authentication; it's incredibly insecure.

So, to get back on point: I've just created bastion2.wmflabs.org and
bastion3.wmflabs.org, in case the bastion instances OOM again.

- Ryan


More information about the Labs-l mailing list