[Labs-l] open grid on bots

Petr Bena benapetr at gmail.com
Mon Mar 11 15:40:37 UTC 2013


Right - I understand that it's a "self service" tool labs where everyone
can maintain their stuff without needing sysadmin assistance. But it
sounds more like a dream than reality to me.

Imagine you want to install a bot on your restricted grid at 14:00 on
Saturday, which is likely a good time for a volunteer to do their
stuff. First I unpack my files and create some submit script; then (if
I were a newbie) I run qsub and boom - nothing happens. Who is there to
help me out?
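
To make the "nothing happened" case a bit less opaque, here is a minimal
sketch of a wrapper that always sends the job's stdout and stderr to
known files. It assumes a Grid Engine style qsub with the usual -N, -o
and -e options; the function names, mybot.sh and the logs directory are
made up for illustration, and by default it only prints the command
instead of submitting anything.

```python
import shlex
import subprocess
from pathlib import Path


def build_qsub_command(script, job_name, log_dir):
    """Return a qsub argument list that writes stdout/stderr to known files."""
    log_dir = Path(log_dir)
    return [
        "qsub",
        "-N", job_name,                          # job name shown in the queue
        "-o", str(log_dir / f"{job_name}.out"),  # file for the job's stdout
        "-e", str(log_dir / f"{job_name}.err"),  # file for the job's stderr
        str(script),
    ]


def submit(script, job_name, log_dir, dry_run=True):
    """Print the command; qsub is only invoked when dry_run is False."""
    cmd = build_qsub_command(script, job_name, log_dir)
    print(" ".join(shlex.quote(part) for part in cmd))
    if not dry_run:
        # Create the log directory first so the output files can be written.
        Path(log_dir).mkdir(parents=True, exist_ok=True)
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Hypothetical script name and log directory, dry run only.
    submit("mybot.sh", "mybot", "logs")
```

With something like this, a failed job at least leaves an .err file to
look at, which is usually where a missing package would show up.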

If I wasn't a newbie and managed to get output, either using qsub -o or
some wrapper, I find out that I am missing packages A and B on all
nodes. The proper way for a good admin to fix it is to add them to
some puppet class - THAT is the correct way of a good sysadmin - but
not a simple way for someone who has never seen puppet before - and I
believe most toolserver users never have. Anyway, I update puppet,
submit a patch and... if nothing changes, I will wait the typical 2
weeks for some sysadmin from wmf to merge it. (That's how long it
usually takes now, when I am lucky.)

Indeed, that sounds like a stable cluster, where bots will run fine
once they run. But getting them to run for the first time will be an
incredible pain for people who don't understand UNIX, puppet, sge
and all that.

That's where sysadmins who can assist you with stuff come in handy.

On Mon, Mar 11, 2013 at 4:03 PM, Marc A. Pelletier <marc at uberbox.org> wrote:
> On 03/11/2013 05:14 AM, Petr Bena wrote:
>>
>> the lack of sysadmins was one of the biggest problems of toolserver -
>> just creating an account took ages and there was nearly no support at
>> all - that's what I am afraid your project is heading to. It will be
>> a perfectly designed cluster maintained by 1 person.
>
>
> I think you're missing the point that part of the reason why it's
> advantageous to move to WMF hosting is exactly the opposite; this means that
> ultimately, you get the weight of ops behind the infrastructure rather than
> just isolated sysadmins (volunteer or not).  Add to that that we get to
> leverage the technical resources already in place and we end up with an
> infrastructure that is much /less/ dependent on sysadmin intervention to
> run.
>
> I'm a good sysadmin, which means I am a *lazy* sysadmin.  I can guarantee
> you that one of my primary objectives is that nobody needs to wait on me to
> do anything for normal tool writing and maintenance!  :-)  What isn't
> currently automated will be configured to be self-serve -- as long as it
> does not impact reliability of other tools.  If I do my job right, all of my
> time will be spent sharing knowledge with maintainers and coping with
> hardware failures, not doing gruntwork.
>
> That said, I don't believe there is anything wrong with volunteer sysadmins,
> and my understanding is that this is indeed something which we may look
> forward to in the future (although, admittedly, not this early in the Tool
> Labs life cycle).  But it's important that you also understand that
> objectives of reliability are best served by limiting the number of people
> who can be root, and to "formalize" a bit the way things are done.  Yes,
> this /does/ have the downside that some things are going to be a bit slower
> to do; but I want to be able to tell maintainers that "if your tool works
> now, it's not going to break tomorrow" and that means being a bit more
> disciplined and, yes, a bit more restrictive in how we do things.
>
> In the meantime, part of the reason the WMF pays me is to make sure that
> there /is/ someone available to help.  Even when I'm not "on the clock",
> you'll find me easy to reach and responsive; when I'm not near IRC, I'm
> still reachable by email; and once the tools project is well on its way the
> other members of the ops team will also be able to react in case of
> emergencies.
>
>
> -- Marc