[Labs-l] Accessing the databases from labs - A comparison with the toolserver

Platonides platonides at gmail.com
Fri Jul 12 19:26:55 UTC 2013


On 12/07/13 20:24, Marc A. Pelletier wrote:
> On 07/12/2013 01:59 PM, Platonides wrote:
>> These connections are cached, so if I connected to fiwiki
>> and then to eowiki, the same db object would be returned.
>
> I don't think that any putative gain of performance or resources this
> would give is worth the added complexity;

In a simple benchmark, reconnecting for each db was about 4 times
slower, but that may not be representative.

php script.php $(grep 192.168.99.3 /etc/hosts | cut -c 14-)

<?php
array_shift($argv);
$m = null;
foreach ($argv as $arg) {
	if (!$m) # <-- comment out this line to reconnect for every database
		$m = mysql_connect($arg, 'username', 'password');
	if (!$m) die(1);
	mysql_select_db($arg, $m);
}
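A variant of the same benchmark that times both modes in a single run might look like the sketch below. It assumes PHP 7.3+ (for hrtime()). The function name time_switches() and the $connect/$select callbacks are hypothetical, so the same harness can wrap mysqli or the old mysql_* calls. Like the script above, it uses each argument both as the host to connect to and as the database to select.

```php
<?php
// Hypothetical harness: time switching across databases, either reusing one
// connection (the cached case) or opening a new connection for each one.
function time_switches(array $names, callable $connect, callable $select, bool $reuse): array
{
    $start = hrtime(true);            // monotonic clock, in nanoseconds
    $conn = null;
    $opened = 0;
    foreach ($names as $name) {
        if (!$reuse || $conn === null) {
            $conn = $connect($name);  // open a fresh connection
            $opened++;
        }
        $select($conn, $name);        // switch to the requested database
    }
    return [
        'seconds' => (hrtime(true) - $start) / 1e9,
        'opened'  => $opened,
    ];
}
```

With mysqli, $connect could be `fn(string $host) => new mysqli($host, 'username', 'password')` and $select could be `fn(mysqli $c, string $db) => $c->select_db($db)`. The credentials are placeholders, as in the original script.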


> but if you insisted on doing
> that caching, you should do it by the actual cluster IP, not the name
> you used since only the former is guaranteed to be valid in all cases.
>
> (Well, strictly speaking, only the [host,port] tuple is, but all the
> ports will always remain the same since we do portmapping)

Except that we could have several IPs per cluster (as TS does).
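Keying the cache as suggested, by resolved address and port rather than by the name used, could look roughly like this. It is a sketch, not existing labs code. The class name and the injected $connect and $resolve callbacks are hypothetical, and it needs PHP 8.0+ (constructor promotion, mixed). With several IPs per cluster, two names only share a connection when they happen to resolve to the same IP.

```php
<?php
// Hypothetical connection cache keyed by "ip:port" instead of the alias used.
final class ClusterConnectionCache
{
    /** @var array<string, mixed> open connections indexed by "ip:port" */
    private array $pool = [];

    /**
     * @param callable $connect fn(string $ip, int $port): connection (injected)
     * @param callable $resolve fn(string $alias): string IP; gethostbyname()
     *                          returns the alias unchanged if lookup fails
     */
    public function __construct(
        private $connect,
        private $resolve = 'gethostbyname'
    ) {}

    public function get(string $alias, int $port = 3306): mixed
    {
        $key = ($this->resolve)($alias) . ':' . $port;
        if (!array_key_exists($key, $this->pool)) {
            [$ip] = explode(':', $key);
            $conn = ($this->connect)($ip, $port);
            if ($conn === false || $conn === null) {
                // Do not cache failed connection attempts.
                throw new RuntimeException("cannot connect to $key");
            }
            $this->pool[$key] = $conn;
        }
        return $this->pool[$key];
    }

    public function size(): int
    {
        return count($this->pool);
    }
}
```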


>>> In this particular case, it's not avoidable (for user databases).
>> Why?
>
> We use a double underscore as the guaranteed cannot-occur-in-a-username
> mark.  But beyond that, the usernames are different because we don't
> handle credentials the same way.  Since the allowable databases are
> derived from the username, then the database names are necessarily
> different (picking between "u_foo" and "u_p12345g12345" is no harder
> than having to pick between "u_foo" and "p12345g12345__foo" --
> hardcoding the database name breaks either way).

Except when you have to remember it while typing the name manually. But
the key question is: why 12345 and not 'foo'?
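Rather than hardcoding either name, a tool can build its user database name from the login it runs with. A minimal sketch using only the two examples quoted above ("u_foo" on the Toolserver, "<credential user>__foo" on labs). The function names are hypothetical, it needs PHP 8.0+ for match, and the credential user would have to come from the tool's own configuration.

```php
<?php
// Hypothetical helper: build a user database name from the login in use,
// following the examples quoted in this thread.
function user_database(string $platform, string $user, string $name): string
{
    return match ($platform) {
        // Toolserver example: "u_" followed by the username ($name unused)
        'toolserver' => 'u_' . $user,
        // Labs: "__" cannot occur in a username, so it separates owner and name
        'labs' => $user . '__' . $name,
        default => throw new InvalidArgumentException("unknown platform: $platform"),
    };
}

// Recover the owning credential user from a labs-style user database name.
function database_owner(string $db): ?string
{
    $pos = strpos($db, '__');
    return $pos === false ? null : substr($db, 0, $pos);
}
```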


>> There is also a grid engine on TS.
>
> Yes, but while its use has recently been made mandatory, I think,
> most tools do not in fact use it and need to be adapted.

It has been available (and highly encouraged) for a long time, but I
can't comment on how much SGE is used on TS compared to plain cron.


>> It's not possible to specify a
>> cluster as requisite in labs, BTW.
>
> That would be because, by design, any execution node has access to every
> cluster.  Having this be a requested resource would be akin to being
> able to request that your job be provided with a CPU to execute it.  :-)
>
> -- Marc

It's very nice to have an environment where the database servers are
always available, jobs are not affected by MariaDB maintenance, and dbs
are so ubiquitous that not even SGE needs to know which one is used :)


