[Toolserver-l] Some comments

Tim Starling tstarling at wikimedia.org
Wed Mar 29 17:06:56 UTC 2006


I found out about this list a few days ago, and I've read back through
some of the archives. I have a few comments.

Why are the list archives private? Why isn't it listed on the
mail.wikipedia.org index page? Why isn't it in gmane? Why have I never
heard of it before? Gregory Maxwell has been whinging that nobody is
listening to him on a list that nobody can read.

Kate can be a bit secretive at times, and this was at least at one time
her pet project, but now that she seems to have abandoned it, maybe
it's time to change the structure.

Neither the e.V. nor Kate made any particular attempt to involve the
other Wikimedia system administrators in this project from its
conception. I was certainly sceptical about zedler's value as a tool
server compared to the use we could have made of it as part of the core
cluster. I've now heard about one project that I'm interested in, and I
have an open mind about the rest, but you still have to make the case.
Specifically: how does your project benefit Wikipedia? Why should I
support it?

Daniel Kinzler wrote:

> Yesterday, Kate told me that the problem with replication from the Asian
> cluster is that mysql can only connect to one replication master. I have
> googled a bit, and it appears that that is not true (at least for MySQL
> 5.1): http://dev.mysql.com/doc/refman/5.1/en/replication-intro.html says:
>
> Multiple-master replication is possible, but raises issues not present
> in single-master replication. See Section 6.15, “Auto-Increment in
> Multiple-Master Replication”.

Multiple-master replication in this context could more aptly be called
circular replication. This is where you have, say, 3 servers: A
replicating B, B replicating C, C replicating A. Then you can write to
any of the three servers, and the writes will be propagated to the other
2 servers. This is quite useless for the toolserver, where we have 5
masters which will never replicate from each other in a circle.

It should be possible to set up 5 MySQL instances and have each of them
replicating from a different master. Is anyone volunteering to set up
those instances? Maybe we need to give root access to someone who
actually cares about this stuff.

It would be easier if we had a VLAN, so that we didn't have to set up 5
ssh tunnels. Does anyone know anything about VLANs? Does anyone care
enough about this project to research it?

Regarding Daniel's WikiProxy: I have reviewed the code, and I have the
following comments:

* Use curl, not file_get_contents().
* With curl you can set a short timeout; with file_get_contents() it
  will be 3 minutes. Set a timeout of a few seconds, and then use
  exponential backoff. Requests get lost sometimes, retries help.
* Tell curl to proxy the request via rr.pmtpa.wikimedia.org:80. This
  will skip the knams squid cluster and save a few milliseconds.
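
To make the timeout and backoff suggestion concrete, here is a minimal
sketch. It is in Python rather than WikiProxy's own code, so it shows the
idea and is not a patch. The function name fetch_with_backoff, the 3 second
timeout, the 4 retries and the 0.5 second starting delay are all
assumptions for illustration; only the proxy host comes from the point
above.

```python
import time
import urllib.error
import urllib.request

# Proxy host taken from the message; everything else here is illustrative.
PROXY = "http://rr.pmtpa.wikimedia.org:80"


def fetch_with_backoff(url, opener=None, timeout=3.0, retries=4,
                       base_delay=0.5, sleep=time.sleep):
    """Fetch url with a short timeout, retrying with doubling delays."""
    if opener is None:
        # Send plain HTTP requests through the proxy instead of directly.
        handler = urllib.request.ProxyHandler({"http": PROXY})
        opener = urllib.request.build_opener(handler)

    delay = base_delay
    for attempt in range(retries + 1):
        try:
            with opener.open(url, timeout=timeout) as response:
                return response.read()
        except (urllib.error.URLError, OSError):
            # Lost request or timeout: give up only after the last retry.
            if attempt == retries:
                raise
            sleep(delay)
            delay *= 2  # exponential backoff: 0.5s, 1s, 2s, 4s, ...
```

With these assumed defaults a request that keeps failing is abandoned
after five attempts and 7.5 seconds of waiting between them, plus
whatever time the attempts themselves take.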

For applications using it: if it's too slow, use a few parallel threads.
Anything up to about 5 requests per second should be OK.
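
For what "a few parallel threads" under a 5 requests per second ceiling
could look like on the client side, here is a rough sketch, again in
Python. RateLimiter and fetch_all are hypothetical names, the worker
count of 4 is an assumption, and fetch stands for whatever function the
application already uses to retrieve one page.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class RateLimiter:
    """Space out calls so that at most `rate` of them start per second."""

    def __init__(self, rate, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / rate
        self.clock = clock
        self.sleep = sleep
        self.lock = threading.Lock()
        self.next_slot = clock()

    def wait(self):
        # Reserve the next free time slot under the lock, sleep outside it.
        with self.lock:
            now = self.clock()
            slot = max(self.next_slot, now)
            self.next_slot = slot + self.interval
        delay = slot - now
        if delay > 0:
            self.sleep(delay)


def fetch_all(urls, fetch, rate=5, workers=4):
    """Fetch urls in parallel threads, starting at most `rate` per second."""
    limiter = RateLimiter(rate)

    def task(url):
        limiter.wait()
        return fetch(url)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as urls.
        return list(pool.map(task, urls))
```

The threads hide per-request latency, while the shared limiter keeps the
total request rate at or below the limit regardless of how many workers
are running.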

Who here needs more than 5 requests per second? Who needs a latency of
less than a few hundred milliseconds? What exactly do you want full text
replication for?

-- Tim Starling



