How we will use the Celeron systems in Europe hasn't been decided yet - discussion is still
going on regularly in the usual place for technical discussion, the #mediawiki IRC channel.
Anyone who doesn't follow it routinely is missing a large share of the productive
technical discussion.
The options:
1. Split by language and project. We've done this before; it's easy and well
understood. It becomes limiting once we have many different proxy locations, because
it's hard to load balance and because people in one country don't work in only one
language. It's very likely that we'll do this at first, even if we change later.
2. Identify the country from the IP address. There are several ways to do this: at least
three anti-spam DNS servers do it, and there are public lists of IP ranges by country. It's
a well understood solution, but not one we have experience of (unless you count me working
on an open source spam blocking project which supports it). The Freenode IRC network, which
hosts our IRC channels, uses this approach, and one of their technical people, who likes us
and was involved in developing their solution, offered assistance a few weeks ago. The
information isn't perfectly accurate, but it's fairly good - good enough.
3. Use internet topology analysis to determine the optimal server to send traffic to,
based on the IP address. There is at least one solution which does this, and it's easy
enough - the routers on the internet already track this information in their (BGP) routing
tables. The big advantage is that it works well for any number of remote sites: when we
have 50 remote sites, people will be directed to the one closest to them (and load
balancing can then offload it if necessary). See http://www.supersparrow.org/ for one
solution of this type.
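Options 2 and 3 both come down to looking up a client's IP address in a table; they differ
in where the table comes from. The Python sketch below illustrates each under stated
assumptions: `country_for` does a sorted-range lookup over a country list like the public
ones mentioned in option 2, and `nearest_site` picks a site by comparing, at each site, the
AS path length of the longest matching route to the client. That is one plausible reading
of the topology approach in option 3, not necessarily how Super Sparrow implements it. All
addresses are RFC 5737 documentation ranges, and the tables, site names and path lengths
are invented for illustration.

```python
import bisect
import ipaddress

# --- Option 2: country from a list of IP ranges ---
# Hypothetical country table; the addresses are RFC 5737 documentation
# ranges, not real allocations. Real lists have many thousands of rows.
COUNTRY_RANGES = [
    ("203.0.113.0", "203.0.113.255", "US"),
    ("192.0.2.0", "192.0.2.255", "FR"),
    ("198.51.100.0", "198.51.100.255", "NL"),
]

# Convert to integers and sort by range start so we can binary-search.
_TABLE = sorted(
    (int(ipaddress.ip_address(first)), int(ipaddress.ip_address(last)), cc)
    for first, last, cc in COUNTRY_RANGES
)
_STARTS = [start for start, _, _ in _TABLE]


def country_for(ip):
    """Return the country code whose range contains ip, or None."""
    n = int(ipaddress.ip_address(ip))
    # Index of the last range starting at or before n.
    i = bisect.bisect_right(_STARTS, n) - 1
    if i >= 0 and n <= _TABLE[i][1]:
        return _TABLE[i][2]
    return None


# --- Option 3: nearest site from routing information ---
# Hypothetical view of the routing table at each site: (prefix, AS path
# length from that site to the prefix). A real deployment would read this
# from BGP rather than hard-code it.
SITE_ROUTES = {
    "europe": [("192.0.2.0/24", 2), ("198.51.100.0/24", 3), ("0.0.0.0/0", 6)],
    "us": [("203.0.113.0/24", 1), ("0.0.0.0/0", 4)],
}


def path_length(routes, addr):
    """AS path length of the longest-prefix route covering addr, or None."""
    best = None  # (prefix length, AS path length)
    for prefix, length in routes:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, length)
    return None if best is None else best[1]


def nearest_site(ip):
    """Pick the site with the shortest AS path to ip (ties broken by name)."""
    addr = ipaddress.ip_address(ip)
    candidates = []
    for site, routes in SITE_ROUTES.items():
        length = path_length(routes, addr)
        if length is not None:
            candidates.append((length, site))
    return min(candidates)[1] if candidates else None
```

With these made-up tables, `country_for("192.0.2.10")` gives `"FR"`, and
`nearest_site("10.1.2.3")` gives `"us"` because only the default routes match and the US
site's path is shorter. Adding another site under option 3 means adding one more routing
view, with no per-country decisions to maintain.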
We may well start with 1 and move to 3 later. 2 is probably worth skipping if we can -
it's not as good as 3 and doesn't scale as well to a large number of places. Initial
tests will almost certainly use 1 to prove the concept of offloading traffic from the
US-based Squids and to sort out any problems.
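A split by language and project, as in option 1, amounts to a fixed table from hostname to
cluster. In practice this would most likely live in DNS, with each hostname resolving to a
different set of Squids; the Python sketch below only models that table. The cluster names
and the language/project pairs assigned to Europe are hypothetical.

```python
# Hypothetical table: which (language, project) pairs are served from the
# European Squids. Everything else stays on the US-based Squids.
EUROPE_ASSIGNMENTS = {
    ("fr", "wikipedia"),
    ("nl", "wikipedia"),
    ("de", "wiktionary"),
}


def cluster_for_host(host):
    """Map a hostname such as 'fr.wikipedia.org' to a Squid cluster name."""
    labels = host.lower().rstrip(".").split(".")
    if len(labels) < 3:
        # Not of the form language.project.org; leave it in the US.
        return "us"
    language, project = labels[0], labels[1]
    return "europe" if (language, project) in EUROPE_ASSIGNMENTS else "us"
```

The sketch also shows the limitation described under option 1: a reader in France browsing
`en.wikipedia.org` is sent to the US cluster, because the decision depends on the site
requested, not on where the reader is.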
But none of this is decided yet. We probably won't know for a month or more.