On 4/9/06, Chris Wilson <chris(a)qwirx.com> wrote:
> Hi all,
> I guess you probably know that Wikipedia was down earlier, due to a
> power fault at the colo centre (apparently). I was chatting to brion on
> IRC and he recommended that I contact this list.
> I think it should be possible to make Wikipedia fully redundant to
> outages of individual data centres, and not too expensive. Here's how.
> Get a BGP portable IP address range.
Yea, one of dem portable /25s.
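The quip turns on prefix length: many networks have long filtered IPv4 announcements more specific than a /24, so a small portable block may not propagate across the public internet at all. A minimal sketch of such a filter, assuming a /24 cutoff (real policies vary from network to network) and using RFC 5737 documentation prefixes as placeholders:

```python
import ipaddress

# Assumed policy: drop IPv4 announcements more specific than /24.
# This mirrors a common convention, not any particular network's filter.
MAX_IPV4_PREFIXLEN = 24

def accepted_by_filter(prefix: str, max_len: int = MAX_IPV4_PREFIXLEN) -> bool:
    """Return True if an announced prefix would pass a 'no longer than /max_len' filter."""
    net = ipaddress.ip_network(prefix, strict=True)
    return net.prefixlen <= max_len

# RFC 5737 documentation ranges, used purely as placeholders.
for p in ["198.51.100.0/24", "198.51.100.0/25", "203.0.113.128/25"]:
    print(p, "accepted" if accepted_by_filter(p) else "filtered")
```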
> Advertise this range from TWO
> locations, at separate data centres. Have basically identical read-only
> servers on each range, with the same IP addresses. Don't worry about IP
> conflicts, as the servers are identical, and the shortest route from any
> given client will point to just one data centre, and not move unless
> that data centre goes down, when it will automatically fall back to the
> other.
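A toy model of the proposed anycast setup, assuming each client network simply prefers the site with the lowest path cost. The site names, client names and costs are hypothetical, and real BGP best-path selection weighs several more attributes than a single number:

```python
# Hypothetical path costs (think AS-path length) from three client networks
# to two sites announcing the same anycast prefix. All values are made up.
PATH_COST = {
    "client-eu": {"site-a": 2, "site-b": 4},
    "client-us": {"site-a": 5, "site-b": 3},
    "client-asia": {"site-a": 4, "site-b": 4},
}

def chosen_site(client, up_sites):
    """Return the reachable site with the lowest path cost, or None if none is up.
    Ties are broken by site name so the result is deterministic."""
    candidates = [(cost, site) for site, cost in PATH_COST[client].items()
                  if site in up_sites]
    return min(candidates)[1] if candidates else None

# Both sites announcing: load is split according to path cost.
print({c: chosen_site(c, {"site-a", "site-b"}) for c in PATH_COST})
# site-a withdraws its announcement: every client falls back to site-b.
print({c: chosen_site(c, {"site-b"}) for c in PATH_COST})
```

The model captures the intended failover, but it treats path costs as fixed; the objection that follows is precisely that they are not.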
Er narf. No.
Internet routing is not that stable.
If you anycast TCP on the public internet you *will* end up with
oddball behavior as routing topology changes where users get hung
connections because the route changed out from under them. Getting
such a thing working correctly is quite a bit more complex than you
seem to think it is.
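Why a route change hurts TCP in particular: connection state lives only on the server that completed the handshake, and a TCP stack answers a segment for a connection it has no record of with a reset. A toy sketch, with hypothetical site names and documentation addresses, that ignores sequence numbers and everything else a real stack tracks:

```python
from dataclasses import dataclass, field

@dataclass
class Site:
    """One anycast site; TCP connection state is local to the site that accepted it."""
    name: str
    connections: set = field(default_factory=set)

    def receive(self, four_tuple, syn=False):
        if syn:
            # Handshake: this site now owns the connection state.
            self.connections.add(four_tuple)
            return "SYN-ACK"
        if four_tuple in self.connections:
            return "ACK"
        # No state for this flow here, so the stack replies with a reset.
        return "RST"

site_a, site_b = Site("site-a"), Site("site-b")
# (client ip, client port, anycast ip, port); addresses are RFC 5737 placeholders.
flow = ("192.0.2.10", 51515, "198.51.100.1", 80)

route = site_a                          # routing currently prefers site-a
print(route.receive(flow, syn=True))    # handshake lands on site-a
print(route.receive(flow))              # mid-transfer segment, same path
route = site_b                          # topology change: same prefix now reaches site-b
print(route.receive(flow))              # site-b has never seen this flow
```

In this model the failure shows up as a clean reset; in practice the user may instead see the hung connection described above, for example if the stray segments or the reset never make it back.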
> Under normal conditions, your load is shared between both data centres,
> so you don't need to actually increase the number of servers. If one
> goes down, all requests go to the other, so performance might drop, but
> Wikipedia should stay up.
> This only works for read-only servers, so the process of editing
> Wikipedia would still rely on one of the groups (or some subset of
> servers in that group) being masters, and all the other servers being
> slaves that sync off those masters.
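How far "performance might drop" can go is simple arithmetic: with load split evenly across two sites, losing one doubles the utilization of the survivor, so it only copes if each site normally runs at or below half its capacity. A sketch with hypothetical load and capacity figures:

```python
def utilization(total_load, per_site_capacity, sites_up):
    """Fraction of one site's capacity in use when load is spread evenly over sites_up sites."""
    if sites_up == 0:
        raise ValueError("no sites up")
    return total_load / sites_up / per_site_capacity

# Hypothetical numbers: 6,000 req/s of total load, each site able to serve 5,000 req/s.
normal = utilization(6000, 5000, sites_up=2)
failover = utilization(6000, 5000, sites_up=1)
print(f"normal: {normal:.0%}, after one site fails: {failover:.0%}")
```

Here each site idles along at 60% normally but would need 120% of its capacity after a failure, which is the gap between "slower" and "still down".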
Last I checked, we still had issues getting MySQL replication working
well across non-local networks.
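A sketch of the master/slave split being discussed: writes (edits) go to the single master, reads go to the local replica. The hostnames and the lag window are hypothetical, and sending a user's reads to the master shortly after they edit is one common way to hide replication lag, not a description of Wikipedia's actual setup:

```python
import time

# Hypothetical hostnames: one writable master, one read-only replica per site.
MASTER = "db-master.site-a.example"
REPLICAS = {"site-a": "db-replica.site-a.example",
            "site-b": "db-replica.site-b.example"}
# Assumed upper bound on replication lag; a slow cross-site link makes this larger.
LAG_WINDOW_S = 30.0

def pick_db(site, is_write, last_write_at=None, now=None):
    """Send writes to the master; send reads to the local replica unless the
    user wrote recently enough that the replica might not have caught up."""
    if is_write:
        return MASTER
    now = time.time() if now is None else now
    if last_write_at is not None and now - last_write_at < LAG_WINDOW_S:
        return MASTER
    return REPLICAS[site]

print(pick_db("site-b", is_write=True))                                    # edit
print(pick_db("site-b", is_write=False))                                   # ordinary read
print(pick_db("site-b", is_write=False, last_write_at=100.0, now=110.0))   # read just after an edit
```

The weak point is the window itself: the worse replication behaves across non-local links, the more reads fall back to the master at the other site.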
> It's just a suggestion, I'd be interested to
> hear what you think.
> If you are interested, I know a hosting company that has a BGP-portable
> range (I used to work for them), and I could talk to them about whether
> they can set up redundant IP tunnelling for that range to whatever IP
> addresses (VPN endpoints) you want, so you wouldn't even need to have
> your own BGP range.
For what you propose, the portable block would have to carry all the
normal Wikipedia traffic. I somehow suspect that they would rather
not be tunneling several hundred Mbit/sec of traffic. :)
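For a sense of scale, a back-of-the-envelope conversion of a sustained rate into monthly volume. It assumes 300 Mbit/s as a stand-in for "several hundred" and a flat around-the-clock rate (real traffic peaks and dips), and it ignores tunnel header overhead:

```python
def monthly_transfer_tb(mbit_per_s, days=30):
    """Decimal terabytes moved by a link running at a constant rate for `days` days."""
    bytes_per_s = mbit_per_s * 1_000_000 / 8
    return bytes_per_s * days * 86_400 / 1e12

# 300 Mbit/s is an assumed figure, not a measured one.
print(f"{monthly_transfer_tb(300):.1f} TB per 30 days")
```

That works out to roughly 97 TB a month passing through someone else's network before it ever reaches Wikipedia's servers.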