On 12/4/06, Ivan Krstić <krstic(a)solarsail.hcs.harvard.edu> wrote:
George Herbert wrote:
You do realize that Google has spent the better part of a half billion
dollars on engineering a completely ground-up distributed system software
architecture, working with a problem that (unusually among large-scale
enterprise data management) can theoretically be efficiently partitioned?
I mentioned Google because they're a well-known example, but it
certainly isn't the case that one needs to invest an inordinate sum to
be able to reap benefits from scaling out instead of up. Many other
sites with nowhere near the engineering talent or financial budget of Google
are doing the same thing. In fact, sites with small budgets that choose
to scale up and succeed are few and far between, to my knowledge.
That depends on how you define "scale up"; when database limits start to be
the problem, quite a large number of sites scale up to centralized large SMP
systems running Oracle or something of that ilk quite successfully. I have
done website systems and network architecture work for some very large
websites (WebEx and Blockbuster.com among others) and sold hardware to
people building others.
For the most part, web applications have an app and web layer that
parallelises very effectively, but the database on many of them doesn't
scale horizontally as well. It's not unusual around here to see sites buy a
clustered pair of big Sun boxes (or more rarely, IBM or HP) and switch to
Oracle as they grow past what MySQL and Linux servers can handle, if they're
DB-limited.
All of that said, I don't know the actual load numbers on the Wikimedia
Foundation servers, or the details of the architecture, well enough right
now to give specific advice.
There are large websites where the actual sustained DB load is low enough
that a farm of Linux/MySQL servers is an adequate, reliable solution. And
despite having worked at a Sun/Oracle VAR, I have also deployed several
thousand Linux boxes in horizontally scaled website farms.
If you prefer a non-Google example of out over up, look at LiveJournal,
as the evolution of their software and hardware is well-documented and
more transparent than the operation of most comparable sites.
Or for a counterexample, Friendster. I know the poor guy who was doing site
architecture there for a while, screaming at his bosses that they needed to
get off MySQL and get a Sun/Oracle box in, and doing unholy things to MySQL
to try and keep it going, until he just walked away. Their site performance
implosion is near-legendary...
--
-george william herbert
george.herbert(a)gmail.com