On 12/5/06, Ivan Krstić <krstic(a)solarsail.hcs.harvard.edu> wrote:
George Herbert wrote:
For the most part, web applications have a highly effectively
parallelisable app and web layer, but the database on many of them
doesn't scale horizontally as well.
Wikipedia can partition the database across languages (as it does
already with the largest ones), and when individual languages grow to be
too large for a single server to deal with, there are other partitioning
schemes to look at. So it's a bit simpler here, as it's not one
monolithic data store that's growing without bound.
Or for a counterexample, Friendster. I know the poor guy who was doing
site architecture there for a while, screaming at his bosses that they
needed to get off MySQL and get a Sun/Oracle box in, and doing unholy
things to MySQL to try and keep it going, until he just walked away.
And yet an even larger social networking site continues to happily churn
along with MySQL. Clearly there are examples each way, but in the case
of WMF, there are also principles that factor into the equation.
It's not one monolithic data store, but in the current model, the
en.wikipedia database is a useful test case. It's a large chunk (I don't
know, guessing a half? a third?) of the total data in play.
If we start partitioning the per-wiki database, then quite a large number of
potential technologies come into play. The $640,000 question is whether the
developer effort to partition the database effectively and efficiently will
be more expensive than a single large central server.
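To make the partitioning side of that tradeoff concrete, here is a minimal sketch of routing each wiki to a database server. It assumes the largest wikis get a dedicated server and the rest are hashed onto a shared pool. The server names, the wiki identifiers, and the function server_for are all hypothetical and are not taken from the actual WMF setup.

```python
import zlib

# Hypothetical: wikis large enough to get a dedicated server.
DEDICATED = {"enwiki": "db-en", "dewiki": "db-de"}

# Hypothetical pool of shared servers for all remaining wikis.
SHARED_POOL = ["db-s1", "db-s2", "db-s3"]


def server_for(wiki: str) -> str:
    """Return the name of the database server that holds the given wiki."""
    if wiki in DEDICATED:
        return DEDICATED[wiki]
    # crc32 is stable across processes, unlike Python's built-in hash() for str,
    # so every app server maps a wiki to the same shared server.
    index = zlib.crc32(wiki.encode("utf-8")) % len(SHARED_POOL)
    return SHARED_POOL[index]
```

Routing whole wikis like this is the cheap part. The developer effort in question starts once a single wiki such as en.wikipedia outgrows its own server and has to be split internally.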
I have no problem with people whose principles are to use open source. I am
all for open source. I also know, from experience, that there are limits to
the scalability of many workloads beyond which large SMP systems are better
database server choices.
If the WMF workload is one of those types of workload, then the principle to
prefer open source should not be a suicide pact.
If the workload isn't that type, is trivially partitionable, or is
partitionable more affordably than the cost of SMP servers, then it should
be managed that way anyways.
The devil is in the details.
Separate question:
Has anyone developed a MediaWiki test suite, a standard set of web
operations which can be run as a load generator for benchmarking purposes?
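As a rough illustration of what such a load generator might look like, here is a minimal sketch that builds a reproducible, weighted sequence of operations and times each one through a pluggable callable. The operation names, the weights in OPERATION_MIX, and the send callable are assumptions for illustration, not measured Wikipedia traffic or an existing MediaWiki tool.

```python
import random
import time
from typing import Callable, Dict, List

# Hypothetical operation mix (relative weights), not measured traffic.
OPERATION_MIX: Dict[str, int] = {"view": 90, "search": 7, "edit": 3}


def build_workload(n: int, seed: int = 0) -> List[str]:
    """Return a reproducible sequence of n operation names drawn by weight."""
    rng = random.Random(seed)  # seeded so every benchmark run replays the same mix
    ops = list(OPERATION_MIX)
    weights = [OPERATION_MIX[op] for op in ops]
    return rng.choices(ops, weights=weights, k=n)


def run_workload(workload: List[str], send: Callable[[str], None]) -> Dict[str, float]:
    """Call send(op) for each operation and return total seconds spent per operation."""
    totals = {op: 0.0 for op in OPERATION_MIX}
    for op in workload:
        start = time.perf_counter()
        send(op)  # caller supplies the code that actually issues the web request
        totals[op] += time.perf_counter() - start
    return totals
```

In a real benchmark, send would issue the matching HTTP request against a test wiki. Keeping it pluggable lets the same workload be replayed against different database configurations.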
--
-george william herbert
george.herbert(a)gmail.com