On 12/4/06, Alphax (Wikipedia email) <alphasigmax(a)gmail.com> wrote:
> Rob Church wrote:
>> On 04/12/06, Jay R. Ashworth <jra(a)baylink.com> wrote:
>>> Yes, Ivan; that's why Jeff suggested provisioning more powerful
>>> platforms. :-) That he mentioned throughput is really a red herring
>>> here...
>>
>> Let's filch a couple supercomputers.
>
> Who needs supercomputers when you can use distributed computing?
> Wikipedia@Home anyone?
<brief response>
"No"
</brief response>
<computer architect hat>
The problem with distributed computing for general computational problems
requiring interactive updates between computational nodes is that the amount
of traffic between nodes increases both with the rate of updates and with
the number of nodes.
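
To make that scaling concrete, here is a rough back-of-envelope sketch
(Python, with invented figures for update rate and update size) of the
traffic a fully replicated database generates when every node has to see
every update:

    # Back-of-envelope sketch, invented numbers only: traffic for a fully
    # replicated database where every node must see every update.  Each
    # update of update_size_bytes has to reach the other (n - 1) nodes,
    # so aggregate traffic grows roughly with n * (n - 1).

    def replication_traffic(nodes, updates_per_sec, update_size_bytes):
        """Return (per-node outbound bytes/sec, aggregate bytes/sec)."""
        per_node_out = updates_per_sec * update_size_bytes * (nodes - 1)
        return per_node_out, per_node_out * nodes

    for n in (10, 100, 1000, 10000):
        per_node, total = replication_traffic(n, updates_per_sec=50,
                                              update_size_bytes=2000)
        print("%6d nodes: %8.1f MB/s out per node, %10.1f GB/s aggregate"
              % (n, per_node / 1e6, total / 1e9))

With those toy numbers the aggregate traffic grows roughly with the square
of the node count, which is the heart of the problem.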
The various @home type projects, and batch job type grid computing, are
efficient network-wise; there is little to no network interaction other than
job submission and results. Where data has to flow laterally, as in a
database system, the traffic scales up as described above, rapidly bringing
the distributed application to its knees. This is a known problem in the
design of MPI type supercomputer clusters. As the required interconnect
communications bandwidth increases, the optimal solution moves from highly
distributed, towards larger nodes in a higher speed network, towards a few
nodes in a very very high speed network, and finally towards a single large
SMP computer. If you are pushing the interconnect very hard, you're having
to pay for extremely high performance interconnects, and ultimately there's
no reason not to just buy large SMP systems with single system image and
more compact footprints. The highest speed system-to-system interconnects
cost as much as the SMP system components do.
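
As a purely illustrative sketch (the bandwidth tiers and prices below are
invented, not quotes from any vendor), here is roughly how the per-node
interconnect cost climbs with the bandwidth each node has to push, until it
rivals the cost of the compute it connects:

    # Toy illustration; the bandwidth tiers and prices are invented.
    # As the per-node interconnect bandwidth requirement climbs, the network
    # hardware needed per node climbs through tiers until it costs roughly
    # as much as the compute it connects.

    TIERS = [                  # (max MB/s per node, interconnect $ per node)
        (100,    200),         # commodity gigabit Ethernet
        (1000,   2500),        # 10 GbE class
        (4000,   8000),        # high-end cluster interconnect
    ]
    NODE_COST = 8000           # assumed price of one compute node

    def interconnect_cost(per_node_mb_s):
        """Cheapest tier that carries the load, or None if nothing does."""
        for max_bw, cost in TIERS:
            if per_node_mb_s <= max_bw:
                return cost
        return None            # off the top of the chart: SMP territory

    for bw in (50, 500, 3000, 8000):
        cost = interconnect_cost(bw)
        if cost is None:
            print("%6d MB/s per node: beyond clustered interconnects, "
                  "shared-memory (SMP) territory" % bw)
        else:
            print("%6d MB/s per node: interconnect $%d per node (%.0f%% "
                  "of node cost)" % (bw, cost, 100.0 * cost / NODE_COST))

Once the interconnect hardware costs as much as the nodes themselves, the
economics that favoured the distributed cluster are gone.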
One current commercial example of this particular problem is a throttled
Oracle RAC cluster, where you spread a big database out over a large pile of
Linux nodes, but the database update rates overwhelm the private network
interconnect carrying the shared cache updates. There are plenty of HPC
horror stories of equivalent problems out there in the field for other
workloads.
There is no single right answer; even the large labs that have large
numbers of professional computer architects and code wranglers working on
optimized highly multi-processor projects don't agree on how to do most
work, which is why there are still a bunch of competing
supercomputer/supercluster vendors out there, with individual centers or
labs in many cases buying from multiple vendors because they have different
types of work in progress.
For Wikipedia's database problems, I would have to have about 10x more info
than I currently do to offer particularly useful advice or suggestions,
but I can guarantee you that WikiDB@home would be a disaster of heroic
proportions 8-)
</computer architect hat>
--
-george william herbert
george.herbert(a)gmail.com