[Foundation-l] Wikimedia Projects Growth Animated

Brian Brian.Mingus at colorado.edu
Wed Mar 26 19:59:05 UTC 2008


*Unless*, of course, Amazon was willing to donate them :)

On Wed, Mar 26, 2008 at 1:53 PM, Brian <Brian.Mingus at colorado.edu> wrote:

> I do lots of cluster computing, and offered my cluster to Erik, but his
> job can't be run that way. I also have jobs that are very difficult to run
> on a cluster, such as computing en.WP's PageRank (this can be done with
> the Parallel Boost Graph Library, but are you guys really going to install
> MPI and PBS/Maui?). I imagine there are lots more.
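[As an illustrative aside: the PageRank job mentioned above is just the classic power iteration over a link graph. This is not the Parallel Boost Graph Library / MPI approach being discussed -- it is a minimal single-machine sketch, and the tiny link graph in it is made up. The real en.WP graph is what makes the job memory-bound.]

```python
# Minimal PageRank power iteration (single machine, illustrative only --
# not the Parallel Boost Graph Library / MPI implementation).
# The toy link graph below is hypothetical; en.WP's real link graph is
# far too large to fit comfortably in memory, which is the whole problem.

def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, targets in links.items():
            if targets:
                # Each page shares its rank equally among its outlinks.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank over all pages.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

# Toy graph: A -> B, A -> C, B -> C, C -> A
ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
```

Ranks always sum to 1, and the page with the most incoming rank (here C) ends up on top; the distributed versions partition exactly this iteration across machines, which is where MPI comes in.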
>
> So before borrowing a 16GB machine from noaa.gov, I tried using Amazon's
> elastic compute cloud for this pagerank task. While it may seem cheap, it's
> really not that cheap. I think Erik's statistics would only take a few
> months to add up to the cost of a new machine. Unpacking that 17GB 7-zip
> file alone is going to cost you dearly in S3 storage (and instance time -
> how long does it take?). You'll have to use a Large Instance ($0.40 per
> instance-hour) to run the stats job, which is four times as expensive as
> the Small instance.
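[Editorial aside: the break-even arithmetic implied above can be sketched directly. The $0.40/hour Large-instance rate is from the message; the $4,000 server price is a hypothetical placeholder, and the estimate ignores S3 storage and data-transfer costs, which the message notes would add up.]

```python
# Back-of-the-envelope EC2 vs. dedicated-hardware comparison, using the
# 2008 prices quoted above. The server price is a made-up placeholder.
LARGE_INSTANCE_PER_HOUR = 0.40   # USD per instance-hour (EC2 Large, 2008)
HOURS_PER_MONTH = 24 * 30        # ~720 hours

monthly_ec2_cost = LARGE_INSTANCE_PER_HOUR * HOURS_PER_MONTH
machine_cost = 4000.0            # hypothetical price of a 16GB server

months_to_break_even = machine_cost / monthly_ec2_cost
```

At ~$288/month for a continuously running Large instance (before storage and bandwidth), a dedicated machine pays for itself within roughly a year of sustained use -- sooner once S3 charges for that 17GB dump are included.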
>
>
> On Wed, Mar 26, 2008 at 12:57 PM, Domas Mituzas <midom.lists at gmail.com>
> wrote:
>
> > Hi!
> >
> > this may sound like heresy, but for some jobs that are short in time-
> > span but need lots of CPU capacity, we could try using Amazon's EC2
> > or another grid computing service (maybe some university wants to
> > donate cluster time?).
> > That would be much cheaper than allocating high-performance, high-
> > bucks hardware to projects like this.
> >
> > Really, we have a capable cluster with spare CPU capacity for
> > distributed tasks, but anything that needs lots of memory in a single
> > location simply doesn't scale.
> > Most of our tasks are scaled out, with lots of smaller machines doing
> > the big work together, so this wikistats job is the only one which
> > cannot be distributed this way.
> >
> > Eventually we may run Hadoop, Gearman, or a similar framework for
> > distributing statistics jobs, but first the actual tasks have to be
> > broken down into smaller segments for a map/reduce style of
> > operation, if needed.
> > I don't see many problems (beyond setting the whole grid up) with
> > allocating job-execution resources during off-peak hours on 10, 20,
> > or 100 nodes, as long as a job doesn't have exceptional resource
> > needs on any single node.  It would be very good practice for many
> > other future jobs too.
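[Editorial aside: the map/reduce split described above can be sketched in a few lines. This uses plain Python rather than Hadoop or Gearman, and the log segments are invented for illustration -- the point is only the shape of the decomposition: independent partial counts per segment, then one merge step.]

```python
# Sketch of breaking a stats job into smaller segments: each segment is
# counted independently (map), then the partial counts are merged
# (reduce). Segment contents here are hypothetical sample log lines.
from collections import Counter
from functools import reduce

segments = [
    ["en.wikipedia", "de.wikipedia", "en.wikipedia"],
    ["en.wikipedia", "fr.wikipedia"],
]

def map_segment(lines):
    # Partial per-project hit counts for one segment; each call is
    # independent, so segments could run on different nodes.
    return Counter(lines)

def merge(a, b):
    # Counter addition sums the partial counts key by key.
    return a + b

totals = reduce(merge, map(map_segment, segments))
```

Because `map_segment` touches only its own segment, each call fits in a small node's memory -- which is exactly the property the current monolithic wikistats job lacks.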
> >
> > BR,
> > --
> > Domas Mituzas -- http://dammit.lt/ -- [[user:midom]]
> >
> >
> >
> > _______________________________________________
> > foundation-l mailing list
> > foundation-l at lists.wikimedia.org
> > Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
> >
>
>

