Mark Bergsma wrote:
Hi Marcin,
Thank you for your answers re OS/upgrades/kernel - 100% agreed.
(Partitioning): we know that it's traditional to separate /usr /var etc,
but we have found that this usually has very little use in practice, and
is more often a nuisance. These days we put everything in one large
enough / and only split off data partitions on servers where it matters.
Of course your databases should be running off a special partition, but
for the rest there is probably no real need. If you think otherwise and
have good arguments, we can surely change it, of course. We do tend to
use LVM for everything non-root in those cases.
Please excuse my 1995-era UNIX thinking :)
The same holds for the RAID setup: on our databases and big storage
systems, most often we just run it off the same big RAID-10 array. It's
more convenient and flexible and if well-configured the rest of the OS
is not hitting that array much at all. If you feel there is a need, we
can of course change it - but we'll need to reinstall the OS. A
different RAID level would be totally fine as well of course - this is
very much dependent on your needs. I picked RAID-10 as neither Aevar nor
Katie knew what was necessary, and RAID-10 tends to be the best choice
for databases and high performance I/O systems.
The issue is not about separating OS away from the rest, it's about
testing how we can split two different usage patterns on the databases
we might have.
The best solution would be to have an extra pair of small drives in
RAID-1 so that we can check whether 2x or 3x RAID-10 really changes
anything in the picture. I am somehow not confident about extN
filesystems doing stuff optimally.
As soon as we confirm that we do not run out of space by removing two
drives from the RAID-10, I would definitely go for a reinstall on a
separate RAID-1 pair (taken out of the current RAID-10 if we have
nothing small available).
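
(A back-of-the-envelope sketch of that space check, assuming equal-sized
drives. The drive count, drive size and space requirement below are
made-up placeholders, not ptolemy's real numbers.)

```python
def raid10_usable_gb(n_drives: int, drive_gb: float) -> float:
    """Usable RAID-10 capacity: half the raw space, as every drive is mirrored."""
    if n_drives < 4 or n_drives % 2:
        raise ValueError("RAID-10 needs an even number of drives, at least 4")
    return n_drives / 2 * drive_gb


def fits_after_split(n_drives: int, drive_gb: float, needed_gb: float) -> bool:
    """True if the data still fits once two drives leave for a RAID-1 pair."""
    return raid10_usable_gb(n_drives - 2, drive_gb) >= needed_gb


# Hypothetical example: 8 x 500 GB drives, 1200 GB needed for the database.
before = raid10_usable_gb(8, 500)
after = raid10_usable_gb(6, 500)
print(before, after, fits_after_split(8, 500, 1200))
```

In other words, pulling two drives out costs the array one drive's worth
of usable space, so the open question is only how much PostgreSQL needs.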
Serial console/LOM access cannot easily be handed out, but should also
not be necessary usually. In the unlikely event that the system becomes
unmanageable in-band, just contact us directly (ask on #wikimedia-tech
for example) and we'll restore it quickly.
If you handle the whole OS/hardware part - fine with me. One trouble
less. :)
(re multicast from the other email)
However, switches/routers handle multicast traffic specially, have
group/port membership limits for them and we've also found several bugs.
So before you start using it heavily, I'd like to know what for. :) With
only 2 servers communicating, would unicast not be a better idea?
Spread (the tool I am thinking of) requires basically either broadcast
or multicast. The choice is yours :) Should (a) this model prove
workable and (b) we quickly find out that we need to start growing a
farm of rendering servers (hopefully not) - you might very well decide
that WMF needs to carry mcast traffic, for example, across the Atlantic.
For now, we are just a little family of a few boxes in The Netherlands.
This is not something to even *think* about now - I would like to see
how it works with our 2 or 3 servers (yes, including Cassini *for now* -
see below), so multicast would certainly be an advantage.
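
(To make concrete what "one multicast group" means for the switches,
here is a minimal socket-level sketch. It is not Spread's API; Spread is
configured separately. The group address and port are hypothetical
placeholders; 239.0.0.0/8 is the administratively scoped range.)

```python
import socket

GROUP = "239.1.2.3"  # hypothetical group address (assumption)
PORT = 5007          # hypothetical UDP port (assumption)


def membership_request(group: str, iface: str = "0.0.0.0") -> bytes:
    """Build the ip_mreq structure: group address, then local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(iface)


def open_receiver(group: str = GROUP, port: int = PORT) -> socket.socket:
    """Bind a UDP socket and join the group; this triggers the IGMP membership report."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock


def send(message: bytes, group: str = GROUP, port: int = PORT, ttl: int = 1) -> None:
    """Send one datagram to the group; a TTL of 1 keeps it on the local subnet."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    sock.sendto(message, (group, port))
    sock.close()
```

With 2 or 3 boxes on one subnet that is a single group with a handful of
members, which should stay well within any group/port membership limits.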
I will probably get back to you re virtual IP addresses anyway once my
ideas mature and will be ready to be put into action.
(re-arranged order below)
I really want to stress that these systems need to be *separate*, they
cannot be used together at all. Ideally there is no traffic between
those servers at all, except in the form of cassini generating visitor
traffic like the rest of the Internet. Cassini is meant for playing
around where lots of people have access, the other two are (in the end)
really meant for production use with limited access.
We are now in the middle of the internal discussion about the future
role of Cassini. It has been raised (and I share this view) that we
might not really need another toolserver box (we have now one
underutilized Sun and one Linux anyway) and remote access to the
databases and rendering infrastructure from existing toolservers might
be enough.
As I prefer to build this architecture bottom-to-the-top (i.e. ptolemy
first, rendering later, user access at the end), we still need to find
out what the exact role of Cassini will be.
Stable operation is simply not possible when arbitrary users can do
arbitrary things on a system, and that's why we intended these systems
to be very isolated from the start.
One of my ideas (this is only mine and other project members might
certainly disagree) would be to have Cassini as the box that runs
newer/experimental versions of production stuff from ortelius/ptolemy.
This can still benefit toolserver users (so that they have the
infrastructure to test their stylesheets, for example), but it will
definitely be more under control than "playing around a lot".
It can be very useful to share some functions with ortelius *before we
go into production* just to test the feasibility of the distributed
rendering engine I am envisioning. This might mean that cassini will be
much more closely coupled with ptolemy/ortelius than with users and
their stuff. *I* would rather have another box coupled with the two
*now* to test our load distribution concepts than another toolserver.
Daniel, feel free to bash me for that :)
So, from the WMF perspective, I would rather promote Cassini to be
treated as an almost-production box for now (as ptolemy is), under the
same administration processes we have for WMF, *until* the rendering
infrastructure is ironed out enough to go live. After this it can be
a perfect staging box to test updates to the WMF production environment
- with a software setup that could be promoted to the production boxes
once tested.
Cassini is also managed by WMDE / Toolserver, ptolemy and ortelius are
Wikimedia Foundation managed. So I'm afraid that we really cannot use
those servers in one resource pool...
Having said that, nothing will change with Cassini without prior
written consent from Wikimedia Deutschland. That's why we try to work
together to have a final architecture ironed out.
If those separate clusters do not have enough resources/space to do what
we need, I think we should look into buying more hardware. That is
really not impossible. :)
Before we do that, I'd like to check how we can max out what we
have. And I'd like to know, for example: do I need more smaller machines
or just one big one? And what exactly are our storage requirements
(thinking about i18n-zed tiles, for example)? I think we should be
prepared for a higher demand than OSM currently has - that's where my
concerns come from. I'd like to avoid unnecessary duplication of
infrastructure where we could just have more power. Maps are in many
ways different from the casual PHP/MediaWiki bot stuff run on
Toolserver - we have much more power to control the environment (like
putting users' rendering requests at the lowest priority).
To sum up:
(1) we will be working on the architecture with the goal of making
cassini work as optimally as possible for the project
(2) as soon as we find out how much PostgreSQL space we need, I would
ask you to reinstall ptolemy for us
(3) at least one multicast group would be fine for now
--
<< Marcin Cieslak // saper(a)saper.info >>