Re: [Wikitech-l] New system ideas

2 Jan 2004

Tim Thorpe wrote:
...
  What about sticking the entire cluster on private IP's and having a load
  balancing firewall appliance handle traffic flow which would randomly hit
  the update box?

There would be several points of failure with such a system.

To make a system really reliable, all single points of failure need to 
be removed.

E.g., building-specific hazards: power loss (UPS units sometimes fail), 
severed network cables, fire, burglary, repossession by the landlord, 
bankruptcy of the hosting company, malicious attack, human error, a plane 
crash, etc.

Machine-specific hazards: the failure of any single machine, through 
hardware fault or malicious attack, bringing the whole system down.

Network hazards: the failure of any single network segment, through 
hardware fault, human error or malicious attack, bringing the whole 
system down.
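One way to make "single point of failure" concrete: describe each independent route by which a request can be served as the set of components it depends on (building, machine, network segment). Any component that appears in every route is a single point of failure. A minimal sketch, assuming this hypothetical path-set model; the component names are invented for illustration.

```python
from typing import Iterable


def single_points_of_failure(paths: Iterable[set[str]]) -> set[str]:
    """Return the components that every serving path depends on.

    Each path is the set of components (site, machine, network segment, ...)
    that must all be up for a request to be served along that path. A
    component present in every path can take the whole service down alone.
    """
    paths = list(paths)
    if not paths:
        raise ValueError("no serving paths: the service is already down")
    common = set(paths[0])
    for path in paths[1:]:
        common &= path  # keep only components shared by every path so far
    return common


# Hypothetical layout: two web servers behind one firewall in one building.
colocated = [
    {"building-A", "firewall", "web1"},
    {"building-A", "firewall", "web2"},
]
print(sorted(single_points_of_failure(colocated)))  # ['building-A', 'firewall']

# Hypothetical layout: two independent sites, each with its own uplink.
separated = [
    {"building-A", "uplink-A", "web1"},
    {"building-B", "uplink-B", "web2"},
]
print(sorted(single_points_of_failure(separated)))  # []
```

Adding a second web server removes the machine-level single point of failure, but only separate sites with separate network links remove the building and firewall ones.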

I believe a design philosophy in which the system is immune to the 
failure of any single element is both the most cost-effective and the 
most reliable. Rather than investing heavily in the reliability of 
mission-critical systems, make no system mission-critical. No system then 
needs mission-critical investment, and the overall system will be both 
cheaper and more reliable.
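At the client end, "no system mission-critical" can look like a caller that simply tries the next replica when one fails. A minimal sketch, assuming replicas can be modelled as interchangeable zero-argument callables; `call_with_failover`, `AllReplicasFailed` and the stand-in replicas are hypothetical names, not part of any existing system.

```python
from typing import Callable, Sequence


class AllReplicasFailed(Exception):
    """Raised only when every replica has failed."""


def call_with_failover(replicas: Sequence[Callable[[], str]]) -> str:
    """Try each replica in turn and return the first successful result.

    No individual replica is critical: the call fails only if all of them do.
    """
    errors: list[Exception] = []
    for replica in replicas:
        try:
            return replica()
        except Exception as exc:  # any failure just moves on to the next replica
            errors.append(exc)
    raise AllReplicasFailed(f"all {len(replicas)} replicas failed: {errors!r}")


# Hypothetical stand-ins for real machines.
def broken() -> str:
    raise ConnectionError("machine down")


def healthy() -> str:
    return "page content"


print(call_with_failover([broken, healthy]))  # page content
```

A real deployment would add timeouts and health checks, but the shape is the same: losing one machine costs a retry, not an outage.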

To put it another way:
All systems fail eventually. Even a single costly, reliable unit has a 
fairly high probability of failing. The probability of many fairly 
reliable, cheap units with no common point of failure all breaking down 
simultaneously is much lower than the probability of one costly, reliable 
unit failing.
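The arithmetic behind this: if each of n cheap units fails independently with probability p over some period, all of them are down at once with probability p^n. A small sketch; the failure rates below are purely illustrative assumptions, not measurements, and the independence assumption holds only if the units really share no common point of failure.

```python
def prob_all_fail(p_unit: float, n: int) -> float:
    """Probability that n units, each failing independently with
    probability p_unit over the same period, are all down at once."""
    if not 0.0 <= p_unit <= 1.0:
        raise ValueError("p_unit must be a probability")
    if n < 1:
        raise ValueError("need at least one unit")
    return p_unit ** n


# Illustrative numbers only (assumed, not measured):
p_expensive = 0.05  # one costly, highly reliable server
p_cheap = 0.20      # each cheap server is four times as likely to fail

print(f"expensive unit: {p_expensive:.4f}")
for n in (1, 2, 3, 4):
    print(f"{n} cheap unit(s) all down: {prob_all_fail(p_cheap, n):.4f}")
```

With these assumed figures, three cheap units are all down together with probability 0.2³ = 0.008, well below the 0.05 of the single costly unit.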

If no single machine is critical and the machines are widely separated, 
we would not even need to worry about whether they are equipped with a 
UPS or redundant power supplies.
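Why wide separation matters: machines in the same building share that building's hazards, so their failures are not independent. A sketch under a simple assumed model, where a site fails with probability p_site and a machine at a working site fails independently with probability p_machine; the function names and numbers are hypothetical.

```python
def p_all_down_colocated(p_machine: float, p_site: float, n: int) -> float:
    """All n machines share one site. A site failure (probability p_site)
    takes them all down; otherwise each machine fails independently."""
    return p_site + (1 - p_site) * p_machine ** n


def p_all_down_separated(p_machine: float, p_site: float, n: int) -> float:
    """Each machine sits at its own site and sites fail independently.
    A machine is down if its site fails or if it fails itself."""
    p_down_each = p_site + (1 - p_site) * p_machine
    return p_down_each ** n


# Illustrative numbers only (assumed, not measured):
p_machine, p_site, n = 0.05, 0.01, 3
print(f"co-located: {p_all_down_colocated(p_machine, p_site, n):.6f}")
print(f"separated:  {p_all_down_separated(p_machine, p_site, n):.6f}")
```

Under these assumptions three co-located machines are all down with probability of roughly 0.0101, against roughly 0.00021 for three separated ones. However many machines share the site, the co-located figure never drops below p_site, which is why per-machine protection such as a UPS matters far less once machines are spread out.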
