Gabriel Wicke wrote:
> How about installing Squid on one of the machines? That would take a
> fair amount of load away. Is there a machine with some free RAM
> available? Even installing Squid on larousse would do, I guess. I've
> glanced over the PHP code; there are mainly two header lines we would
> need to change to activate this. We could start off with a 30-minute
> timeout for anonymous users. Purging support should be ready soon as
> well.
Perhaps I will be burned at the stake as a heretic for this, but I am
not convinced Squid proxies are the answer.
The delays in the wiki server system are caused by waiting for I/O: the
time taken for mechanical devices to seek to a particular block of data.
If the data is being served from a Squid cache rather than from a cache
on the wiki server, how will this reduce the overall I/O blocking
problem?
The busiest page data won't substantially add to I/O blocking on the
wiki server, as it will likely be in memory all the time. The Squid
proxy is ideal for solving the problem of network load from commonly
accessed pages, or of pages which demand a lot of CPU power to generate,
but neither is a problem on Wikipedia. If Squid proxies are being
implemented to increase performance, then they are the right solution to
the wrong problem. If they are to increase reliability by adding
redundancy (multiple data sources), they do this to a degree, but are
far from ideal.
The most commonly used pages are going to be in the memory of the
database server, so these are not costly to serve. The costly pages are
those which need disk seeks: the more I/O seek operations a page
requires, the more costly it is to serve.
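
To put rough numbers on that (the figures here are assumptions, not
measurements): a commodity disk averages somewhere around 9 ms per
random seek, which caps how many seek-bound pages a single spindle can
serve per second.

    // Back-of-envelope seek budget; all numbers are assumed.
    $seek_ms        = 9;                  // avg. random seek time
    $seeks_per_sec  = 1000 / $seek_ms;    // ~111 seeks/sec per disk
    $seeks_per_page = 5;                  // assumed: article row, links,
                                          // user data, etc.
    $pages_per_sec  = $seeks_per_sec / $seeks_per_page;  // ~22 pages/sec

So a page needing five seeks limits one disk to roughly 22 such pages a
second, no matter how fast the CPU is.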
The proxy server will need to do a lookup (keyed on the URL) and, unless
the page is in memory rather than on-disk storage, use I/O to reach the
fine-grained data. The data for each unique URL will be bigger than that
held in the cache of the database server, as it will contain HTML
formatting and other page data. The likelihood of the data being in the
memory of a proxy server is therefore lower than the data being in the
memory of a similarly equipped database server, as the final HTML page
will be ~7.5k bigger than the underlying database data.
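
A quick illustration of what that overhead does to cache density (the
database row size is an assumption; the 7.5k overhead is the estimate
above):

    // Assumed sizes: average article text ~2.5 KB in the database;
    // the rendered page adds ~7.5 KB of HTML and skin markup.
    $db_kb   = 2.5;                // assumed average row size
    $html_kb = $db_kb + 7.5;       // ~10 KB per rendered page
    $ratio   = $html_kb / $db_kb;  // 4: a given amount of RAM holds four
                                   // times fewer rendered pages than raw
                                   // database rows

Under those assumptions, the same amount of memory caches a quarter as
many pages on the proxy as on the database server, and the hit rate
suffers accordingly.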
If performance is the criterion, I suggest a proxy isn't a good idea.
Instead, the memory otherwise used in a proxy would be better utilised
caching database data directly, either as a ramdisk or perhaps as
network-attached database storage with plenty of solid-state memory.
From what I have gathered, the cost (the limiting factor on performance)
is the delay in seeking fine-grained data. Either this seek load will
need to be spread across many mechanical devices, such that the work is
not unduly duplicated, or the fine-grained data will need to be stored
in solid-state storage so that it can be accessed quickly.