At 00:07 13/01/2004 +0000, you wrote:
> Gabriel Wicke wrote:
>> How about installing Squid on one of the machines? That would take a
>> fair amount of load away. Is there a machine with some free RAM
>> available? Even installing Squid on larousse would do, I guess. I've
>> glanced over the PHP code: there are mainly two header lines we would
>> need to change to activate this. We could start off with a 30-minute
>> timeout for anonymous users. Purging should be ready soon as well.
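
For anyone curious, the two header lines Gabriel refers to would
presumably look something like the sketch below (my guess, not the
actual MediaWiki code: the s-maxage of 1800 seconds matches the
suggested 30-minute timeout, and the purge helper shows one common way
to invalidate a Squid object; $squidHost and $url are placeholders):

  <?php
  // Sketch: let Squid cache anonymous page views for 30 minutes.
  // s-maxage applies to shared caches (Squid); browsers revalidate.
  header('Cache-Control: s-maxage=1800, must-revalidate, max-age=0');
  header('Vary: Accept-Encoding, Cookie');

  // Sketch: on page save, send an HTTP PURGE through the Squid so it
  // drops its stale copy of the page.
  function purgeSquid($squidHost, $url) {
      $ch = curl_init($url);
      curl_setopt($ch, CURLOPT_PROXY, $squidHost);
      curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'PURGE');
      curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
      curl_exec($ch);
      curl_close($ch);
  }
  ?>
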
> Perhaps I will be burned at the stake as a heretic for this, but I am
> not convinced Squid proxies are the answer.
You should not be burnt.
> The delays in the wiki server system are caused by waiting for I/O:
> the time taken for mechanical devices to seek to a particular block
> of data. If the data is being served from a Squid cache rather than
> from a cache on the wiki server, how will this reduce the overall I/O
> blocking problem?
Agreed
> The busiest page data won't substantially add to I/O blocking on the
> wiki server, as it will likely be in memory all the time. A Squid
> proxy is ideal for solving the problem of network load from commonly
> accessed pages, or pages which demand a lot of CPU power to generate,
> but this is not a problem on Wikipedia. If Squid proxies are being
> implemented to increase performance, then they are the right solution
> to the wrong problem. If they are there to increase reliability by
> adding redundancy (multiple data sources), they do this to a degree,
> but are far from ideal.
> The most commonly used pages are going to be in the memory of the
> database server, so these are not costly to serve. The costly pages
> are those which need disk seeks to serve: the more I/O seek
> operations a page requires, the more costly it is to serve.
> The proxy server will need to look the URL up and, unless the page is
> in memory rather than in on-disk storage, use I/O to reach the
> fine-grained data. The data for each unique URL will be bigger than
> that held in the cache on the database server, as it will contain
> HTML formatting and other page data. The likelihood of the data being
> in the memory of a proxy server is therefore lower than on a
> similarly equipped database server, as the final HTML page will be
> ~7.5k bigger than the underlying database data.
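
To put illustrative numbers on that size argument (everything below is
assumed for the sake of the sketch except the ~7.5k figure above):

  <?php
  // Back-of-envelope: the same RAM holds far fewer rendered pages
  // than raw database records, so a proxy's hit rate suffers.
  $cacheBytes   = 512 * 1024 * 1024; // 512 MB of cache RAM (assumed)
  $recordSize   = 5 * 1024;          // avg. DB record size (assumed)
  $htmlOverhead = 7.5 * 1024;        // extra bytes per rendered page

  $pagesAsRecords = floor($cacheBytes / $recordSize);
  $pagesAsHtml    = floor($cacheBytes / ($recordSize + $htmlOverhead));

  printf("As DB records: %d pages\n", $pagesAsRecords); // ~104857
  printf("As HTML:       %d pages\n", $pagesAsHtml);    // ~41943
  ?>
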
One solution is to reduce the options available to the user so as to
make the pages more static (an unpopular move).
> If performance is the criterion, I suggest a proxy isn't a good idea.
> Instead, the memory otherwise used in a proxy would be better
> utilised caching database data directly, either as a ramdisk or
> perhaps as network-attached database storage with plenty of
> solid-state memory.
Probably the only solution, apart from splitting the various wikis
across more servers.
> From what I have gathered, the cost (the limiting factor on
> performance) is that of delays in seeking fine-grained data. Either
> this seek load will need to be spread across many mechanical devices,
> such that the work is not unduly duplicated, or the fine-grained data
> will need to be stored in solid-state storage so that seeks are fast.
Or data replication over more/many DB servers.
We really do need to spread the load (high-powered processors are less
important than high-speed disks). As things stand, to cache the whole
DB for full performance we would need a server with 50 GB of RAM (and
we cannot even get 4 GB working yet). So, as the next best thing, I
believe the suggested hot-swap disk array (two of them) is needed for
both performance and reliability, to make two fast DB servers.
Plus Squids, Apaches, DNS round robin, etc.
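
On the round-robin point: the idea is just several A records behind
one hostname, so clients spread themselves across the front-end boxes.
A tiny PHP illustration (hostname and addresses are hypothetical):

  <?php
  // gethostbynamel() returns every A record for a name; with round
  // robin DNS, successive lookups typically rotate their order.
  $ips = gethostbynamel('www.wikipedia.example');
  print_r($ips); // e.g. Array ( [0] => 10.0.0.1 [1] => 10.0.0.2 ... )
  ?>
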
Dave Caroline aka archivist