Hi
On 27.07.2011 08:18, Kay Drangmeister wrote:
several performance-tuning measures have been applied
on ptolemy:
(1) the number of render processes has been reduced from 8/6 to 4
(2) Kolossos modified expire.rb to render low-zoom tiles with low
probability
(3) indexes have been added to the DB for geometry, hstore, and osm-id
(4) clustering
Is there a good way for us to monitor the results? Especially
(1) should be tracked carefully. I can see no significant changes
in IO throughput
This decision was made to see whether offloading the database
would result in fewer render timeouts.
http://munin.toolserver.org/OSM/ptolemy/iostat.html or IO
http://munin.toolserver.org/OSM/ptolemy/io_bytes_sd.html
nor in the postgres connections
http://munin.toolserver.org/OSM/ptolemy/postgres_connections_osm_mapnik.html
The load and CPU usage have decreased a bit. My guess would
be that more processes would result in better CPU utilization
(and thus faster overall rendering).
To monitor this we need two figures: (a) the average tile rendering
time (per process) and (b) the number of tiles rendered per second (by
all processes). Can we set up munin to track them?
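A munin plugin for figure (b) could look like the sketch below. Munin plugins are plain executables that answer the argument "config" with graph metadata and otherwise print "field.value N". The stats file path and its "tiles_rendered: <count>" line format are pure assumptions here; the actual counter would still have to come from tirex somehow.

```shell
#!/bin/sh
# Sketch of a munin plugin for tiles rendered per second.
# TIREX_STATS and the "tiles_rendered: <count>" format are assumptions.
STATS="${TIREX_STATS:-/var/lib/tirex/stats}"

plugin() {
    case "$1" in
    config)
        echo "graph_title Tirex tiles rendered"
        echo "graph_vlabel tiles per second"
        echo "graph_category tirex"
        echo "tiles.label tiles rendered"
        # DERIVE turns a monotonically growing counter into a per-second rate
        echo "tiles.type DERIVE"
        echo "tiles.min 0"
        ;;
    *)
        if [ -r "$STATS" ]; then
            count=$(awk -F': ' '/^tiles_rendered/ { print $2 }' "$STATS")
            echo "tiles.value ${count:-U}"
        else
            echo "tiles.value U"   # U means "unknown" in the munin protocol
        fi
        ;;
    esac
}

plugin "${1:-fetch}"
```

Munin calls the plugin once per polling interval; with tiles.type DERIVE, munin itself computes the per-second rate from the counter differences, so the plugin only has to expose a running total.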
I don't think tirex allows capturing the tile throughput on a
per-process basis; I guess it would need to be modified to allow that.
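If tirex were patched to log one line per rendered metatile with the worker pid and the render duration, figure (a) would reduce to a small awk aggregation. The "pid=... duration=..." log format below is purely hypothetical, not anything tirex emits today:

```shell
# Hypothetical per-metatile log: "pid=<worker> duration=<seconds>".
# This format is an assumption; tirex would need a patch to emit it.
log='pid=101 duration=2.0
pid=102 duration=4.0
pid=101 duration=3.0'

# Average render time per worker process.
out=$(printf '%s\n' "$log" | awk '
    { split($1, p, "="); split($2, d, "=")
      sum[p[2]] += d[2]; n[p[2]]++ }
    END { for (pid in sum)
            printf "pid %s: avg %.1f s over %d tiles\n", pid, sum[pid] / n[pid], n[pid] }')
printf '%s\n' "$out"
```

The same aggregation, fed from a real log via tail -f, could also be wrapped into a munin plugin later.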
The whole tirex block has disappeared from the statistics. Munin
is no longer listing the plugins:
osm@ptolemy:~$ telnet localhost 4949
Trying ::1...
telnet: connect to address ::1: Connection refused
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
# munin node at
ptolemy.esi.toolserver.org
list
apache_accesses apache_processes apache_volume cpu df if_e1000g0
io_busy_sd io_bytes_sd io_ops_sd iostat load mod_tile_fresh
mod_tile_response mod_tile_zoom netstat ntp_kernel_err
ntp_kernel_pll_freq ntp_kernel_pll_off ntp_offset ntp_states
postfix_mailqueue postfix_mailstats postfix_mailvolume postgres_bgwriter
postgres_cache_osm_mapnik postgres_checkpoints postgres_connections_db
postgres_connections_osm_mapnik postgres_locks_osm_mapnik
postgres_querylength_osm_mapnik postgres_scans_osm_mapnik
postgres_size_osm_mapnik postgres_transactions_osm_mapnik
postgres_tuples_osm_mapnik postgres_users postgres_xlog processes
replication_delay2 uptime users
This seems like a munin misconfiguration. Sometimes only munin-node
needs to be restarted.
And another question: earlier, two slots were reserved for
prio 1 queue requests (i.e. missing tiles). Is such a reserve
still available? Otherwise one would have to wait in that case.
I just reduced the maximum number of render processes by two. The
configuration now looks like this:
osm@ptolemy:~$ less tirex/etc/tirex/tirex.conf
# Buckets for different priorities.
bucket name=missing minprio=1 maxproc=6 maxload=20
bucket name=dirty minprio=2 maxproc=4 maxload=8
bucket name=bulk minprio=10 maxproc=3 maxload=6
bucket name=background minprio=20 maxproc=3 maxload=4
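My reading of these bucket lines (treat it as an assumption about tirex's semantics) is that maxproc of "missing" caps all renderers while maxproc of "dirty" caps the prio >= 2 work, so the slots reserved for prio-1 requests are simply the difference of the two maxproc values:

```shell
# Sketch: compute the render slots left over for the prio-1 "missing" bucket
# from the first two bucket lines of tirex.conf (quoted inline here).
conf='bucket name=missing minprio=1 maxproc=6 maxload=20
bucket name=dirty minprio=2 maxproc=4 maxload=8'

reserved=$(printf '%s\n' "$conf" | awk '
    { for (i = 1; i <= NF; i++)
        if ($i ~ /^maxproc=/) { split($i, a, "="); mp[NR] = a[2] } }
    END { print mp[1] - mp[2] }')
echo "slots reserved for missing tiles: $reserved"
# → slots reserved for missing tiles: 2
```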
Peter