[Labs-l] Out of memory errors

Tim Landscheidt tim at tim-landscheidt.de
Mon Aug 3 16:27:47 UTC 2015


(anonymous) wrote:

> At least two of my web applications have been hitting out of memory errors
> in the past 24 hours. I'm pretty sure that neither is requesting more
> memory than it did before. The problem is not appearing for similar
> cronjobs run through jsub -l release=trusty.

> Is there something that can be done about this server-side?

There is a recurring problem with
tools-webgrid-lighttpd-1409 (= an instance that runs standard
web services on Trusty).

The basic cause is that these hosts overstate the available
virtual memory, because the jobs running there share
substantial amounts of memory by using the same binaries
(lighttpd, php-cgi, etc.).  If one of those web services
does something different, then the formula no longer holds
and the host runs short on real memory.

Or so I thought, because:

| scfc at tools-bastion-01:~$ qconf -se tools-webgrid-lighttpd-1409.eqiad.wmflabs
| hostname              tools-webgrid-lighttpd-1409.eqiad.wmflabs
| load_scaling          NONE
| complex_values        slots=128,release=trusty
| load_values           arch=lx26-amd64,num_proc=4,mem_total=7985.183594M, \
|                       swap_total=487.996094M,virtual_total=8473.179688M, \
|                       load_avg=1.160000,load_short=1.110000, \
|                       load_medium=1.160000,load_long=0.930000, \
|                       mem_free=6150.722656M,swap_free=487.996094M, \
|                       virtual_free=6638.718750M,mem_used=1834.460938M, \
|                       swap_used=0.000000M,virtual_used=1834.460938M, \
|                       cpu=9.400000,m_topology=NONE,m_topology_inuse=NONE, \
|                       m_socket=0,m_core=0,np_load_avg=0.290000, \
|                       np_load_short=0.277500,np_load_medium=0.290000, \
|                       np_load_long=0.232500
| processors            4
| user_lists            NONE
| xuser_lists           NONE
| projects              NONE
| xprojects             NONE
| usage_scaling         NONE
| report_variables      NONE
| scfc at tools-bastion-01:~$

doesn't say anything about more virtual memory than real
memory (8 GByte) being provided, while on the other hand the
mem_free and virtual_free values do not correspond in any
way with:

| scfc at tools-webgrid-lighttpd-1409:~$ free -m
|              total       used       free     shared    buffers     cached
| Mem:          7985       7719        265        482        150       5728
| -/+ buffers/cache:       1840       6144
| Swap:          487          0        487
| scfc at tools-webgrid-lighttpd-1409:~$

Hmmm.
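
For comparison, here is a small sketch that redoes the
arithmetic on the two outputs above.  The numbers are copied
from the qconf and free output; the function names are made
up for illustration.  It is only an assumption, not something
either output confirms, that gridengine's mem_free is meant
to count buffers and page cache as free.

```python
# Values copied from the "qconf -se" output above (MiB).
load_values = {
    "mem_total": 7985.183594,
    "swap_total": 487.996094,
    "virtual_total": 8473.179688,
    "mem_free": 6150.722656,
    "mem_used": 1834.460938,
}

# Values copied from the "Mem:" line of the "free -m" output above (MiB).
free_m = {
    "total": 7985,
    "used": 7719,
    "free": 265,
    "buffers": 150,
    "cached": 5728,
}


def virtual_surplus(lv):
    """Return virtual_total minus (mem_total + swap_total), in MiB.

    A value near zero means the host reports no more virtual
    memory than real memory plus swap.
    """
    return lv["virtual_total"] - (lv["mem_total"] + lv["swap_total"])


def adjusted_used_free(fm):
    """Return (used, free) with buffers and page cache counted as free.

    This mirrors the "-/+ buffers/cache" line of free, up to rounding.
    """
    reclaimable = fm["buffers"] + fm["cached"]
    return fm["used"] - reclaimable, fm["free"] + reclaimable


if __name__ == "__main__":
    print("virtual surplus (MiB):", round(virtual_surplus(load_values), 3))
    used, free = adjusted_used_free(free_m)
    print("free -m, buffers/cache counted as free:", used, free)
    print("qconf mem_used / mem_free:",
          load_values["mem_used"], load_values["mem_free"])
```

The first figure comes out at zero, i.e. virtual_total is just
mem_total plus swap_total.  The adjusted free -m figures are
1841 used and 6143 free, which is what the "-/+ buffers/cache"
line shows apart from rounding.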

Tim



