I still haven't solved this issue and I'm really not sure where to go from
here. To make it easy for anyone to replicate the issue firsthand, I've set
up Vagrant to create two VMs and install/configure everything. To do so,
perform the following:
```
# Get the repo, using branch with Vagrant support
git clone -b 2vagrant https://github.com/enterprisemediawiki/meza.git
cd meza
# Copy config file from default; edit to uncomment "app2" portion.
# Also increase RAM and CPUs as desired.
cp vagrantconf.default.yml vagrantconf.yml
$EDITOR vagrantconf.yml
# Set up the boxes (takes a few minutes)
vagrant up
# SSH into the primary box
vagrant ssh
# Install everything. Takes 20-40 minutes.
sudo meza deploy vagrant
```
If anything fails during deploy (sometimes GitHub hangs up, or there are
other intermittent errors), just rerun `sudo meza deploy vagrant`.
Once installed, you can go to https://192.168.56.56/demo to access a wiki
(which is very slow in this multi-app config). FYI, the slow response times
make VisualEditor non-functional. User:Admin has password "adminpass". Go
to http://192.168.56.56:8088 to access the XHGui profiler UI.
To stop using the second app server (and get better response times), edit
the inventory file with `sudo vim /opt/conf-meza/secret/vagrant/hosts` and
remove `192.168.56.57` from the `app-servers` section. Then re-run deploy:
`sudo meza deploy vagrant --skip-tags latest`
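For anyone who would rather script the inventory edit than do it in vim, here is a rough sketch. It assumes the hosts file is an INI-style inventory with a `[app-servers]` header and one host per line; I have not shown the real file here, so check the layout first. `remove_app_server` is a made-up helper name, not part of meza.

```shell
# Hypothetical helper (not part of meza): drop one host from the
# [app-servers] group of an INI-style inventory, keeping <file>.bak.
# Assumes each host sits on its own line, optionally followed by variables.
remove_app_server() (
  inventory=$1
  host=$2
  # Escape dots so the IP is matched literally rather than as a regex.
  escaped=$(printf '%s' "$host" | sed 's/\./\\./g')
  # Only touch lines between the [app-servers] header and the next header.
  sed -E -i.bak \
    '/^\[app-servers\]/,/^\[/{/^'"$escaped"'([[:space:]]|$)/d;}' \
    "$inventory"
)
```

On the primary box this would need root, for example by putting the function in a script and running that script with sudo against /opt/conf-meza/secret/vagrant/hosts with `192.168.56.57` as the host, followed by the re-deploy command above.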
In case my log and config files are in unfamiliar places, here is where to
find them:
* Apache: /etc/httpd/conf/httpd.conf, /var/log/httpd/access_log, /var/log/httpd/error_log
* PHP: /etc/php.ini, /opt/data-meza/logs/php_errors.log
* Parsoid: /etc/parsoid/server.js.log, /etc/parsoid/localsettings.js
* HAProxy: /etc/haproxy/haproxy.cfg, /var/log/haproxy.log
* MariaDB: /opt/data-meza/mariadb/, /etc/my.cnf
Any help pointing me in the right direction would be truly appreciated.
I'll try to be on #wikimedia-tech throughout the day today.
Thanks in advance!
--James
On Tue, Jun 13, 2017 at 3:46 PM, James Montalvo <jamesmontalvo3(a)gmail.com>
wrote:
Some more data, in case anyone can help me figure this out...
When running multiple app servers, the request for
`load.php?debug=false&lang=en&modules=startup&only=scripts&skin=vector`
often took a very long time (10-30 seconds). So I made several requests for
just this URL (not as part of a page request) on both single-app-server and
double-app-server configurations and found no difference. The first request
on each took a long time (presumably while building the cache), but
subsequent requests all took an okay time (~1.7 seconds...not great but
consistent across my setups). So it doesn't appear to be something about
that particular request, but perhaps something to do with the dynamics of
multiple requests occurring at once? Scott Ananian mentioned that it may be
a lock issue, but I haven't been able to find any info on this.
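In case it helps anyone repeat the standalone timing test, this is roughly how it could be scripted. It is only a sketch: `time_requests` and `summarize_times` are names made up for illustration, and `-k` is there on the assumption that the Vagrant box serves a self-signed certificate. The `-w '%{time_total}'` option is curl's standard write-out variable for total request time in seconds.

```shell
# Hypothetical helpers for repeating one request and summarizing timings.

# Print the total time (seconds) of COUNT sequential requests to URL.
# -k skips certificate checks (assumes a self-signed cert on the test box).
time_requests() (
  url=$1
  count=${2:-10}
  i=0
  while [ "$i" -lt "$count" ]; do
    curl -k -s -o /dev/null -w '%{time_total}\n' "$url"
    i=$((i + 1))
  done
)

# Read one time per line on stdin; print count, min, mean and max.
summarize_times() {
  LC_ALL=C awk '
    NR == 1 { min = max = $1 }
    { sum += $1; if ($1 < min) min = $1; if ($1 > max) max = $1 }
    END { if (NR > 0) printf "n=%d min=%.3f mean=%.3f max=%.3f\n", NR, min, sum / NR, max }
  '
}
```

Usage would be along the lines of `time_requests 'https://192.168.56.56/demo/load.php?debug=false&lang=en&modules=startup&only=scripts&skin=vector' 10 | summarize_times`, where the `/demo/load.php` path is my guess at where load.php lives for the demo wiki. Since these requests run one at a time, this only reproduces the standalone case, not concurrent page-load traffic.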
Next I tried just loading the Main Page in both 1-app and 2-app setups,
and pulled from the Apache logs the request time, whether it was app server
1 or 2, and the requested URL, and sorted it alphabetically (to make all
the requests line up). A graph of the times is at [1]. Raw data can be
found at [2]. It clearly shows that 5 of the 11 requests are significantly
slower with 2 app servers.
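A rough sketch of the log extraction described above, for anyone who wants to do the same comparison. It assumes an access log in common/combined layout (so the request path is the 7th whitespace-separated field) with `%D` (request time in microseconds) appended as the last field by the `LogFormat` directive; a stock httpd.conf does not log `%D`, so adjust the field positions to your format. `extract_request_times` is a made-up name.

```shell
# Hypothetical helper: print "URL<TAB>seconds" for every request in an
# Apache access log, sorted by URL so two runs line up side by side.
# Assumes field 7 is the request path and the last field is %D (microseconds).
extract_request_times() {
  LC_ALL=C awk '{ printf "%s\t%.3f\n", $7, $NF / 1000000 }' "$1" | LC_ALL=C sort
}
```

Running it against /var/log/httpd/access_log on each app server, once for the 1-app setup and once for the 2-app setup, gives two lists that can be compared line by line or pasted into a spreadsheet.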
Can anyone help me with what might be going wrong, or how I could
troubleshoot this?
Thanks,
James
[1] https://gist.github.com/jamesmontalvo3/5adf207623454c9eff98e93152b43108/raw/89e567e238696c3cfaa3fb5ff1d987fda4d9f24c/Comparison-of-Main-Page.png
[2] https://gist.github.com/jamesmontalvo3/5adf207623454c9eff98e93152b43108#file-comparison-of-main-page-md
On Mon, Jun 12, 2017 at 7:40 PM, James Montalvo <jamesmontalvo3(a)gmail.com>
wrote:
This is a follow-on to the previous email chain, improperly named "Setting
up multiple Parsoid servers behind load balancer".
I'm getting much slower response times in a setup with multiple app
servers behind an HAProxy load balancer, versus the same setup with just a
single app server behind the same load balancer. I've set up profiling per
recommendations from this mailing list. [1] is the call graph of a
particularly long request. [2] is a graph showing requests over many page
loads, with the better-performing yellow dots/line being the single app
server. The worst-performing color is with profiling turned on.
This gist [3] has my LocalSettings.php from both app servers and the
included Extensions.php.
Can anyone help me figure this out? Anything else I can provide or
certain things I should test?
Thanks,
James
[1] https://gist.githubusercontent.com/jamesmontalvo3/5adf207623454c9eff98e93152b43108/raw/66612b7aac4fc3aee6287a64bfe0566b30dc1e87/call-graph.png
[2] https://gist.githubusercontent.com/jamesmontalvo3/5adf207623454c9eff98e93152b43108/raw/66612b7aac4fc3aee6287a64bfe0566b30dc1e87/graph-of-response-times.png
[3] https://gist.github.com/jamesmontalvo3/5adf207623454c9eff98e93152b43108