Ah! I had been so focused on watching the Jitsi and Prosody logs I had stopped looking at the nginx ones:
==> /var/log/nginx/error.log <==
2021/05/26 00:49:19 [alert] 757#757: *14448 768 worker_connections are not enough while connecting to upstream, client: 3.141.39.164, server: lmtgt1.dev2dev.net, request: "GET /xmpp-websocket?room=loadtest49 HTTP/1.1", upstream: "http://127.0.0.1:5280/xmpp-websocket?prefix=&room=loadtest49", host: "lmtgt1.dev2dev.net", referrer: "https://lmtgt1.dev2dev.net/loadtest49"
==> /var/log/nginx/access.log <==
3.141.39.164 - - [26/May/2021:00:49:19 +0000] "GET /xmpp-websocket?room=loadtest49 HTTP/1.1" 500 600 "https://lmtgt1.dev2dev.net/loadtest49" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36"
==> /var/log/nginx/error.log <==
2021/05/26 00:49:19 [alert] 757#757: *14449 768 worker_connections are not enough while connecting to upstream, client: 3.133.120.158, server: lmtgt1.dev2dev.net, request: "GET /xmpp-websocket?room=loadtest17 HTTP/1.1", upstream: "http://127.0.0.1:5280/xmpp-websocket?prefix=&room=loadtest17", host: "lmtgt1.dev2dev.net"
==> /var/log/nginx/access.log <==
3.133.120.158 - - [26/May/2021:00:49:19 +0000] "GET /xmpp-websocket?room=loadtest17 HTTP/1.1" 500 600 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36"
18.223.241.112 - - [26/May/2021:00:49:19 +0000] "GET /loadtest41 HTTP/1.1" 200 20996 "https://lmtgt1.dev2dev.net/loadtest41" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36"
Edited /etc/nginx/nginx.conf to raise worker_connections from 768 to 2000…
events {
worker_connections 2000; #increased by Hawke for larger capacity scaling
# multi_accept on;
}
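For sizing worker_connections, a rough back-of-envelope helps (the numbers below are assumptions for this test): each proxied WebSocket costs nginx two connections, one to the client and one to the upstream, and each participant here holds roughly two WebSockets (xmpp-websocket to Prosody, colibri-ws to the JVB).

```shell
# Rough connection estimate for this load test (assumed numbers).
USERS=950           # planned load-test participants
WS_PER_USER=2       # xmpp-websocket + colibri-ws
CONNS_PER_WS=2      # client side + upstream side of each proxied socket
echo $(( USERS * WS_PER_USER * CONNS_PER_WS ))   # -> 3800
```

Note that worker_connections is a per-worker limit, so total capacity is worker_processes × worker_connections.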
restarted nginx, restarted load test…
That immediately got 10 attendees into each room (should be 12 in loadtest0 and 10 in the rest).
Saw the following errors now in nginx:
2021/05/26 00:54:36 [alert] 5460#5460: *10586 socket() failed (24: Too many open files) while connecting to upstream, client: 13.58.225.151, server: lmtgt1.dev2dev.net, request: "GET /colibri-ws/default-id/88005dd23b712f1b/c52b7bdc?pwd=ug4v1o6dc548rccjeu55t4efn HTTP/1.1", upstream: "http://127.0.0.1:9090/colibri-ws/default-id/88005dd23b712f1b/c52b7bdc?pwd=ug4v1o6dc548rccjeu55t4efn", host: "lmtgt1.dev2dev.net"
2021/05/26 00:54:36 [alert] 5460#5460: *10587 socket() failed (24: Too many open files) while connecting to upstream, client: 18.222.20.104, server: lmtgt1.dev2dev.net, request: "GET /colibri-ws/default-id/2c374845c031e5f2/86976c5e?pwd=4gdmuef8v6pqbfh71lnlqsam0p HTTP/1.1", upstream: "http://127.0.0.1:9090/colibri-ws/default-id/2c374845c031e5f2/86976c5e?pwd=4gdmuef8v6pqbfh71lnlqsam0p", host: "lmtgt1.dev2dev.net"
2021/05/26 00:54:36 [alert] 5460#5460: *10588 socket() failed (24: Too many open files) while connecting to upstream, client: 18.188.6.169, server: lmtgt1.dev2dev.net, request: "GET /colibri-ws/default-id/9849ca00fb60edb7/8617953d?pwd=50i3c8esh847ttrkndqh471mk0 HTTP/1.1", upstream: "http://127.0.0.1:9090/colibri-ws/default-id/9849ca00fb60edb7/8617953d?pwd=50i3c8esh847ttrkndqh471mk0", host: "lmtgt1.dev2dev.net"
2021/05/26 00:54:36 [crit] 5462#5462: accept4() failed (24: Too many open files)
Edited /etc/security/limits.conf
to add the following:
nginx soft nofile 30000
nginx hard nofile 50000
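One caveat worth checking (relevant since the errors persisted after this change): on systems where nginx is managed by systemd, /etc/security/limits.conf applies only to PAM login sessions, not to services, so a systemd drop-in may also be needed. A sketch, assuming the stock nginx.service unit (the drop-in path and value are illustrative):

```ini
# Hypothetical drop-in: /etc/systemd/system/nginx.service.d/limits.conf
[Service]
LimitNOFILE=50000
```

After adding it, run `systemctl daemon-reload` and restart nginx for the limit to take effect.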
edited /etc/nginx/nginx.conf again to increase connections, and add an rlimit based on the number above.
events {
worker_connections 2000;
# multi_accept on;
worker_rlimit_nofile 300000
}
rebooted
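Since "Too many open files" can persist when a limit change never actually reaches the worker processes, it's worth confirming what a running process really got. A quick check (reading this shell's own limits as a stand-in; in practice substitute an nginx worker PID):

```shell
# Show the file-descriptor limit the kernel actually granted this process.
# For nginx, replace "self" with a worker PID, e.g. from:
#   pgrep -f "nginx: worker"
grep "Max open files" /proc/self/limits
```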
now seeing this in nginx logs:
HTTP/1.1", upstream: "http://127.0.0.1:5280/xmpp-websocket?prefix=&room=loadtest81", host: "lmtgt1.dev2dev.net"
2021/05/26 00:59:08 [error] 5463#5463: *9913 recv() failed (104: Connection reset by peer) while proxying upgraded connection, client: 3.133.129.17, server: lmtgt1.dev2dev.net, request: "GET /xmpp-websocket?room=loadtest0 HTTP/1.1", upstream: "http://127.0.0.1:5280/xmpp-websocket?prefix=&room=loadtest0", host: "lmtgt1.dev2dev.net"
2021/05/26 01:04:47 [error] 5459#5459: *34 recv() failed (104: Connection reset by peer) while proxying upgraded connection, client: 96.79.202.21, server: lmtgt1.dev2dev.net, request: "GET /colibri-ws/default-id/d74ad537a372a336/0c18074b?pwd=6nrfkf9ohkbfj6mbkvhr7k2slo HTTP/1.1", upstream: "http://127.0.0.1:9090/colibri-ws/default-id/d74ad537a372a336/0c18074b?pwd=6nrfkf9ohkbfj6mbkvhr7k2slo", host: "lmtgt1.dev2dev.net"
2021/05/26 01:04:47 [error] 5459#5459: *30 recv() failed (104: Connection reset by peer) while proxying upgraded connection, client: 96.79.202.21, server: lmtgt1.dev2dev.net, request: "GET /colibri-ws/default-id/d74ad537a372a336/8831aa6c?pwd=6qfum0gc1uu9ubkl35fk1c3b2h HTTP/1.1", upstream: "http://127.0.0.1:9090/colibri-ws/default-id/d74ad537a372a336/8831aa6c?pwd=6qfum0gc1uu9ubkl35fk1c3b2h", host: "lmtgt1.dev2dev.net"
2021/05/26 01:05:18 [emerg] 591#591: unexpected "}" in /etc/nginx/nginx.conf:10
and Jitsi is running, but I can’t connect to it via the web…
ah, some things were in the wrong place, plus a missing semicolon; cleaned up, nginx.conf now looks like this:
events {
worker_connections 2000;
# multi_accept on;
}
http {
##
# Basic Settings
##
sendfile on;
tcp_nopush on;
…
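For reference, worker_rlimit_nofile is a main-context directive, so it belongs at the top level of nginx.conf (not inside events{}) and, like every nginx directive, needs a trailing semicolon. A sketch of the relevant top of the file:

```nginx
# main context: worker_rlimit_nofile goes here, outside events{}
worker_processes auto;
worker_rlimit_nofile 300000;

events {
    worker_connections 2000;
    # multi_accept on;
}
```

Running `nginx -t` after each edit catches syntax errors like the `unexpected "}"` above before a restart takes the site down.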
Okay, now nginx is working again, and no errors yet (before the next load test).
The Jitsi log shows no errors with the 2 laptop users connected.
start load test of 950 (+2) users… on this m5a.4xl (32 CPU, 64 GB RAM) single instance running all core Jitsi services, no add-ons…
running the load at 10 participants per room, I can see that 1 participant is sending video clearly and smoothly, and loadtest0 has 12 people because the two laptops are both successfully sending audio and video in that room… so far no users dropping
Jitsi and nginx log files still calm…
video remains clear, smooth, steady…
and 5 minute load test ends without a glitch!
YES! SUCCESS AT LAST! (at least for this hurdle; onward to the next).
Thank you so very much for your help. Greatly appreciated!
I hope my overshared step-by-step here helps out anyone else who runs into anything similar in the future.
Now I just need AWS to raise that limit from 1k to 5k spot instances, and then I can try the 5k load test.
Meanwhile, I need to increase the pressure with the 1k users: add more simultaneous video senders, add audio, add CC, add recording, etc. I can do a lot at this level for now. Onward and forward.
Thank you very much again @damencho! Marking this as solved shortly!