While load testing with MalleusJitsificus, we noticed that WebSocket connections to
/xmpp-websocket sometimes fail when a moderate number (>30) of new participants connect at about the same time.
Observations when running malleus.sh with
--join-delay=100 --conferences=1 --participants=50:
A few of the participants fail with “org.openqa.selenium.WebDriverException: unknown error: net::ERR_CONNECTION_CLOSED”.
The nginx error log suggests that prosody is sometimes not accepting connections from nginx. Out of the 50 participant connections, we might see 2 or 3 rejections like this:
... recv() failed (104: Connection reset by peer) while proxying upgraded connection, client: 10.1.28.158, server: meet.mydomain, request: "GET /xmpp-websocket?room=loadtest0 HTTP/1.1", upstream: "http://127.0.0.1:5280/xmpp-websocket?prefix=&room=loadtest0", host: " meet.mydomain"
Nothing unusual in the prosody logs, as far as I can tell.
If I join the loadtest room manually at about the same time, I sometimes see the dreaded “Connection Error” page but with nothing in console logs. All works as usual if I reload the page after a few seconds.
CPU usage of prosody process remains relatively low throughout.
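To put a number on the resets per run, a rough sketch like the one below could tally the matching nginx error lines by request URI and upstream. The function name count_resets and the regular expression are made up for illustration and only target the line format quoted above; other nginx error formats would not match.

```python
import re
from collections import Counter

# Matches the "Connection reset by peer" lines quoted above and captures
# the request URI and the upstream URL. Written against that one sample.
RESET_RE = re.compile(
    r'recv\(\) failed \(104: Connection reset by peer\)'
    r'.*?request: "GET (?P<uri>\S+) HTTP/[\d.]+"'
    r'.*?upstream: "(?P<upstream>[^"]+)"'
)


def count_resets(lines):
    """Count reset errors per (request URI, upstream) pair."""
    counts = Counter()
    for line in lines:
        match = RESET_RE.search(line)
        if match:
            counts[(match.group("uri"), match.group("upstream"))] += 1
    return counts
```

It can be fed an open log file, e.g. count_resets(open(path)), where path is wherever nginx writes its error log on the host (often /var/log/nginx/error.log, but that depends on the install).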
The errors don’t occur if we add a long delay between joins, e.g.
--join-delay=3000, so the issue appears to be caused by the rate of concurrent new connections rather than by the number of active connections.
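One way to check whether the resets come from prosody itself rather than from nginx or the browsers would be to bypass both and fire WebSocket upgrade requests straight at the upstream shown in the nginx log. The following is only a sketch under assumptions: the names try_upgrade and probe are hypothetical, the "xmpp" subprotocol is the one defined for XMPP over WebSocket (RFC 7395), and I am assuming prosody answers such a bare handshake with a 101 status. It closes each socket right after the handshake, so it exercises new connects only, not long-lived sessions.

```python
import base64
import os
import socket
import threading
import time


def try_upgrade(host, port, path, timeout_s=5.0):
    """Attempt one WebSocket upgrade; return the status line or an error string."""
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}:{port}\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Key: {key}\r\n"
        "Sec-WebSocket-Version: 13\r\n"
        "Sec-WebSocket-Protocol: xmpp\r\n"
        "\r\n"
    )
    try:
        with socket.create_connection((host, port), timeout=timeout_s) as sock:
            sock.sendall(request.encode("ascii"))
            reply = sock.recv(1024)
    except OSError as exc:
        # Covers refused connections, resets and timeouts.
        return f"error: {exc}"
    if not reply:
        return "error: closed before any response"
    # Keep only the first line, e.g. "HTTP/1.1 101 Switching Protocols".
    return reply.split(b"\r\n", 1)[0].decode("latin-1")


def probe(host, port, path, count=50, join_delay_s=0.1):
    """Start `count` upgrade attempts, one every `join_delay_s` seconds."""
    results = [None] * count

    def worker(index):
        results[index] = try_upgrade(host, port, path)

    threads = []
    for index in range(count):
        thread = threading.Thread(target=worker, args=(index,))
        thread.start()
        threads.append(thread)
        time.sleep(join_delay_s)
    for thread in threads:
        thread.join()
    return results
```

Running probe("127.0.0.1", 5280, "/xmpp-websocket?room=loadtest0") on the server would roughly mirror --join-delay=100 --participants=50 (assuming the delay is in milliseconds). Any entry in the result that does not start with "HTTP/1.1 101" would point at prosody rather than nginx.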
Any idea what could be amiss, or how I can investigate further?