User disconnects

I am seeing some user disconnects on my configuration (ref https://community.jitsi.org/t/question-about-server-setup-for-school/).

And when that happens I see something like this on my nginx error log:

2021/01/28 13:32:28 [error] 800#800: *79198 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 1.1.1.1, server: foo.com, request: "POST /http-bind?room=foobar HTTP/1.1", upstream: "http://127.0.0.1:5280/http-bind?room=foobar", host: "foo.com", referrer: "https://foo.com/foobar"

What could an error like this be related to?

Kind regards,

Paulo Matos

nginx cannot get any response from prosody within timeout seconds. prosody logs may help

Oh - from prosody? I wouldn’t have expected that.
Prosody errors show nothing. The only thing I have there is:

Jan 28 07:53:26 portmanager	error	Error binding encrypted port for https: No certificate present in SSL/TLS configuration for https port 5281

Which doesn’t seem to make a difference.

On the side of normal logs, I see nothing worrying:

Jan 28 13:32:05 conference.foo.com:muc_domain_mapper	warn	Session filters applied
Jan 28 13:32:05 mod_bosh	info	New BOSH session, assigned it sid '60babc0c-a267-4164-89a7-b9f06fbb975e'
Jan 28 13:32:05 bosh60babc0c-a267-4164-89a7-b9f06fbb975e	info	Authenticated as shkuezoabcbp6hil@foo.com
Jan 28 13:32:13 conference.foo.com:muc_domain_mapper	warn	Session filters applied
Jan 28 13:32:13 mod_bosh	info	New BOSH session, assigned it sid '6254b273-d360-4c57-924a-773f579ef3c0'
Jan 28 13:32:14 bosh6254b273-d360-4c57-924a-773f579ef3c0	info	Authenticated as qg2sips4m5badtvx@foo.com
Jan 28 13:32:31 bosh1e9f32bc-1644-4e95-a792-32b1f3c6b4ed	info	BOSH client disconnected: session close
Jan 28 13:32:31 conference.foo.com:muc_domain_mapper	warn	Session filters applied
Jan 28 13:32:31 mod_bosh	info	New BOSH session, assigned it sid '10deb249-072b-4cdc-b122-0f3ff99ade6d'
Jan 28 13:32:32 bosh10deb249-072b-4cdc-b122-0f3ff99ade6d	info	Authenticated as velpwuekrqcjbcqf@foo.com
Jan 28 13:33:26 conference.foo.com:muc_domain_mapper	warn	Session filters applied
Jan 28 13:33:26 mod_bosh	info	New BOSH session, assigned it sid '6fc2d7b8-d00d-44c4-9320-50d580e08920'
Jan 28 13:33:27 bosh6fc2d7b8-d00d-44c4-9320-50d580e08920	info	Authenticated as pirvq7dlbb3nev20@foo.com
Jan 28 13:33:33 bosh6fc2d7b8-d00d-44c4-9320-50d580e08920	info	BOSH client disconnected: session close
Jan 28 13:33:43 speakerstats.foo.com:speakerstats_component	warn	A module has been configured that triggers external events.
Jan 28 13:33:43 speakerstats.foo.com:speakerstats_component	warn	Implement this lib to trigger external events.

This sounds like prosody overloads and doesn’t respond in time to the request, when this happens what is the cpu usage of the prosody process, but monitor that process cpu not the entire instance. The difference is that prosody is single threaded and monitoring the whole instance cpu will not show you correctly that prosody hits 100%

I have used

# ps -aux | grep prosody
prosody    740  0.6  0.1  56600 44132 ?        Ss   04:05   3:01 lua5.2 /usr/bin/prosody
root      3674  0.0  0.0   6144   824 pts/1    S+   12:12   0:00 grep prosody
# ps -p 740 -o %cpu,%mem,cmd
%CPU %MEM CMD
 0.6  0.1 lua5.2 /usr/bin/prosody

in a loop during a meeting but I see nothing close to 1.0 for %CPU.

Regarding timeouts I saw another one today where a student was dropped but nothing in prosody errors.
So the same timeout error I posted above shows up at the nginx log at 9:19:14. Although there’s nothing in prosody.err, there’s something odd in prosody.log, but I don’t know if it’s relevant:

Feb 02 09:19:04 bosh2c47a098-b2c2-4001-9172-eb408a5af7b4	info	Authenticated as _yqrr1qm-smncvop@foo.com
Feb 02 09:19:10 mod_bosh	info	Client tried to use sid '0e74dc08-531d-4764-85f1-f1d89d0f059c' which we don't know about
Feb 02 09:19:19 mod_bosh	info	Client tried to use sid '0e74dc08-531d-4764-85f1-f1d89d0f059c' which we don't know about
Feb 02 09:19:47 conference.meet.matos-sorge.com:muc_domain_mapper	warn	Session filters applied

So at second 10, it complains about an unknown sid. At second 14 there’s the timeout and at second 19 another complaint. No idea what’s going on.

I have checked another server with the same setup, and the same timeouts. There’s also a timeout but nothing on prosody logs, not even the unknown sid issue. Either I need to increase prosody log level or it’s something else causing the timeout.