JVB: Endpoints suspended due to insufficient bandwidth with only 5 to 15 users - AWS c5 2xl & 4xl instances, manual setup (not dockerized)

Oooooh my bad! :man_facepalming:t6:

Correct, @damencho. Sorry it wasn’t clear (I’ve edited to try to improve clarity).

@rpgresearch what about the browsers?

All running Chrome. I believe that, with the exception of one Mac user and one Linux user (the latter being me, though I also ran a Windows system), everyone should be on the same version: 95.0.4638.54
Grepping the nginx access.log, I do see some clients connecting with version 94.0.4606.81
Is that enough to make a difference?
I’m pinging everyone to remind them that they actually need to be on the latest version.
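In case it's useful to anyone hitting the same question, here's a minimal sketch of summarizing the Chrome versions seen in the access.log (it assumes nginx's default "combined" log format, where the user-agent is the last quoted field; the heredoc below is sample data so the pipeline is self-contained):

```shell
# Count distinct Chrome versions appearing in the user-agent strings.
# Against the real log, replace the heredoc with:
#   grep -oE 'Chrome/[0-9.]+' /var/log/nginx/access.log | sort | uniq -c | sort -rn
cat <<'EOF' | grep -oE 'Chrome/[0-9.]+' | sort | uniq -c | sort -rn
1.2.3.4 - - [27/Oct/2021] "GET / HTTP/1.1" 200 1 "-" "Mozilla/5.0 (X11; Linux x86_64) Chrome/95.0.4638.54 Safari/537.36"
5.6.7.8 - - [27/Oct/2021] "GET / HTTP/1.1" 200 1 "-" "Mozilla/5.0 (Windows NT 10.0) Chrome/94.0.4606.81 Safari/537.36"
9.8.7.6 - - [27/Oct/2021] "GET / HTTP/1.1" 200 1 "-" "Mozilla/5.0 (Macintosh) Chrome/95.0.4638.54 Safari/537.36"
EOF
```

The most common version ends up on the first line of the output, so stragglers on old builds stand out immediately.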

Are you able to constantly reproduce this?

Yes, it consistently recurs once we reach 5 to 15 participants. By 15, the messages are flying by in jvb.log.
At the same time I’m watching the logs for:
Prosody, Jicofo, nginx, the OS, and AWS CloudWatch CPU and network metrics. The only obvious errors I’ve seen so far are the JVB errors.

We have seen this in every meeting since setting up fresh instances with this newer Jitsi setup a week or so ago. I’m trying to translate all of the optimization settings from the older version, as discussed here:

into the newer Jitsi configs and syntax.
I’m crawling through each config today to see which changes might help.
I’d definitely appreciate suggestions specific to the new setup and to this JVB issue.
Thanks kindly!

What was the version you upgraded from? The previous stable?

@damencho
We did not upgrade anything as far as this server setup goes.
We were previously on other servers elsewhere, running the much older Jitsi 1.x variants.
These are all fresh server instances.

I just tested again, slowly adding people.

Still using the same Jitsi versions as listed at the top of this post.

So far I have only started fiddling with the config files on the JVB (I haven’t touched the JMS instance with nginx, Prosody, Jicofo, etc. yet).

With 3 users (1 Windows, 2 Linux), the logs looked like this:
JVB 2021-10-27 19:00:32.198 INFO: [23] HealthChecker.run#171: Performed a successful health check in PT0S. Sticky failure: false
JVB 2021-10-27 19:00:33.357 INFO: [63] [confId=504f27000457d3c6 gid=35721 stats_id=Skyla-CIu conf_name=rcknpypha8_91817@conference.uat-aws-ediolivemeet.myedio.com ufrag=1h57m1fj1hevqf epId=76e99c8d local_ufrag=1h57m1fj1hevqf] ConnectivityCheckClient.processTimeout#874: timeout for pair: 3.209.52.119:10000/udp/srflx → 96.79.202.21:55024/udp/prflx (stream-76e99c8d.RTP), failing.
JVB 2021-10-27 19:00:33.446 INFO: [63] [confId=504f27000457d3c6 gid=35721 stats_id=Godfrey-QeG conf_name=rcknpypha8_91817@conference.uat-aws-ediolivemeet.myedio.com ufrag=fi51c1fj1hev2v epId=e1ff5bbe local_ufrag=fi51c1fj1hev2v] ConnectivityCheckClient.processTimeout#874: timeout for pair: 3.209.52.119:10000/udp/srflx → 96.79.202.21:17113/udp/prflx (stream-e1ff5bbe.RTP), failing.
JVB 2021-10-27 19:00:36.357 INFO: [63] [confId=504f27000457d3c6 gid=35721 stats_id=Skyla-CIu conf_name=rcknpypha8_91817@conference.uat-aws-ediolivemeet.myedio.com ufrag=1h57m1fj1hevqf epId=76e99c8d local_ufrag=1h57m1fj1hevqf] ConnectivityCheckClient.processTimeout#874: timeout for pair: 3.209.52.119:10000/udp/srflx → 96.79.202.21:55024/udp/prflx (stream-76e99c8d.RTP), failing.
JVB 2021-10-27 19:00:36.446 INFO: [63] [confId=504f27000457d3c6 gid=35721 stats_id=Godfrey-QeG conf_name=rcknpypha8_91817@conference.uat-aws-ediolivemeet.myedio.com ufrag=fi51c1fj1hev2v epId=e1ff5bbe local_ufrag=fi51c1fj1hev2v] ConnectivityCheckClient.processTimeout#874: timeout for pair: 3.209.52.119:10000/udp/srflx → 96.79.202.21:17113/udp/prflx (stream-e1ff5bbe.RTP), failing.
JVB 2021-10-27 19:00:39.357 INFO: [63] [confId=504f27000457d3c6 gid=35721 stats_id=Skyla-CIu conf_name=rcknpypha8_91817@conference.uat-aws-ediolivemeet.myedio.com ufrag=1h57m1fj1hevqf epId=76e99c8d local_ufrag=1h57m1fj1hevqf] ConnectivityCheckClient.processTimeout#874: timeout for pair: 3.209.52.119:10000/udp/srflx → 96.79.202.21:55024/udp/prflx (stream-76e99c8d.RTP), failing.
JVB 2021-10-27 19:00:39.446 INFO: [63] [confId=504f27000457d3c6 gid=35721 stats_id=Godfrey-QeG conf_name=rcknpypha8_91817@conference.uat-aws-ediolivemeet.myedio.com ufrag=fi51c1fj1hev2v epId=e1ff5bbe local_ufrag=fi51c1fj1hev2v] ConnectivityCheckClient.processTimeout#874: timeout for pair: 3.209.52.119:10000/udp/srflx → 96.79.202.21:17113/udp/prflx (stream-e1ff5bbe.RTP), failing.
JVB 2021-10-27 19:00:42.198 INFO: [23] HealthChecker.run#171: Performed a successful health check in PT0S. Sticky failure: false
JVB 2021-10-27 19:00:42.358 INFO: [63] [confId=504f27000457d3c6 gid=35721 stats_id=Skyla-CIu conf_name=rcknpypha8_91817@conference.uat-aws-ediolivemeet.myedio.com ufrag=1h57m1fj1hevqf epId=76e99c8d local_ufrag=1h57m1fj1hevqf] ConnectivityCheckClient.processTimeout#874: timeout for pair: 3.209.52.119:10000/udp/srflx → 96.79.202.21:55024/udp/prflx (stream-76e99c8d.RTP), failing.
JVB 2021-10-27 19:00:42.447 INFO: [63] [confId=504f27000457d3c6 gid=35721 stats_id=Godfrey-QeG conf_name=rcknpypha8_91817@conference.uat-aws-ediolivemeet.myedio.com ufrag=fi51c1fj1hev2v epId=e1ff5bbe local_ufrag=fi51c1fj1hev2v] ConnectivityCheckClient.processTimeout#874: timeout for pair: 3.209.52.119:10000/udp/srflx → 96.79.202.21:17113/udp/prflx (stream-e1ff5bbe.RTP), failing.
JVB 2021-10-27 19:00:45.358 INFO: [63] [confId=504f27000457d3c6 gid=35721 stats_id=Skyla-CIu conf_name=rcknpypha8_91817@conference.uat-aws-ediolivemeet.myedio.com ufrag=1h57m1fj1hevqf epId=76e99c8d local_ufrag=1h57m1fj1hevqf] ConnectivityCheckClient.processTimeout#874: timeout for pair: 3.209.52.119:10000/udp/srflx → 96.79.202.21:55024/udp/prflx (stream-76e99c8d.RTP), failing.
JVB 2021-10-27 19:00:45.447 INFO: [63] [confId=504f27000457d3c6 gid=35721 stats_id=Godfrey-QeG conf_name=rcknpypha8_91817@conference.uat-aws-ediolivemeet.myedio.com ufrag=fi51c1fj1hev2v epId=e1ff5bbe local_ufrag=fi51c1fj1hev2v] ConnectivityCheckClient.processTimeout#874: timeout for pair: 3.209.52.119:10000/udp/srflx → 96.79.202.21:17113/udp/prflx (stream-e1ff5bbe.RTP), failing.
JVB 2021-10-27 19:00:45.588 INFO: [88] [confId=3073b8355bc2766a epId=1aa421d6 gid=26600 stats_id=Muriel-S2R conf_name=mqpsuimfxy_162@conference.uat-aws-ediolivemeet.myedio.com] AbstractEndpoint.expire#250: Expiring.
JVB 2021-10-27 19:00:45.588 INFO: [87] [confId=3073b8355bc2766a gid=26600 conf_name=mqpsuimfxy_162@conference.uat-aws-ediolivemeet.myedio.com] Conference.recentSpeakersChanged#425: Recent speakers changed:
JVB 2021-10-27 19:00:45.588 INFO: [88] [confId=3073b8355bc2766a epId=1aa421d6 gid=26600 stats_id=Muriel-S2R conf_name=mqpsuimfxy_162@conference.uat-aws-ediolivemeet.myedio.com] Endpoint.expire#1030: Spent 0 seconds oversending
JVB 2021-10-27 19:00:45.588 INFO: [88] [confId=3073b8355bc2766a epId=1aa421d6 gid=26600 stats_id=Muriel-S2R conf_name=mqpsuimfxy_162@conference.uat-aws-ediolivemeet.myedio.com] Transceiver.teardown#324: Tearing down
JVB 2021-10-27 19:00:45.588 INFO: [88] [confId=3073b8355bc2766a epId=1aa421d6 gid=26600 stats_id=Muriel-S2R conf_name=mqpsuimfxy_162@conference.uat-aws-ediolivemeet.myedio.com] RtpReceiverImpl.tearDown#339: Tearing down
JVB 2021-10-27 19:00:45.589 INFO: [88] [confId=3073b8355bc2766a epId=1aa421d6 gid=26600 stats_id=Muriel-S2R conf_name=mqpsuimfxy_162@conference.uat-aws-ediolivemeet.myedio.com] RtpSenderImpl.tearDown#315: Tearing down
JVB 2021-10-27 19:00:45.589 INFO: [88] [confId=3073b8355bc2766a epId=1aa421d6 gid=26600 stats_id=Muriel-S2R conf_name=mqpsuimfxy_162@conference.uat-aws-ediolivemeet.myedio.com] DtlsTransport.stop#186: Stopping
JVB 2021-10-27 19:00:45.589 INFO: [88] [confId=3073b8355bc2766a epId=1aa421d6 local_ufrag=38acf1fj1hvpdd gid=26600 stats_id=Muriel-S2R conf_name=mqpsuimfxy_162@conference.uat-aws-ediolivemeet.myedio.com] IceTransport.stop#241: Stopping
JVB 2021-10-27 19:00:45.590 INFO: [88] [confId=3073b8355bc2766a gid=26600 stats_id=Muriel-S2R componentId=1 conf_name=mqpsuimfxy_162@conference.uat-aws-ediolivemeet.myedio.com ufrag=38acf1fj1hvpdd name=stream-1aa421d6 epId=1aa421d6 local_ufrag=38acf1fj1hvpdd] MergingDatagramSocket.close#142: Closing.
JVB 2021-10-27 19:00:45.590 INFO: [43] [confId=3073b8355bc2766a epId=1aa421d6 local_ufrag=38acf1fj1hvpdd gid=26600 stats_id=Muriel-S2R conf_name=mqpsuimfxy_162@conference.uat-aws-ediolivemeet.myedio.com] IceTransport.startReadingData#207: Socket closed, stopping reader
JVB 2021-10-27 19:00:45.591 INFO: [83] [confId=3073b8355bc2766a gid=26600 stats_id=Muriel-S2R componentId=1 conf_name=mqpsuimfxy_162@conference.uat-aws-ediolivemeet.myedio.com ufrag=38acf1fj1hvpdd name=stream-1aa421d6 epId=1aa421d6 local_ufrag=38acf1fj1hvpdd] MergingDatagramSocket$SocketContainer.runInReaderThread#770: Failed to receive: java.net.SocketException: Socket closed
JVB 2021-10-27 19:00:45.591 INFO: [43] [confId=3073b8355bc2766a epId=1aa421d6 local_ufrag=38acf1fj1hvpdd gid=26600 stats_id=Muriel-S2R conf_name=mqpsuimfxy_162@conference.uat-aws-ediolivemeet.myedio.com] IceTransport.startReadingData#219: No longer running, stopped reading pac

Adding a 4th and 5th user once again generated insufficient-bandwidth errors like this:
JVB 2021-10-27 20:38:27.765 INFO: [62] [confId=332ba00033162184 epId=7618b560 gid=56339 stats_id=Skyla-CIu conf_name=rcknpypha8_91817@conference.uat-aws-ediolivemeet.myedio.com] BandwidthAllocator.allocate#326: Endpoints were suspended due to insufficient bandwidth (bwe=78225 bps): 2ad65f56,ef95d3d1,d50cc52b,fedc745d
JVB 2021-10-27 20:38:28.382 INFO: [23] HealthChecker.run#171: Performed a successful health check in PT0S. Sticky failure: false
JVB 2021-10-27 20:38:28.826 INFO: [65] [confId=332ba00033162184 epId=7618b560 gid=56339 stats_id=Skyla-CIu conf_name=rcknpypha8_91817@conference.uat-aws-ediolivemeet.myedio.com] BandwidthAllocator.allocate#326: Endpoints were suspended due to insufficient bandwidth (bwe=93322 bps): 2ad65f56,ef95d3d1,fedc745d
JVB 2021-10-27 20:38:30.223 INFO: [62] [confId=332ba00033162184 epId=d50cc52b gid=56339 stats_id=Elmore-xV9 conf_name=rcknpypha8_91817@conference.uat-aws-ediolivemeet.myedio.com] BandwidthAllocator.allocate#326: Endpoints were suspended due to insufficient bandwidth (bwe=78303 bps): 2ad65f56,ef95d3d1,fedc745d,7618b560
JVB 2021-10-27 20:38:30.293 INFO: [62] [confId=332ba00033162184 epId=fedc745d gid=56339 stats_id=Godfrey-QeG conf_name=rcknpypha8_91817@conference.uat-aws-ediolivemeet.myedio.com] BandwidthAllocator.allocate#326: Endpoints were suspended due to insufficient bandwidth (bwe=50051 bps): 2ad65f56,ef95d3d1,d50cc52b,7618b560
JVB 2021-10-27 20:38:30.925 INFO: [62] [confId=332ba00033162184 epId=7618b560 gid=56339 stats_id=Skyla-CIu conf_name=rcknpypha8_91817@conference.uat-aws-ediolivemeet.myedio.com] BandwidthAllocator.allocate#326: Endpoints were suspended due to insufficient bandwidth (bwe=110931 bps): ef95d3d1,d50cc52b,fedc745d
JVB 2021-10-27 20:38:31.272 INFO: [65] [confId=332ba00033162184 epId=d50cc52b gid=56339 stats_id=Elmore-xV9 conf_name=rcknpypha8_91817@conference.uat-aws-ediolivemeet.myedio.com] BandwidthAllocator.allocate#326: Endpoints were suspended due to insufficient bandwidth (bwe=93412 bps): 2ad65f56,fedc745d,7618b560
JVB 2021-10-27 20:38:31.390 INFO: [65] [confId=332ba00033162184 epId=fedc745d gid=56339 stats_id=Godfrey-QeG conf_name=rcknpypha8_91817@conference.uat-aws-ediolivemeet.myedio.com] BandwidthAllocator.allocate#326: Endpoints were suspended due to insufficient bandwidth (bwe=60459 bps): 2ad65f56,ef95d3d1,d50cc52b,7618b560
JVB 2021-10-27 20:38:32.975 INFO: [65] [confId=332ba00033162184 epId=7618b560 gid=56339 stats_id=Skyla-CIu conf_name=rcknpypha8_91817@conference.uat-aws-ediolivemeet.myedio.com] BandwidthAllocator.allocate#326: Endpoints were suspended due to insufficient bandwidth (bwe=93309 bps): ef95d3d1,d50cc52b,fedc745d
JVB 2021-10-27 20:38:33.466 INFO: [64] [confId=332ba00033162184 epId=fedc745d gid=56339 stats_id=Godfrey-QeG conf_name=rcknpypha8_91817@conference.uat-aws-ediolivemeet.myedio.com] BandwidthAllocator.allocate#326: Endpoints were suspended due to insufficient bandwidth (bwe=72600 bps): 2ad65f56,ef95d3d1,d50cc52b,7618b560
JVB 2021-10-27 20:38:35.332 INFO: [66] [confId=332ba00033162184 epId=7618b560 gid=56339 stats_id=Skyla-CIu conf_name=rcknpypha8_91817@conference.uat-aws-ediolivemeet.myedio.com] BandwidthAllocator.allocate#326: Endpoints were suspended due to insufficient bandwidth (bwe=108964 bps): ef95d3d1,d50cc52b,fedc745d
JVB 2021-10-27 20:38:35.538 INFO: [63] [confId=332ba00033162184 epId=fedc745d gid=56339 stats_id=Godfrey-QeG conf_name=rcknpypha8_91817@conference.uat-aws-ediolivemeet.myedio.com] BandwidthAllocator.allocate#326: Endpoints were suspended due to insufficient bandwidth (bwe=86761 bps): ef95d3d1,d50cc52b,7618b560
JVB 2021-10-27 20:38:36.180 INFO: [69] [confId=332ba00033162184 epId=7618b560 gid=56339 stats_id=Skyla-CIu conf_name=rcknpypha8_91817@conference.uat-aws-ediolivemeet.myedio.com] BandwidthAllocator.allocate#326: Endpoints were suspended due to insufficient bandwidth (bwe=91882 bps): ef95d3d1,d50cc52b,fedc745d
JVB 2021-10-27 20:38:37.598 INFO: [68] [confId=332ba00033162184 epId=fedc745d gid=56339 stats_id=Godfrey-QeG conf_name=rcknpypha8_91817@conference.uat-aws-ediolivemeet.myedio.com] BandwidthAllocator.allocate#326: Endpoints were suspended due to insufficient bandwidth (bwe=103278 bps): ef95d3d1,d50cc52b,7618b560
JVB 2021-10-27 20:38:38.180 INFO: [68] [confId=332ba00033162184 epId=7618b560 gid=56339 stats_id=Skyla-CIu conf_name=rcknpypha8_91817@conference.uat-aws-ediolivemeet.myedio.com] BandwidthAllocator.allocate#326: Endpoints were suspended due to insufficient bandwidth (bwe=107345 bps): ef95d3d1,d50cc52b,fedc745d
JVB 2021-10-27 20:38:38.383 INFO: [23] HealthChecker.run#171: Performed a successful health check in PT0S. Sticky failure: false
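The per-endpoint bwe values in those lines (the bridge's bandwidth estimate, in bps) are the number I'm watching. A quick sketch for pulling them out of the log so the trend is easy to eyeball (sample lines inline so it's self-contained; the real log path may differ on your install):

```shell
# Extract the estimated-bandwidth values from BandwidthAllocator log lines.
# Against the real log use:
#   grep -oE 'bwe=[0-9]+' /var/log/jitsi/jvb.log | cut -d= -f2
cat <<'EOF' | grep -oE 'bwe=[0-9]+' | cut -d= -f2
... Endpoints were suspended due to insufficient bandwidth (bwe=78225 bps): 2ad65f56
... Endpoints were suspended due to insufficient bandwidth (bwe=93322 bps): ef95d3d1
EOF
```

For context, 78 kbps is well below what a single 180p video stream needs, which is why the allocator suspends everyone.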

Before my changes, jvb.conf contained only this:
videobridge {
    http-servers {
        public {
            port = 9090
        }
    }
    websockets {
        enabled = true
        domain = "uat-aws-ediolivemeet.myedio.com:443"
        tls = true
    }
    cc {
        trust-bwe = false
    }
    health {
        interval = 3640000
    }
}

And it was having the same errors.
I’ve added a ton of potential options from the list of options to see if I could find something more useful, but so far that has only added issues (not surprisingly) and has not solved the bandwidth problem yet.

I basically started adding most of what was listed here (leaving out Octo and such):

Mostly set to false (disabled). I’m basically shotgunning right now, since the logs aren’t giving me any other information that helps. I’d appreciate suggestions for targeted bits to fiddle with. Thanks!

I’m planning to fiddle with the nginx and Prosody throughput settings next, unless there are other suggestions?

Bandwidth problems are about the direct connection between the client and the bridge.

Why do you run with this? That setting is supposed to stop the bandwidth estimation, which risks overshooting links and leads to a really bad experience … and the strange thing is that you are experiencing the opposite: video suspended due to detected low bandwidth.

This was set up by someone else; I’m trying to fix it. They had that turned off to see if it would help, and I have since turned it back on. I was just showing you how basic their config was to start with.

Take a look at this post.
It’s hardly conclusive, but I’m beginning to think that JVB may, by default, be too trusting of the bandwidth information reported by browsers. Maybe there are spikes in this reported bandwidth, and fiddling with these parameters could help the JVB smooth them out and not overreact. You see, having great bandwidth on the Internet is not the same as having a great real-time connection: you can have 100 Mbit/s and yet, within a given second (an eternity for a modern computer), have a 10 ms window when connectivity is zero. Also, browsers are not real-time operating systems; their statistics may not always be updated correctly when the machine is under load. Maybe even disabling the audio-level indicators could have an unexpected effect on your problem.
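A hypothetical jvb.conf fragment along those lines (option names and defaults vary between JVB releases, so check the reference.conf bundled with your build before using any of these; the values shown are illustrative, not recommendations):

```hocon
videobridge {
    cc {
        // Use the bandwidth estimate when allocating (the default).
        trust-bwe = true
        // Only re-run allocation when the estimate moves by more than this
        // percentage, smoothing out short-lived spikes and dips.
        bwe-change-threshold-pct = 20
    }
}
```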

@rpgresearch Can you try the same scenario same people on meet.jit.si and if you reproduce it, can you send me in private message the meeting link and approximate time of it, this may help to debug it. Thanks.

Will do.

To rule out any spikes in bandwidth, CPU, or RAM that we might not have been catching in CloudWatch, I have also upgraded the instances:
JMS from c5a.4xlarge to c5n.9xlarge
JVB raised to c5n.4xlarge
Jibri raised to c5a.4xlarge

That way there are plenty of resources available alongside the performance-tuning tweaks.

I am still adding all of the tuning tweaks to this setup, based on what I learned from load testing 2,500+ users on one server back in June, as described here: Hitting hard limit around 600 participants, then start dropping constantly suggestions? - #11 by rpgresearch

In a few hours we’re trying a test with only USA participants to rule out the India distance issue (since we’re not setting up a JVB in India, plus Octo, etc.), even though they all had good results on test.webrtc.org and similar tests.

We’re also trying to rule out any potential VPN factors.

If these changes are finished and these tests rule out the other issues, but we still see the same bandwidth problems, then I’ll have them hit the public server and let you know how that goes. I’ll update here with how each test goes. Thanks @damencho!

Hmm, I notice they set both of these to the same value:
org.ice4j.ice.harvest.NAT_HARVESTER_LOCAL_ADDRESS=central.internal.uat.uat-aws-ediolivemeet.myedio.com

org.ice4j.ice.harvest.NAT_HARVESTER_PUBLIC_ADDRESS=central.internal.uat.uat-aws-ediolivemeet.myedio.com

in /etc/jitsi/videobridge/sip-communicator.properties.

Is that going to cause issues? Shouldn’t LOCAL and PUBLIC be different (the AWS private IP and the AWS Elastic IP) for this to work correctly?

Yes, those need to be the actual IP addresses. The symptom would be clients being unable to establish media with the bridge. Maybe it also has the setting for the STUN mapping harvester, which would override these?
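For example (a hypothetical fragment for /etc/jitsi/videobridge/sip-communicator.properties; both addresses below are placeholders to be replaced with the instance's actual AWS private IP and Elastic IP):

```properties
# LOCAL = the address the bridge binds to (AWS private IP)
org.ice4j.ice.harvest.NAT_HARVESTER_LOCAL_ADDRESS=10.0.1.23
# PUBLIC = the address clients reach it at (AWS Elastic IP)
org.ice4j.ice.harvest.NAT_HARVESTER_PUBLIC_ADDRESS=203.0.113.10
```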


I will put those on the tweak list next. Thanks!

We completed a USA-only participants meeting earlier today, which included a range of the tweaks I made from the link above. We did not see a single insufficient-bandwidth error, though we had different issues we’ll address separately.

We’ll test again on this same setup in a few hours, once again including folks from India, and if the insufficient-bandwidth issue returns, then we’ll assume it is the geographic distance. Setting up a localized JVB in the India region (plus Octo, etc.) is out of scope for this particular deployment; most of the participants for this setup are planned to be on the East Coast.

However, we’re having other browser and OS compatibility issues, and some users desyncing from their room, as per here: Participants get out of sync with other participants in same room, have to refresh browser (rejoin room) as workaround. What is proper fix? (this was with the all-USA participants).

I’ll post what I find with the India participants in a few hours.

Regards.