Greetings,
We’re having a bit of a challenge here.
If all of our users are from North America, with the Jitsi server in the N. Virginia AWS region, and everyone is in the same room together (typically 10 to 25 people per meeting), the setup works fine, with no errors appearing in any of the logs we monitor (nginx, prosody, jicofo, and jvb).
If all of our users are in India, with the Jitsi server still in N. Virginia, there are some complaints in the logs about insufficient bandwidth (some of the India users “only” have around 0.5-3 Mbps upstream), and Jitsi keeps trying to adjust accordingly; while not great, it is passable.
However, if we have some/most users in N. America and add even one user from India (and it gets worse with each additional India user in the mix), then we start getting fragmentation/packet-loss errors, and eventually users end up “out of phase”; users have been calling it “lost in the 5th dimension”.
Bit by bit, some users become unable to hear one or more of the others in the room.
Even with just one user in India joining: at first one N. America user can’t hear another N. America user, then it might become two that can’t hear one, and over about 10-15 minutes several people cannot hear one or more N. America users.
Interestingly, it is not necessarily the India user they can’t hear. The problem gets exponentially worse with each India user that joins.
The only temporary fix to hear everyone again is for everyone to leave the room at once, refresh their browsers, and log back into the room, and/or to restart the server/services. Even that only helps for a few minutes before the out-of-phase issues build up again.
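(For clarity, by restarting the services I mean roughly sudo systemctl restart prosody jicofo jitsi-videobridge2; a full instance reboot gives the same few minutes of relief.)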
Is there something fundamentally wrong with our setup? If you think this is not behaving as expected under these circumstances, I welcome any suggestions you may have. Or am I misinterpreting what the logs are saying? (Entirely possible, so I appreciate your expert help.)
I am hoping we can reduce this issue through configuration improvements. Is that a possibility?
CPU, RAM, storage, and network loads are negligible (less than 3%) on the c5.xlarge setup. We are currently testing with an all-in-one setup, with the JVB on the same server as the JMS. We had even worse fragmentation with separate JVBs, so I am trying to simplify the setup to track this down; then I’ll re-attach the external JVBs.
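One thing I have not yet ruled out on the all-in-one box is the kernel dropping UDP bursts on the bridge port even though average load is low. I was planning to try raising the socket buffer ceilings, roughly like this (the values are my guess, not a validated recommendation):

# /etc/sysctl.d/99-jvb.conf -- proposed, untested values
net.core.rmem_max = 10485760    # max receive buffer a UDP socket may request (bytes)
net.core.wmem_max = 10485760    # max send buffer a UDP socket may request (bytes)

(applied with sudo sysctl --system). Is that worth pursuing, or a red herring at these traffic levels?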
We’re not yet able to go to an Octo, geographically distributed JVB setup; am I right in assuming that would be the best method to address this?
Failing being able to go with an Octo and distributed JVB setup, what settings could I tweak in the JMS, JVB, nginx, meet, prosody, or jicofo configs to make the setup more resilient to geographically dispersed (high-latency) users joining, without causing such severe fragmentation?
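For example, I have been wondering whether limiting how much video each receiver is sent would take pressure off the high-latency legs. A sketch of what I mean, inside the config object in /etc/jitsi/meet/<our-domain>-config.js (option names are my reading of the config.js documentation; the values are guesses we have not tested):

// Illustrative values only -- not tested on our deployment.
channelLastN: 5,   // only forward video from the 5 most recent speakers
constraints: {
    video: {
        // Cap the resolution senders are asked to produce.
        height: { ideal: 360, max: 480, min: 180 }
    }
},

Would lowering channelLastN like this actually help, or is our problem upstream of video forwarding entirely?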
I haven’t ruled out other problems with our setup, and I appreciate any suggestions folks may have for addressing this weird but frequent issue that, alas, makes our setup unusable across continents.
Thank you for any suggestions!
Some log excerpts that might be helpful:
JVB 2022-02-28 14:29:14.061 WARNING: [294] [confId=d4c454a410c5e7cb gid=54692 stats_id=Johan-PJX conf_name=test1@conference.aws-testing.testing.com ufrag=iqhs1ft0bog2e epId=f1cc751c local_ufrag=iqhs1ft0bog2e] ConnectivityCheckClient.startCheckForPair#374: Failed to send BINDING-REQUEST(0x1)[attrib.count=6 len=92 tranID=0xCD42BC407F01C7917096F0D8]
java.lang.IllegalArgumentException: No socket found for 10.6.101.51:10000/udp->172.16.50.173:60641/udp
    at org.ice4j.stack.NetAccessManager.sendMessage(NetAccessManager.java:631)
    at org.ice4j.stack.NetAccessManager.sendMessage(NetAccessManager.java:581)
    at org.ice4j.stack.StunClientTransaction.sendRequest0(StunClientTransaction.java:267)
    at org.ice4j.stack.StunClientTransaction.sendRequest(StunClientTransaction.java:245)
    at org.ice4j.stack.StunStack.sendRequest(StunStack.java:680)
    at org.ice4j.ice.ConnectivityCheckClient.startCheckForPair(ConnectivityCheckClient.java:335)
    at org.ice4j.ice.ConnectivityCheckClient.startCheckForPair(ConnectivityCheckClient.java:231)
    at org.ice4j.ice.ConnectivityCheckClient$PaceMaker.run(ConnectivityCheckClient.java:938)
    at org.ice4j.util.PeriodicRunnable.executeRun(PeriodicRunnable.java:206)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
JVB 2022-02-28 14:29:14.061 INFO: [294] [confId=d4c454a410c5e7cb gid=54692 stats_id=Johan-PJX conf_name=test1@conference.aws-testing.testing.com ufrag=iqhs1ft0bog2e epId=f1cc751c local_ufrag=iqhs1ft0bog2e] ConnectivityCheckClient$PaceMaker.run#942: Pair failed: 10.6.101.51:10000/udp/host → 172.16.50.173:60641/udp/host (stream-f1cc751c.RTP)
JVB 2022-02-28 14:29:14.061 WARNING: [294] [confId=d4c454a410c5e7cb gid=54692 stats_id=Johan-PJX conf_name=test1@conference.aws-testing.testing.com ufrag=iqhs1ft0bog2e epId=f1cc751c local_ufrag=iqhs1ft0bog2e] ConnectivityCheckClient.startCheckForPair#374: Failed to send BINDING-REQUEST(0x1)[attrib.count=6 len=92 tranID=0xCD42BC407F019692876A5FC0]
java.lang.IllegalArgumentException: No socket found for 10.6.101.51:10000/udp->192.168.80.1:60642/udp
    at org.ice4j.stack.NetAccessManager.sendMessage(NetAccessManager.java:631)
    at org.ice4j.stack.NetAccessManager.sendMessage(NetAccessManager.java:581)
    at org.ice4j.stack.StunClientTransaction.sendRequest0(StunClientTransaction.java:267)
    at org.ice4j.stack.StunClientTransaction.sendRequest(StunClientTransaction.java:245)
    at org.ice4j.stack.StunStack.sendRequest(StunStack.java:680)
    at org.ice4j.ice.ConnectivityCheckClient.startCheckForPair(ConnectivityCheckClient.java:335)
    at org.ice4j.ice.ConnectivityCheckClient.startCheckForPair(ConnectivityCheckClient.java:231)
    at org.ice4j.ice.ConnectivityCheckClient$PaceMaker.run(ConnectivityCheckClient.java:938)
    at org.ice4j.util.PeriodicRunnable.executeRun(PeriodicRunnable.java:206)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
JVB 2022-02-28 14:29:14.061 INFO: [294] [confId=d4c454a410c5e7cb gid=54692 stats_id=Johan-PJX conf_name=test1@conference.aws-testing.testing.com ufrag=iqhs1ft0bog2e epId=f1cc751c local_ufrag=iqhs1ft0bog2e] ConnectivityCheckClient$PaceMaker.run#942: Pair failed: 10.6.101.51:10000/udp/host → 192.168.80.1:60642/udp/host (stream-f1cc751c.RTP)
JVB 2022-02-28 14:29:14.071 INFO: [229] [confId=d4c454a410c5e7cb gid=54692 stats_id=Johan-PJX conf_name=test1@conference.aws-testing.testing.com ufrag=iqhs1ft0bog2e epId=f1cc751c local_ufrag=iqhs1ft0bog2e] ConnectivityCheckClient.processSuccessResponse#648: Pair succeeded: 10.6.101.51:10000/udp/host → 96.79.202.21:2077/udp/prflx (stream-f1cc751c.RTP).
JVB 2022-02-28 14:29:14.071 INFO: [229] [confId=d4c454a410c5e7cb gid=54692 stats_id=Johan-PJX componentId=1 conf_name=test1@conference.aws-testing.testing.com ufrag=iqhs1ft0bog2e name=stream-f1cc751c epId=f1cc751c local_ufrag=iqhs1ft0bog2e] ComponentSocket.addAuthorizedAddress#99: Adding allowed address: 96.79.202.21:2077/udp
JVB 2022-02-28 14:29:14.071 INFO: [229] [confId=d4c454a410c5e7cb gid=54692 stats_id=Johan-PJX conf_name=test1@conference.aws-testing.testing.com ufrag=iqhs1ft0bog2e epId=f1cc751c local_ufrag=iqhs1ft0bog2e] ConnectivityCheckClient.processSuccessResponse#657: Pair validated: 3.223.105.157:10000/udp/srflx → 96.79.202.21:2077/udp/prflx (stream-f1cc751c.RTP).
JVB 2022-02-28 14:29:14.071 INFO: [229] [confId=d4c454a410c5e7cb gid=54692 stats_id=Johan-PJX conf_name=test1@conference.aws-testing.testing.com ufrag=iqhs1ft0bog2e epId=f1cc751c local_ufrag=iqhs1ft0bog2e] DefaultNominator.strategyNominateFirstHostOrReflexiveValid#268: Nominate (first highest valid): 3.223.105.157:10000/udp/srflx → 96.79.202.21:2077/udp/prflx (stream-f1cc751c.RTP)
JVB 2022-02-28 14:29:14.071 INFO: [229] [confId=d4c454a410c5e7cb gid=54692 stats_id=Johan-PJX conf_name=test1@conference.aws-testing.testing.com ufrag=iqhs1ft0bog2e epId=f1cc751c local_ufrag=iqhs1ft0bog2e] Agent.nominate#1812: verify if nominated pair answer again
JVB 2022-02-28 14:29:14.071 WARNING: [229] [confId=d4c454a410c5e7cb gid=54692 stats_id=Johan-PJX componentId=1 conf_name=test1@conference.aws-testing.testing.com ufrag=iqhs1ft0bog2e name=stream-f1cc751c epId=f1cc751c local_ufrag=iqhs1ft0bog2e] MergingDatagramSocket.initializeActive#599: Active socket already initialized.
JVB 2022-02-28 14:29:14.071 INFO: [229] [confId=d4c454a410c5e7cb gid=54692 stats_id=Johan-PJX conf_name=test1@conference.aws-testing.testing.com ufrag=iqhs1ft0bog2e epId=f1cc751c local_ufrag=iqhs1ft0bog2e] ConnectivityCheckClient.processSuccessResponse#726: IsControlling: true USE-CANDIDATE:false.
JVB 2022-02-28 14:29:14.167 INFO: [229] [confId=d4c454a410c5e7cb gid=54692 stats_id=Johan-PJX conf_name=test1@conference.aws-testing.testing.com ufrag=iqhs1ft0bog2e epId=f1cc751c local_ufrag=iqhs1ft0bog2e] ConnectivityCheckClient.processSuccessResponse#648: Pair succeeded: 3.223.105.157:10000/udp/srflx → 96.79.202.21:2077/udp/prflx (stream-f1cc751c.RTP).
JVB 2022-02-28 14:29:14.167 INFO: [229] [confId=d4c454a410c5e7cb gid=54692 stats_id=Johan-PJX conf_name=test1@conference.aws-testing.testing.com ufrag=iqhs1ft0bog2e epId=f1cc751c local_ufrag=iqhs1ft0bog2e] ConnectivityCheckClient.processSuccessResponse#657: Pair validated: 3.223.105.157:10000/udp/srflx → 96.79.202.21:2077/udp/prflx (stream-f1cc751c.RTP).
JVB 2022-02-28 14:29:14.167 INFO: [229] [confId=d4c454a410c5e7cb gid=54692 stats_id=Johan-PJX conf_name=test1@conference.aws-testing.testing.com ufrag=iqhs1ft0bog2e epId=f1cc751c local_ufrag=iqhs1ft0bog2e] ConnectivityCheckClient.processSuccessResponse#726: IsControlling: true USE-CANDIDATE:true.
JVB 2022-02-28 14:29:14.167 INFO: [229] [confId=d4c454a410c5e7cb gid=54692 stats_id=Johan-PJX conf_name=test1@conference.aws-testing.testing.com ufrag=iqhs1ft0bog2e epId=f1cc751c local_ufrag=iqhs1ft0bog2e] ConnectivityCheckClient.processSuccessResponse#741: Nomination confirmed for pair: 3.223.105.157:10000/udp/srflx → 96.79.202.21:2077/udp/prflx (stream-f1cc751c.RTP).
JVB 2022-02-28 14:29:14.167 INFO: [229] [confId=d4c454a410c5e7cb gid=54692 stats_id=Johan-PJX conf_name=test1@conference.aws-testing.testing.com ufrag=iqhs1ft0bog2e name=stream-f1cc751c epId=f1cc751c local_ufrag=iqhs1ft0bog2e] CheckList.handleNominationConfirmed#406:

TransportCcEngine.tccReceived#171: TCC packet contained received sequence numbers: 8545-8546, 8548-8549. Couldn't find packet detail for the seq nums: 8545-8546, 8548-8549. Latest seqNum was 9768, size is 1000. Latest RTT is 3667.262801 ms.
JVB 2022-02-28 14:27:51.375 WARNING: [96] [confId=d4c454a410c5e7cb epId=11503899 gid=54692 stats_id=Cassandre-aMu conf_name=test1@conference.aws-testing.testing.com] TransportCcEngine.tccReceived#171: TCC packet contained received sequence numbers: 8551-8553. Couldn't find packet detail for the seq nums: 8551-8553. Latest seqNum was 9777, size is 1000. Latest RTT is 431.427608 ms.
JVB 2022-02-28 14:27:51.632 WARNING: [104] [confId=d4c454a410c5e7cb epId=11503899 gid=54692 stats_id=Cassandre-aMu conf_name=test1@conference.aws-testing.testing.com] TransportCcEngine.tccReceived#171: TCC packet contained received sequence numbers: 8556-8613. Couldn't find packet detail for the seq nums: 8556-8613. Latest seqNum was 9777, size is 1000. Latest RTT is 431.427608 ms.
JVB 2022-02-28 14:27:52.378 WARNING: [99] [confId=d4c454a410c5e7cb epId=11503899 gid=54692 stats_id=Cassandre-aMu conf_name=test1@conference.aws-testing.testing.com] TransportCcEngine.tccReceived#171: TCC packet contained received sequence numbers: 8614-8885. Couldn't find packet detail for the seq nums: 8614-8787. Latest seqNum was 9787, size is 1000. Latest RTT is 3673.230287 ms.
JVB 2022-02-28 14:27:57.381 INFO: [33] HealthChecker.run#171: Performed a successful health check in PT0.000006S. Sticky failure: false
JVB 2022-02-28 14:28:05.311 INFO: [261] [confId=d4c454a410c5e7cb gid=54692 conf_name=test1@conference.aws-testing.testing.com] Conference.recentSpeakersChanged#549: Recent speakers changed: [7296fe85, 173ce26a, f9d30c8f, 3d1ab5ea, 0b057f71, 91487f41, 11503899], dominant speaker changed: true
JVB 2022-02-28 14:28:07.382 INFO: [33] HealthChecker.run#171: Performed a successful health check in PT0.000002S. Sticky failure: false
JVB 2022-02-28 14:28:07.413 INFO: [296] [confId=d4c454a410c5e7cb gid=54692 conf_name=test1@conference.aws-testing.testing.com] Conference.recentSpeakersChanged#549: Recent speakers changed: [173ce26a, 7296fe85, f9d30c8f, 3d1ab5ea, 0b057f71, 91487f41, 11503899], dominant speaker changed: true
JVB 2022-02-28 14:28:17.381 INFO: [33] HealthChecker.run#171: Performed a successful health check in PT0.000001S. Sticky failure: false
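If I’m reading the ICE excerpts right, the “No socket found” BINDING-REQUEST failures are for pairs toward remote users’ private host candidates (172.16.50.173, 192.168.80.1), which should be unreachable from the bridge anyway, while the pair via our EIP (3.223.105.157 → 96.79.202.21) does validate and get nominated. So are those warnings harmless noise, or part of the problem? In case the NAT mapping matters here, this is what I understand the standard EIP mapping in /etc/jitsi/videobridge/sip-communicator.properties should look like (addresses taken from the logs above; please correct me if our version expects this elsewhere):

# NAT mapping for a bridge behind an AWS EIP (addresses from the logs above)
org.ice4j.ice.harvest.NAT_HARVESTER_LOCAL_ADDRESS=10.6.101.51
org.ice4j.ice.harvest.NAT_HARVESTER_PUBLIC_ADDRESS=3.223.105.157

Also, the 3667 ms and 3673 ms RTTs in the TransportCcEngine warnings look like the out-of-phase symptom in numeric form; is there a TCC/BWE setting that copes better with RTTs that large?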
Let me know if you need any config information posted as well.
Thanks kindly!
Version information:
Setup is in an AWS environment. All components (nginx, Prosody, JMS, Jicofo, JVB) are on a single c5.4xlarge server with an EIP.
Linux = Ubuntu 18.04.6 LTS
jitsi-meet = 2.0.6865-2
jicofo = 1.0-846-1
jitsi-meet-prosody = 1.0.5818-1
jitsi-meet-tokens = 1.0.5818-1
jitsi-meet-turnserver = 1.0.5818-1
jitsi-meet-web = 1.0.5818-1
jitsi-meet-web-config = 1.0.5818-1
jitsi-videobridge2 = 2.1-617-ga8b39c3f-1
prosody = 0.11.13-1~bionic1
nginx = 1.14.0-0ubuntu1.9
Java = OpenJDK 64-Bit Server VM (build 11.0.13+8-Ubuntu-0ubuntu1.18.04, mixed mode, sharing). We had been on Java 8, but after multiple recommendations to move to Java 11 we tried that, and it did not make any difference for this particular issue. Do you recommend we go back to Java 8, or is it okay to stay on Java 11 now?
(We can’t easily “upgrade to the latest version of Jitsi”: other apps in development depend on it and often break with Jitsi upgrades, so it can take us many sprints to move to a newer Jitsi version without breaking them. Unless there is a specific known bug in this version that may be causing this issue, recommending “upgrade to latest” is very problematic for us. Thanks kindly.)
Really hoping for some general performance-tuning information on handling different latencies between users, to make Jitsi more tolerant of high-latency participants mixed into the same room as low-latency ones (N. America participants and India participants together).
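For instance, are these the right kind of knobs in /etc/jitsi/videobridge/jvb.conf? The option names below are my reading of the 2.1 reference.conf, and the values are placeholders rather than a tested recommendation:

# Congestion-control section of jvb.conf -- placeholder values, please correct me.
videobridge {
    cc {
        trust-bwe = true                   # act on bandwidth estimates (I believe this is already the default)
        onstage-preferred-height-px = 360  # lower the on-stage target resolution
        onstage-preferred-framerate = 15   # and the target framerate
    }
}

If these aren’t the right levers, pointers to the right settings (jicofo, prosody, or nginx included) would be much appreciated.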