Fragmentation and latency for geographically dispersed users: “out of phase” / “lost in the 5th dimension”. Settings to tweak without Octo?

Greetings,

We’re having a bit of a challenge here.

If all of our users are from North America, with the Jitsi server in the N. Virginia AWS AZ, and everyone is in the same room together (typically 10 to 25 people per meeting), the setup works fine, without any errors appearing in any of the logs we monitor (nginx, prosody, jicofo, and jvb).

If all of our users are in India, with the Jitsi server in the N. Virginia AWS AZ, there are some complaints in the logs about insufficient bandwidth (some of the India users “only” have around 0.5-3 Mbps upstream), and Jitsi keeps trying to adjust accordingly; while not great, it is passable.

However, if we have some/most users in N. America and add even one user from India (and it gets worse with each India user added to the mix), then we start getting fragmentation / lost-packet errors, and eventually users start ending up “out of phase”; users have been calling it “lost in the 5th dimension”.

Bit by bit, some users become unable to hear one or more of the others in the room.

Even with just one user in India joining, at first one of the N. America users can’t hear another N. America user; then it might become two that can’t hear one; and over about 10-15 minutes it becomes several people who cannot hear one or more N. America users.

Interestingly, it is not necessarily the India user that they can’t hear. This gets exponentially worse with each India user that joins.

The only temporary fix to hear everyone again is for everyone to leave the room at once, refresh their browsers, and log back into the room, and/or to reboot the server/services. But that only helps for a few minutes before the out-of-phase issues build up again.

Is there something fundamentally wrong with our setup? If you think this is not behaving as expected under these circumstances, I welcome any suggestions you may have. Or am I misinterpreting what the logs are saying? (Entirely possible, so I appreciate your expert help.)

I am hoping we can reduce this issue through configuration improvements. Is that a possibility?

CPU, RAM, storage, and network loads are negligible (less than 3%) on the c5.xlarge setup. We are currently testing with an all-in-one setup, with the JVB on the same server as the JMS. We had even worse fragmentation with separate JVBs, so I am trying to simplify the setup to track this down; then I’ll re-attach the external JVBs.

We’re not yet able to go to an Octo and geographically distributed JVB setup, and I am assuming that would be the best method to address this?

Failing being able to go with an Octo and distributed JVB setup, what settings could I tweak in the JMS/JVB/nginx/meet/prosody/jicofo configs to make the setup more resilient to geographically dispersed (high-latency) users joining, without causing such severe fragmentation?
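One concrete example of the kind of tweak I have in mind: kernel-level UDP buffer sizes, since the high-latency paths have a large bandwidth-delay product and the JVB funnels all media through a single UDP port. Something along these lines, perhaps (values purely illustrative, and the file name is hypothetical; this is not from our current config):

    # /etc/sysctl.d/99-jvb-udp-buffers.conf  (illustrative values only)
    # Larger kernel buffers let the bridge's single UDP socket absorb
    # bursts on high-RTT paths instead of dropping packets.
    net.core.rmem_max = 10485760
    net.core.wmem_max = 10485760
    net.core.rmem_default = 10485760
    net.core.wmem_default = 10485760

(Applied with “sysctl --system” or a reboot.) Is that the right sort of lever, or are the relevant knobs elsewhere?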

I haven’t ruled out other problems with our setup, and I appreciate any suggestions folks may have for addressing this weird but frequent issue that, alas, makes our setup unusable across continents.

Thank you for any suggestions!

Some log excerpts that might be helpful:

JVB 2022-02-28 14:29:14.061 WARNING: [294] [confId=d4c454a410c5e7cb gid=54692 stats_id=Johan-PJX conf_name=test1@conference.aws-testing.testing.com ufrag=iqhs1ft0bog2e epId=f1cc751c local_ufrag=iqhs1ft0bog2e] ConnectivityCheckClient.startCheckForPair#374: Failed to send BINDING-REQUEST(0x1)[attrib.count=6 len=92 tranID=0xCD42BC407F01C7917096F0D8]
java.lang.IllegalArgumentException: No socket found for 10.6.101.51:10000/udp->172.16.50.173:60641/udp
    at org.ice4j.stack.NetAccessManager.sendMessage(NetAccessManager.java:631)
    at org.ice4j.stack.NetAccessManager.sendMessage(NetAccessManager.java:581)
    at org.ice4j.stack.StunClientTransaction.sendRequest0(StunClientTransaction.java:267)
    at org.ice4j.stack.StunClientTransaction.sendRequest(StunClientTransaction.java:245)
    at org.ice4j.stack.StunStack.sendRequest(StunStack.java:680)
    at org.ice4j.ice.ConnectivityCheckClient.startCheckForPair(ConnectivityCheckClient.java:335)
    at org.ice4j.ice.ConnectivityCheckClient.startCheckForPair(ConnectivityCheckClient.java:231)
    at org.ice4j.ice.ConnectivityCheckClient$PaceMaker.run(ConnectivityCheckClient.java:938)
    at org.ice4j.util.PeriodicRunnable.executeRun(PeriodicRunnable.java:206)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
JVB 2022-02-28 14:29:14.061 INFO: [294] [confId=d4c454a410c5e7cb gid=54692 stats_id=Johan-PJX conf_name=test1@conference.aws-testing.testing.com ufrag=iqhs1ft0bog2e epId=f1cc751c local_ufrag=iqhs1ft0bog2e] ConnectivityCheckClient$PaceMaker.run#942: Pair failed: 10.6.101.51:10000/udp/host → 172.16.50.173:60641/udp/host (stream-f1cc751c.RTP)
JVB 2022-02-28 14:29:14.061 WARNING: [294] [confId=d4c454a410c5e7cb gid=54692 stats_id=Johan-PJX conf_name=test1@conference.aws-testing.testing.com ufrag=iqhs1ft0bog2e epId=f1cc751c local_ufrag=iqhs1ft0bog2e] ConnectivityCheckClient.startCheckForPair#374: Failed to send BINDING-REQUEST(0x1)[attrib.count=6 len=92 tranID=0xCD42BC407F019692876A5FC0]
java.lang.IllegalArgumentException: No socket found for 10.6.101.51:10000/udp->192.168.80.1:60642/udp
    at org.ice4j.stack.NetAccessManager.sendMessage(NetAccessManager.java:631)
    at org.ice4j.stack.NetAccessManager.sendMessage(NetAccessManager.java:581)
    at org.ice4j.stack.StunClientTransaction.sendRequest0(StunClientTransaction.java:267)
    at org.ice4j.stack.StunClientTransaction.sendRequest(StunClientTransaction.java:245)
    at org.ice4j.stack.StunStack.sendRequest(StunStack.java:680)
    at org.ice4j.ice.ConnectivityCheckClient.startCheckForPair(ConnectivityCheckClient.java:335)
    at org.ice4j.ice.ConnectivityCheckClient.startCheckForPair(ConnectivityCheckClient.java:231)
    at org.ice4j.ice.ConnectivityCheckClient$PaceMaker.run(ConnectivityCheckClient.java:938)
    at org.ice4j.util.PeriodicRunnable.executeRun(PeriodicRunnable.java:206)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
JVB 2022-02-28 14:29:14.061 INFO: [294] [confId=d4c454a410c5e7cb gid=54692 stats_id=Johan-PJX conf_name=test1@conference.aws-testing.testing.com ufrag=iqhs1ft0bog2e epId=f1cc751c local_ufrag=iqhs1ft0bog2e] ConnectivityCheckClient$PaceMaker.run#942: Pair failed: 10.6.101.51:10000/udp/host → 192.168.80.1:60642/udp/host (stream-f1cc751c.RTP)
JVB 2022-02-28 14:29:14.071 INFO: [229] [confId=d4c454a410c5e7cb gid=54692 stats_id=Johan-PJX conf_name=test1@conference.aws-testing.testing.com ufrag=iqhs1ft0bog2e epId=f1cc751c local_ufrag=iqhs1ft0bog2e] ConnectivityCheckClient.processSuccessResponse#648: Pair succeeded: 10.6.101.51:10000/udp/host → 96.79.202.21:2077/udp/prflx (stream-f1cc751c.RTP).
JVB 2022-02-28 14:29:14.071 INFO: [229] [confId=d4c454a410c5e7cb gid=54692 stats_id=Johan-PJX componentId=1 conf_name=test1@conference.aws-testing.testing.com ufrag=iqhs1ft0bog2e name=stream-f1cc751c epId=f1cc751c local_ufrag=iqhs1ft0bog2e] ComponentSocket.addAuthorizedAddress#99: Adding allowed address: 96.79.202.21:2077/udp
JVB 2022-02-28 14:29:14.071 INFO: [229] [confId=d4c454a410c5e7cb gid=54692 stats_id=Johan-PJX conf_name=test1@conference.aws-testing.testing.com ufrag=iqhs1ft0bog2e epId=f1cc751c local_ufrag=iqhs1ft0bog2e] ConnectivityCheckClient.processSuccessResponse#657: Pair validated: 3.223.105.157:10000/udp/srflx → 96.79.202.21:2077/udp/prflx (stream-f1cc751c.RTP).
JVB 2022-02-28 14:29:14.071 INFO: [229] [confId=d4c454a410c5e7cb gid=54692 stats_id=Johan-PJX conf_name=test1@conference.aws-testing.testing.com ufrag=iqhs1ft0bog2e epId=f1cc751c local_ufrag=iqhs1ft0bog2e] DefaultNominator.strategyNominateFirstHostOrReflexiveValid#268: Nominate (first highest valid): 3.223.105.157:10000/udp/srflx → 96.79.202.21:2077/udp/prflx (stream-f1cc751c.RTP)
JVB 2022-02-28 14:29:14.071 INFO: [229] [confId=d4c454a410c5e7cb gid=54692 stats_id=Johan-PJX conf_name=test1@conference.aws-testing.testing.com ufrag=iqhs1ft0bog2e epId=f1cc751c local_ufrag=iqhs1ft0bog2e] Agent.nominate#1812: verify if nominated pair answer again
JVB 2022-02-28 14:29:14.071 WARNING: [229] [confId=d4c454a410c5e7cb gid=54692 stats_id=Johan-PJX componentId=1 conf_name=test1@conference.aws-testing.testing.com ufrag=iqhs1ft0bog2e name=stream-f1cc751c epId=f1cc751c local_ufrag=iqhs1ft0bog2e] MergingDatagramSocket.initializeActive#599: Active socket already initialized.
JVB 2022-02-28 14:29:14.071 INFO: [229] [confId=d4c454a410c5e7cb gid=54692 stats_id=Johan-PJX conf_name=test1@conference.aws-testing.testing.com ufrag=iqhs1ft0bog2e epId=f1cc751c local_ufrag=iqhs1ft0bog2e] ConnectivityCheckClient.processSuccessResponse#726: IsControlling: true USE-CANDIDATE:false.
JVB 2022-02-28 14:29:14.167 INFO: [229] [confId=d4c454a410c5e7cb gid=54692 stats_id=Johan-PJX conf_name=test1@conference.aws-testing.testing.com ufrag=iqhs1ft0bog2e epId=f1cc751c local_ufrag=iqhs1ft0bog2e] ConnectivityCheckClient.processSuccessResponse#648: Pair succeeded: 3.223.105.157:10000/udp/srflx → 96.79.202.21:2077/udp/prflx (stream-f1cc751c.RTP).
JVB 2022-02-28 14:29:14.167 INFO: [229] [confId=d4c454a410c5e7cb gid=54692 stats_id=Johan-PJX conf_name=test1@conference.aws-testing.testing.com ufrag=iqhs1ft0bog2e epId=f1cc751c local_ufrag=iqhs1ft0bog2e] ConnectivityCheckClient.processSuccessResponse#657: Pair validated: 3.223.105.157:10000/udp/srflx → 96.79.202.21:2077/udp/prflx (stream-f1cc751c.RTP).
JVB 2022-02-28 14:29:14.167 INFO: [229] [confId=d4c454a410c5e7cb gid=54692 stats_id=Johan-PJX conf_name=test1@conference.aws-testing.testing.com ufrag=iqhs1ft0bog2e epId=f1cc751c local_ufrag=iqhs1ft0bog2e] ConnectivityCheckClient.processSuccessResponse#726: IsControlling: true USE-CANDIDATE:true.
JVB 2022-02-28 14:29:14.167 INFO: [229] [confId=d4c454a410c5e7cb gid=54692 stats_id=Johan-PJX conf_name=test1@conference.aws-testing.testing.com ufrag=iqhs1ft0bog2e epId=f1cc751c local_ufrag=iqhs1ft0bog2e] ConnectivityCheckClient.processSuccessResponse#741: Nomination confirmed for pair: 3.223.105.157:10000/udp/srflx → 96.79.202.21:2077/udp/prflx (stream-f1cc751c.RTP).
JVB 2022-02-28 14:29:14.167 INFO: [229] [confId=d4c454a410c5e7cb gid=54692 stats_id=Johan-PJX conf_name=test1@conference.aws-testing.testing.com ufrag=iqhs1ft0bog2e name=stream-f1cc751c epId=f1cc751c local_ufrag=iqhs1ft0bog2e] CheckList.handleNominationConfirmed#406:

TransportCcEngine.tccReceived#171: TCC packet contained received sequence numbers: 8545-8546, 8548-8549. Couldn’t find packet detail for the seq nums: 8545-8546, 8548-8549. Latest seqNum was 9768, size is 1000. Latest RTT is 3667.262801 ms.
JVB 2022-02-28 14:27:51.375 WARNING: [96] [confId=d4c454a410c5e7cb epId=11503899 gid=54692 stats_id=Cassandre-aMu conf_name=test1@conference.aws-testing.testing.com] TransportCcEngine.tccReceived#171: TCC packet contained received sequence numbers: 8551-8553. Couldn’t find packet detail for the seq nums: 8551-8553. Latest seqNum was 9777, size is 1000. Latest RTT is 431.427608 ms.
JVB 2022-02-28 14:27:51.632 WARNING: [104] [confId=d4c454a410c5e7cb epId=11503899 gid=54692 stats_id=Cassandre-aMu conf_name=test1@conference.aws-testing.testing.com] TransportCcEngine.tccReceived#171: TCC packet contained received sequence numbers: 8556-8613. Couldn’t find packet detail for the seq nums: 8556-8613. Latest seqNum was 9777, size is 1000. Latest RTT is 431.427608 ms.
JVB 2022-02-28 14:27:52.378 WARNING: [99] [confId=d4c454a410c5e7cb epId=11503899 gid=54692 stats_id=Cassandre-aMu conf_name=test1@conference.aws-testing.testing.com] TransportCcEngine.tccReceived#171: TCC packet contained received sequence numbers: 8614-8885. Couldn’t find packet detail for the seq nums: 8614-8787. Latest seqNum was 9787, size is 1000. Latest RTT is 3673.230287 ms.
JVB 2022-02-28 14:27:57.381 INFO: [33] HealthChecker.run#171: Performed a successful health check in PT0.000006S. Sticky failure: false
JVB 2022-02-28 14:28:05.311 INFO: [261] [confId=d4c454a410c5e7cb gid=54692 conf_name=test1@conference.aws-testing.testing.com] Conference.recentSpeakersChanged#549: Recent speakers changed: [7296fe85, 173ce26a, f9d30c8f, 3d1ab5ea, 0b057f71, 91487f41, 11503899], dominant speaker changed: true
JVB 2022-02-28 14:28:07.382 INFO: [33] HealthChecker.run#171: Performed a successful health check in PT0.000002S. Sticky failure: false
JVB 2022-02-28 14:28:07.413 INFO: [296] [confId=d4c454a410c5e7cb gid=54692 conf_name=test1@conference.aws-testing.testing.com] Conference.recentSpeakersChanged#549: Recent speakers changed: [173ce26a, 7296fe85, f9d30c8f, 3d1ab5ea, 0b057f71, 91487f41, 11503899], dominant speaker changed: true
JVB 2022-02-28 14:28:17.381 INFO: [33] HealthChecker.run#171: Performed a successful health check in PT0.000001S. Sticky failure: false

Let me know if you need any config information posted as well.
Thanks kindly!

Version information:
Setup in an AWS environment. All components (Nginx, Prosody, JMS, Jicofo, JVB) on a single c5.4xlarge server with an EIP.
Linux = Ubuntu 18.04.6 LTS
jitsi-meet = 2.0.6865-2
jicofo = 1.0-846-1
jitsi-meet-prosody = 1.0.5818-1
jitsi-meet-tokens = 1.0.5818-1
jitsi-meet-turnserver = 1.0.5818-1
jitsi-meet-web = 1.0.5818-1
jitsi-meet-web-config = 1.0.5818-1
jitsi-videobridge2 = 2.1-617-ga8b39c3f-1
prosody = 0.11.13-1~bionic1
nginx = 1.14.0-0ubuntu1.9
Java = OpenJDK 64-Bit Server VM (build 11.0.13+8-Ubuntu-0ubuntu1.18.04, mixed mode, sharing). (We had been on Java 8, but on multiple recommendations we moved to Java 11; it did not make any difference for this particular issue. Do you recommend we go back to Java 8, or is it okay to stay on Java 11 now?)

(We can’t easily go to the “latest version of Jitsi” because other apps in development depend on it and often break with Jitsi upgrades; it can take us many sprints to move to a newer Jitsi version without breaking the dependent apps. So unless there is a specific known bug in this version that may be causing the issue, recommending “upgrade to the latest version” is very problematic. Thanks kindly.)

I am really hoping for some general performance-tuning information on handling different latencies between users, to make Jitsi more tolerant of high-latency participants mixed into the same room with low-latency ones (N. America participants and India participants).

You may create a similar meeting on meet.jit.si to see whether it’s a setup issue or something to do with Jitsi itself.

But isn’t meet.jit.si set up with distributed infrastructure, @emrah?

I believe the team stopped using Octo at one point (I don’t know if they’ve brought it back on meet.jit.si). So what you’re looking to do may not be different from the way meet.jit.si is set up right now.

Do you have TURN deployed? Are you behind a NAT?
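If the bridge is behind 1:1 NAT (e.g., an AWS EIP), one thing worth verifying is the NAT mapping pair in the JVB’s sip-communicator.properties. A minimal sketch, using the local and public addresses that appear in your own log (substitute your real ones):

    # /etc/jitsi/videobridge/sip-communicator.properties
    # Private address of the instance:
    org.ice4j.ice.harvest.NAT_HARVESTER_LOCAL_ADDRESS=10.6.101.51
    # Public (EIP) address that remote clients should reach:
    org.ice4j.ice.harvest.NAT_HARVESTER_PUBLIC_ADDRESS=3.223.105.157

Your ICE log does show a srflx pair on 3.223.105.157 being validated and nominated, so this may already be in place; the “No socket found” warnings for private host candidates are usually harmless noise.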

This is interesting information. How is load shared across different JVBs without Octo for a large conference?

meet.jit.si uses Octo locally at the moment, to accommodate big conferences. But this may change soon with the introduction of secure Octo.


We tried a 45-minute meeting on meet.jit.si, with between 12 and 15 people. Only a handful turned on their cameras, alas (many of them are shy).
Obviously I couldn’t monitor the server logs. In the browser developer-tools console I only once or twice saw the Plan B errors I see more frequently in ours (though not constantly in ours). We did not experience going “out of phase”, however. Again, I don’t know if that was a good apples-to-apples test.

Thinking broadly, can you think of which settings might be the most likely to cause the fragmented-packet and out-of-phase type issues, so I can see which configs to share with you, and we can see what tweaks might be helpful?
If you have specific configs you would like to see, please let me know which ones (there are so many that I didn’t want to flood you with all of them).
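For instance, would client-side settings like these in the meet config.js be among the relevant ones? (A sketch only; the values are illustrative, not what we currently run.)

    // /etc/jitsi/meet/<our-domain>-config.js  (illustrative values only)
    // Limit how many remote video streams each client receives, reducing
    // how much the bandwidth estimator has to juggle per participant:
    channelLastN: 5,

    // Cap video height so upstream-constrained (0.5-3 Mbps) participants
    // aren't asked to send more than their links can carry:
    constraints: {
        video: {
            height: {
                ideal: 360,
                max: 480,
                min: 180
            }
        }
    },

Or are the likelier culprits on the bridge side?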
Thanks kindly!

  • What is your Jitsi version?

  • Is there any problem when all participants use Firefox as the client?

  • Is there any participant using an old version of Chrome, or (if your Jitsi version is older than the unified-plan change) is there any participant using the current stable Chrome?

Version information is the same as posted earlier in this thread.
We mandated that everyone use only the latest Chrome, since too many users had problems with Firefox and Safari.
We have not tried with everyone using Firefox. Is Firefox now recommended over Chrome? (That would be a big change if so.)
We’ll double-check again. A few months ago, during the Plan B change, we told everyone to make sure they upgraded, but I will double-check with everyone at tomorrow’s test to make sure someone hasn’t been out of the loop.

We rolled our setup back to a very stock all-in-one single server, with no customization and no token access.

We tested it twice and it does not have this fragmentation problem.

We are trying to figure out the possible causes by re-adding our changes one by one until the bug recurs; a sketch of how I am comparing configs is below. I would really appreciate some suggestions for this type of error to help narrow down the possibilities; without any, I am worried it will take weeks to whittle away at this. Any suggestions as to which specific components/settings could trigger such a problem? Or is this a new one, unseen before in the community?
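To enumerate what actually differs between the stock install and our customized one, I am doing roughly the following (the backup path is a placeholder for wherever copies of the customized configs live):

    # Diff the stock configs against a backup of our customized ones
    diff -ru /etc/jitsi /root/config-backup/etc/jitsi
    diff -ru /etc/prosody /root/config-backup/etc/prosody
    diff -ru /etc/nginx/sites-available /root/config-backup/etc/nginx/sites-available

and then re-applying the hunks one at a time between test meetings.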
Thanks for any suggestions!