Improve ICE restart handling

Dear Community,

I am writing this thread based on a conversation we had with @Pawel_Domas here: https://github.com/jitsi/lib-jitsi-meet/pull/1149#issuecomment-673562315

I understand that while it is possible to enable ICE restart with the enableIceRestart config option, it is not currently recommended, for the following reason (quoting Pawel):

It was done by allocating new channels on the bridge which meant fresh start there, but the peerconnection on the client kept some state which lead to weird issues. You may want to check out this PR as it does the same thing, but with a new peerconnection and clear state on the client side. The next step will be to implement a real ice restart on the bridge.

I ask because we are seeing iceFailed errors very frequently, not for all our users but for specific ones, even when they have a good internet connection (we tested with Ethernet, no Wi-Fi). We would like to try enabling ICE restart.
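
For reference, here is a minimal sketch of how the flag might be passed when creating the conference. It assumes lib-jitsi-meet reads enableIceRestart from the options object given to initJitsiConference; buildConferenceOptions is a hypothetical helper and the room name in the usage comment is a placeholder.

```javascript
// Hypothetical helper: returns a copy of the base conference options
// with the ICE restart flag set.
// Assumption: lib-jitsi-meet reads `enableIceRestart` from the options
// passed to connection.initJitsiConference(roomName, options).
function buildConferenceOptions(baseOptions, enableIceRestart) {
    return { ...baseOptions, enableIceRestart: Boolean(enableIceRestart) };
}

const confOptions = buildConferenceOptions({}, true);
console.log(confOptions);

// Usage against a real connection (not executed here):
// const conference = connection.initJitsiConference('myroom', confOptions);
```

Keeping the flag in one helper makes it easy to switch it on only for the affected users while testing.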

Is ICE restart on the bridge planned? Can we contribute somehow?

Thank you very much in advance.

Do you know what is causing the iceFailed? Is it on mobile, or when switching between networks? Or do you observe it by tracking some analytics?

We are trying to find the cause. For these users it happens on desktop, in Google Chrome 84 (latest stable), on a computer with only an Ethernet connection (no Wi-Fi) and without switching networks.

Do you recommend anything in particular to inspect in this case, besides the lib-jitsi-meet console logs?

Does the failure occur initially or in the middle of the call?

After a successful connection to the conference is established, it happens after some minutes (sometimes just a few, sometimes after almost an hour). The affected users first appear with frozen video to the others, and a few seconds later they get the ICE_FAILED error. We then force a page reload and they connect again, but the same thing happens after a random number of minutes.
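
The reload workaround described above can be sketched roughly as follows. This is not our exact code: makeFailureHandler is a hypothetical helper, and the ICE_FAILED string is an assumption meant to mirror JitsiMeetJS.errors.conference.ICE_FAILED.

```javascript
// Assumed value of JitsiMeetJS.errors.conference.ICE_FAILED.
const ICE_FAILED = 'conference.iceFailed';

// Hypothetical helper: builds a handler for the conference failure event.
// `reload` is injected so the handler can be exercised outside a browser.
function makeFailureHandler(reload, log = console.warn) {
    return function onConferenceFailed(errorCode) {
        if (errorCode === ICE_FAILED) {
            log('ICE failed, forcing a page reload');
            reload();
            return true;   // handled
        }
        return false;      // some other failure, left to other handlers
    };
}

// Usage in the browser (not executed here):
// conference.on(
//     JitsiMeetJS.events.conference.CONFERENCE_FAILED,
//     makeFailureHandler(() => window.location.reload()));
```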

Here is a sample of the lib-jitsi-meet logs from around the failure. In this case it happened right after changing the receiver and sender video constraints, changing the selected endpoints and muting the local track:

VM2044 lib-jitsi-meet.min.js:10 2020-08-14T13:14:50.827Z [modules/RTC/BridgeChannel.js] <l.sendReceiverVideoConstraintMessage>:  sending a ReceiverVideoConstraint message with a maxFrameHeight of 360 pixels
VM2044 lib-jitsi-meet.min.js:10 2020-08-14T13:14:50.829Z [modules/xmpp/JingleSessionPC.js] <R.setSenderVideoConstraint>:  JingleSessionPC[p2p=false,initiator=false,sid=7tn9lac0qjunj] setSenderVideoConstraint: 180
VM2044 lib-jitsi-meet.min.js:10 2020-08-14T13:14:50.831Z [modules/RTC/TraceablePeerConnection.js] <R.setSenderVideoConstraint>:  Setting max height of 180 on local video
VM2044 lib-jitsi-meet.min.js:10 2020-08-14T13:14:50.834Z [modules/RTC/BridgeChannel.js] <l.sendSelectedEndpointsMessage>:  sending selected changed notification to the bridge for endpoints (6) ["a760f576", "044829d6", "b6177a3b", "a2c50741", "de1ae113", "72996077"]
VM2044 lib-jitsi-meet.min.js:10 2020-08-14T13:14:51.189Z [modules/RTC/JitsiLocalTrack.js] Mute LocalTrack[1,audio]: true
VM2044 lib-jitsi-meet.min.js:10 2020-08-14T13:15:07.524Z [modules/xmpp/JingleSessionPC.js] <R.peerconnection.oniceconnectionstatechange>:  (TIME) ICE failed P2P? false:	 2935599.5600000024
VM2044 lib-jitsi-meet.min.js:10 2020-08-14T13:15:07.525Z [modules/connectivity/IceFailedHandling.js] <d.start>:  ICE failed, but ICE restarts are disabled
VM2044 lib-jitsi-meet.min.js:10 2020-08-14T13:15:07.526Z [modules/connectivity/IceFailedHandling.js] <c._maybeSetDelayTimeout>:  Will emit ICE failed in 15000ms
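
To put a number on the gap between the last action and the failure in the log above, here is a small sketch that pulls the ISO timestamps out of two console lines and subtracts them. logTimestamp and gapMs are hypothetical helpers, not part of lib-jitsi-meet, and the two lines are shortened copies of the mute and ICE failed entries.

```javascript
// Extract the ISO 8601 timestamp from a lib-jitsi-meet console line.
// Returns milliseconds since the epoch, or null if no timestamp is found.
function logTimestamp(line) {
    const match = line.match(/\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z/);
    return match ? Date.parse(match[0]) : null;
}

// Milliseconds elapsed between two log lines, or null if either lacks a timestamp.
function gapMs(earlierLine, laterLine) {
    const earlier = logTimestamp(earlierLine);
    const later = logTimestamp(laterLine);
    if (earlier === null || later === null) {
        return null;
    }
    return later - earlier;
}

const muteLine = 'VM2044 lib-jitsi-meet.min.js:10 2020-08-14T13:14:51.189Z '
    + '[modules/RTC/JitsiLocalTrack.js] Mute LocalTrack[1,audio]: true';
const iceFailedLine = 'VM2044 lib-jitsi-meet.min.js:10 2020-08-14T13:15:07.524Z '
    + '[modules/xmpp/JingleSessionPC.js] (TIME) ICE failed P2P? false';

console.log(gapMs(muteLine, iceFailedLine)); // prints 16335
```

So in this sample the ICE failure was logged about 16 seconds after the mute.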

We have not yet found a specific pattern leading to the issue. As a side note, in our implementation we combine JitsiConference.setLastN with JitsiConference.selectParticipants to receive only the video feeds that the user can see in their browser viewport.
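
Roughly, the combination looks like the sketch below. applyViewport is a hypothetical helper rather than our exact code; it only assumes that the conference object exposes setLastN(n) and selectParticipants(ids), and that the list of visible participant IDs comes from the application's own layout logic.

```javascript
// Sketch: request video only for the participants currently visible
// in the viewport.
// `conference` is expected to expose setLastN(n) and selectParticipants(ids);
// `visibleIds` is produced by the application's layout code (assumption).
function applyViewport(conference, visibleIds) {
    const ids = Array.from(new Set(visibleIds)); // drop duplicate IDs
    conference.setLastN(ids.length);             // cap the number of received videos
    conference.selectParticipants(ids);          // ask for these endpoints specifically
    return ids;
}

// Usage (not executed here):
// applyViewport(conference, ['a760f576', '044829d6']);
```

Note that with an empty list this calls setLastN(0), which stops all incoming video until the viewport changes again.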

This is odd. If you’re able to repro, I would try to confirm whether traffic is being received on the bridge at the time it happens. Is there a firewall or VPN involved?

Good question. We confirmed that users behind a firewall get this error more frequently; however, it also happens from time to time to users who are not behind a firewall.

We added more information and logs in the thread "Users disconnected (not ports or turn related)". Initially we thought it was unrelated to this one (which is why we opened a separate thread), but then we found that it was related to ICE failed. We started testing enableIceRestart: true after pulling the latest lib-jitsi-meet one week ago.

We saw this too (client logs). It seems to try to reconnect, but the connection does not recover:

VM59 lib-jitsi-meet.min.js:10 2020-09-03T19:10:18.102Z [modules/xmpp/JingleSessionPC.js] <A.peerconnection.oniceconnectionstatechange>:  (TIME) ICE failed P2P? false:	 98918.84499997832
VM59 lib-jitsi-meet.min.js:10 2020-09-03T19:10:20.228Z [modules/connectivity/IceFailedHandling.js] <a._actOnIceFailed>:  ICE failed, enableIceRestart: true, supports restart by terminate: true
VM59 lib-jitsi-meet.min.js:10 2020-09-03T19:10:20.228Z [modules/connectivity/IceFailedHandling.js] <a._actOnIceFailed>:  Sending ICE failed - the connection did not recover, ICE state: failed, use 'session-terminate': false

ICE restarts are now working for us. We had a problem where the bridge channel closed and was not opened again, but that was fixed after we migrated from the SCTP data channel to WebSockets for communication with the bridge. The user experience on ICE restart is improved: the user sees all videos in an interrupted state for a few seconds (and the other parties see them as interrupted), but then the connection recovers and they stay in the meeting without needing to disconnect and reconnect.
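
For anyone trying the same setup, the client-side part can be sketched as the config fragment below. It assumes the openBridgeChannel option accepts the value 'websocket' and that enableIceRestart is read from the same options object; the bridge itself also has to be configured to serve WebSockets, which is not shown here.

```javascript
// Sketch of the conference options we ended up with (assumptions noted above).
const bridgeChannelOptions = {
    // Retry on ICE failure instead of immediately reporting it.
    enableIceRestart: true,
    // Assumed value: use a WebSocket instead of an SCTP data channel
    // for the bridge channel.
    openBridgeChannel: 'websocket'
};

console.log(bridgeChannelOptions);

// Usage (not executed here):
// const conference = connection.initJitsiConference('myroom', bridgeChannelOptions);
```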