I’d like to clarify best practices for resolving connectivity issues between a client app and a JVB instance.
Hopefully, as a result of this clarification, there will be several pull requests with fixes from my side to Jitsi.
First, a little bit of background for my questions.
I’m working on a native mobile client app which uses the WebRTC SDK for mobile under the hood.
I also have a tiny server app which translates client requests to create/join/hold/leave a conference into corresponding requests to the JVB instance.
The tiny server app runs the JVB instance within the same process, so the server has direct access to the entire public interface of JVB.
Basically, what the tiny server does is transform an incoming client request (carrying the client’s SDP) to join a call into the corresponding request to JVB.
Everything works as expected when network conditions are good; the fun starts when the client loses connectivity during a call, switches between networks, or a network becomes available or unavailable mid-call.
Below are several scenarios I’ve considered and tested as “real world” scenarios with the mobile client.
Scenario 1 (works):
Let’s assume a client with two network interfaces (e.g. 3G and WiFi). During call initiation WebRTC allocates ports on both interfaces, which results in two local ICE candidates being reported to the client.
When such a client connects to JVB, it is able to establish two ICE connections with JVB.
Under the hood, the ice4j Agent inside JVB receives STUN ping requests from both interfaces and sends responses; the ice4j Agent also sends STUN ping requests back to both network interfaces.
From that moment both client candidates are considered “authorized”, because the STUN ping “handshake” succeeded.
The ice4j Agent then transitions to the terminated state after a configurable timeout (3 seconds by default).
After the transition to the terminated state, the Agent no longer accepts STUN pings from addresses it hasn’t seen before, so no new “authorized” addresses will be discovered.
In this scenario, when the user loses connection on one of the networks, data starts flowing over the other network, so no problem here, as long as network addresses do not change during the call.
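For concreteness, this is roughly what the two host candidates look like in the client’s SDP, with a small sketch for pulling out the per-interface addresses (the addresses, foundations, and class name are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CandidateLines {
    // Matches the connection address in an "a=candidate ... typ host" SDP line.
    private static final Pattern CANDIDATE =
        Pattern.compile("a=candidate:\\S+ \\d+ udp \\d+ (\\S+) \\d+ typ host");

    /** Extracts the host-candidate addresses, one per local interface. */
    public static List<String> hostAddresses(String sdp) {
        List<String> addresses = new ArrayList<>();
        Matcher m = CANDIDATE.matcher(sdp);
        while (m.find()) {
            addresses.add(m.group(1));
        }
        return addresses;
    }

    public static void main(String[] args) {
        // One host candidate per interface: WiFi and cellular (made-up addresses).
        String sdp =
            "a=candidate:1 1 udp 2122260223 192.168.1.10 51234 typ host\r\n" +
            "a=candidate:2 1 udp 2122194687 10.20.30.40 51235 typ host\r\n";
        System.out.println(hostAddresses(sdp)); // one address per interface
    }
}
```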
Scenario 2 (does not work):
Consider it as an extension to Scenario 1.
Now the client adds a new network interface after the call has been established.
When WebRTC’s continual candidate gathering mode is enabled, WebRTC monitors network interfaces and is able to discover new candidates on new interfaces during a call.
In this case a third candidate is allocated on the new interface, and WebRTC sends a STUN ping to Jitsi to check whether the connection works.
According to Wireshark, the ice4j Agent does respond to such a STUN ping, but does not send a STUN ping request back: the Agent is already terminated, so this new third candidate will never be considered “authorized”, because the STUN “handshake” was not completed.
If the initial two candidates then become unavailable and only the new third one is actually connectable, no data will flow between the client and JVB: JVB rejects all traffic from the “unauthorized” candidate because of the incomplete STUN ping “handshake”.
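To make the behavior concrete, here is a toy model (not actual ice4j code; all names are mine) of the authorization bookkeeping described in these two scenarios:

```java
import java.net.InetSocketAddress;
import java.util.HashSet;
import java.util.Set;

// Toy model of the observed behavior: STUN pings from new remote addresses
// are remembered only while the agent is running; once terminated, unknown
// sources stay unauthorized and their media is dropped.
public class ToyAgent {
    private final Set<InetSocketAddress> authorized = new HashSet<>();
    private boolean terminated = false;

    /** Called when a STUN ping (Binding request) arrives from {@code source}. */
    public void onStunPing(InetSocketAddress source) {
        if (!terminated) {
            authorized.add(source); // handshake completes, address is authorized
        }
        // A response is sent either way, but no reverse ping after termination.
    }

    /** The configurable termination timeout fires (3 seconds by default). */
    public void terminate() {
        terminated = true;
    }

    /** Media from unauthorized addresses is rejected. */
    public boolean acceptsMediaFrom(InetSocketAddress source) {
        return authorized.contains(source);
    }
}
```

With this model, an address pinged before terminate() is accepted, while the third candidate that pings afterwards is not, which is exactly the failure in Scenario 2.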
One workaround to this problem was found: delay Agent termination “forever”.
This makes the ice4j Agent compatible with WebRTC’s continual gathering; data flows seamlessly when the client switches between networks.
One problem with this approach is that the ice4j Agent creates a TerminationThread which is not properly killed when the “forever” timeout outlives the Agent.
I’ve fixed this and proposed a pull request with the fix: https://github.com/jitsi/ice4j/pull/150. Now, if the timeout is “forever”, the termination thread is properly cancelled.
I also have an ongoing enhancement in this area, but I want to start with the baby step of pull request #150.
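For illustration, here is a rough sketch of the idea behind the fix (this is not the actual ice4j code; the class and method names are made up): schedule termination as a cancellable task, so a “forever” delay never leaves a live thread behind.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class TerminationScheduler {
    private final ScheduledExecutorService executor =
        Executors.newSingleThreadScheduledExecutor();
    private ScheduledFuture<?> pending;

    /** delayMillis < 0 means "forever": never terminate on a timer. */
    public synchronized void scheduleTermination(Runnable terminate, long delayMillis) {
        if (delayMillis < 0) {
            return; // nothing scheduled, so nothing can leak
        }
        pending = executor.schedule(terminate, delayMillis, TimeUnit.MILLISECONDS);
    }

    /** Called when the Agent is freed: cancel whatever is still pending. */
    public synchronized void shutdown() {
        if (pending != null) {
            pending.cancel(false);
        }
        executor.shutdownNow();
    }
}
```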
Scenario 3 (does not work):
Consider the case where WebRTC continual gathering is not enabled (for example because WebRTC comes from a browser, not the native SDK).
In that case the only way known to me to handle a network switch is to trigger an “ice-restart”, because otherwise WebRTC will not attempt to allocate candidates on network interfaces which appeared after the call was established.
When an ICE restart is triggered, ICE candidate regathering happens on the client, along with a change to the ICE pwd and ufrag attributes in the SDP.
Because the ICE pwd and ufrag have changed, it is necessary to inform JVB about this change; otherwise the ice4j Agent will not authorize STUN pings from the new candidates.
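As a sketch of what the tiny server has to do here (the class name is mine, not a JVB API), it needs to extract the new ufrag/pwd from the client’s restart offer so they can be pushed to JVB:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Pulls the ICE credentials out of an SDP offer. After an ice-restart these
// values change, and JVB must learn the new ones to authorize STUN pings
// from the regathered candidates.
public class IceCredentials {
    public final String ufrag;
    public final String pwd;

    private IceCredentials(String ufrag, String pwd) {
        this.ufrag = ufrag;
        this.pwd = pwd;
    }

    public static IceCredentials fromSdp(String sdp) {
        Matcher u = Pattern.compile("a=ice-ufrag:(\\S+)").matcher(sdp);
        Matcher p = Pattern.compile("a=ice-pwd:(\\S+)").matcher(sdp);
        if (!u.find() || !p.find()) {
            throw new IllegalArgumentException("SDP has no ICE credentials");
        }
        return new IceCredentials(u.group(1), p.group(1));
    }
}
```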
I’ve made several attempts to get ice-restart working in JVB (by sending different Colibri requests); none of them worked without local fixes in JVB itself (maybe I’ve used the JVB API wrong, which is why I’m asking for clarification about the right way to handle these scenarios).
Here are the attempts which were made to handle an ice-restart coming from the client:
Attempt 1: expire the existing channels and transport, then create new ones with a new channel bundle carrying the updated ufrag and pwd.
This attempt failed because the connection was still interrupted on the WebRTC side: it received a DTLS “disconnect” alert when the existing channels were expired, and WebRTC did not recover the connection with the new ice4j Agent. Maybe it’s a WebRTC bug in this case, maybe not; I haven’t investigated this deeply.
Has anyone seen or experienced something like that with WebRTC + JVB?
Attempt 2: update the ufrag and pwd of the existing channel bundle via a Colibri request. Currently, however, the JVB code is written in such a way that it skips updating ufrag/pwd if the Agent is terminated (although for some reason it does update the fingerprint, which does not change during an ICE restart).
I’m not sure whether it’s valid to update the pwd/ufrag of an existing channel bundle to implement an ICE restart; I couldn’t find information about the proper way of implementing it.
I hope someone here can clarify how to handle an “ice-restart” coming from the client.
Attempt 3 (does work, but I really don’t like it):
In this case, when a network interruption is detected, “ice-restart” is not triggered on the old peer connection; instead, the old peer connection is closed and a new one is created. The client sends a regular join request to the tiny server, which creates the corresponding request to JVB. During this join it is detected that this is actually a “re-join”, so the old channels are immediately expired and new channels and a new channel bundle are created. In this case the ICE connection is successfully established over the currently (newly) available network interface. Basically it is almost the same as Attempt 1, except that instead of an “ice-restart” a new peer connection is created on the client.
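A toy sketch of the “re-join” detection in the tiny server (all names are made up; expire() stands in for the real Colibri channel expiration):

```java
import java.util.HashMap;
import java.util.Map;

// A join request for an endpoint id we already know is treated as a re-join:
// the stale channels are expired before fresh ones are allocated.
public class ConferenceState {
    private final Map<String, String> channelsByEndpoint = new HashMap<>();
    private int counter = 0;

    /** Returns the channel bundle id allocated for this join. */
    public String join(String endpointId) {
        String old = channelsByEndpoint.remove(endpointId);
        if (old != null) {
            expire(old); // re-join: kill the stale channels first
        }
        String fresh = "bundle-" + (++counter);
        channelsByEndpoint.put(endpointId, fresh);
        return fresh;
    }

    private void expire(String bundleId) {
        // In the real server this would expire the Colibri channels in JVB.
    }
}
```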
So, could someone more experienced with Jitsi Videobridge please clarify which of these scenarios are currently supported, and, if they are supported, how they are implemented in Jitsi Meet + Jitsi Videobridge?
There are some existing topics here about this problem, but nothing indicates that the issues were solved:
Thanks in advance,