Bridge Data Channel stays closed and cannot be reopened

Hello Community,

We have an issue where the Bridge Data Channel stays closed and we are not able to reopen it.

We run our own deployment on Kubernetes with multiple JVBs (using a StatefulSet like the one here: https://github.com/hpi-schul-cloud/jitsi-deployment/blob/master/base/jitsi-shard/jvb/jvb-statefulset.yaml).

When one of the JVB pods dies, we can see that clients automatically reconnect to one of the other available JVBs. Debugging the JS shows that this line is executed: https://github.com/jitsi/lib-jitsi-meet/blob/master/modules/xmpp/strophe.jingle.js#L205. BridgeChannel is initialized again, but it fails immediately with the error "undefined":

Logger.js:154 2020-08-25T23:05:18.284Z [modules/RTC/BridgeChannel.js] <RTCDataChannel.e.onerror>:  Channel error: undefined
Logger.js:154 2020-08-25T23:05:18.285Z [modules/RTC/BridgeChannel.js] <RTCDataChannel.e.onclose>:  Channel closed by server
Logger.js:154 2020-08-25T23:05:18.573Z [modules/RTC/BridgeChannel.js] <l._send>:  Bridge Channel send: no opened channel.
Logger.js:154 2020-08-25T23:05:29.001Z [modules/RTC/BridgeChannel.js] <l._send>:  Bridge Channel send: no opened channel.

It is worth noting that the candidates in the initially logged SDP look like this:

2020-08-25T23:00:40.015Z [modules/RTC/TraceablePeerConnection.js] <A.trace>:  getRemoteDescription::preTransform type: offer
...
a=candidate:1 1 udp 2130706431 <local_IP> 32101 typ host generation 0
a=candidate:2 1 udp 1694498815 <public_JVB_IP> 32101 typ srflx raddr <local_IP> rport 32101 generation 0

After connecting to the failover JVB, however, we get these candidates:

a=candidate:1 1 udp 2130706431 <local_IP> 32100 typ host generation 0
a=candidate:2 1 udp 1694498815 <public_JVB_IP> 32100 typ srflx raddr <local_IP> rport 32100 generation 0
a=candidate:1 1 udp 2130706431 <local_IP> 32101 typ host generation 0
a=candidate:2 1 udp 1694498815 <public_JVB_IP> 32101 typ srflx raddr <local_IP> rport 32101 generation 0
a=candidate:1 1 udp 2130706431 <local_IP> 32101 typ host generation 0
a=candidate:2 1 udp 1694498815 <public_JVB_IP> 32101 typ srflx raddr <local_IP> rport 32101 generation 0

This makes us think that the old candidate on port 32101 is still present, which might be why opening the data channel fails. At this point clients can see each other’s video, and Wireshark shows that they are connected to the bridge via UDP on port 32100.
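To make the two SDP dumps easier to compare, a small helper along these lines could list the distinct candidate endpoints. This is only a sketch: listCandidateEndpoints is a hypothetical name, not part of lib-jitsi-meet, and it assumes the candidate lines follow the layout shown above (foundation, component, transport, priority, address, port, "typ", type). The sample input reuses the redacted placeholders from the log.

```javascript
// Hypothetical helper (not part of lib-jitsi-meet): returns the distinct
// "<type> <address>:<port>" entries found in the a=candidate lines of an SDP.
function listCandidateEndpoints(sdp) {
    const endpoints = new Set();

    for (const rawLine of sdp.split(/\r?\n/)) {
        const line = rawLine.trim();

        if (!line.startsWith('a=candidate:')) {
            continue;
        }

        // Fields: foundation component transport priority address port typ type ...
        const fields = line.slice('a=candidate:'.length).split(/\s+/);

        if (fields.length < 8 || fields[6] !== 'typ') {
            continue; // Not in the layout this sketch assumes; skip it.
        }

        endpoints.add(`${fields[7]} ${fields[4]}:${fields[5]}`);
    }

    return [ ...endpoints ];
}

// The six candidate lines seen after failover, with the same placeholders.
const failoverSdp = [
    'a=candidate:1 1 udp 2130706431 <local_IP> 32100 typ host generation 0',
    'a=candidate:2 1 udp 1694498815 <public_JVB_IP> 32100 typ srflx raddr <local_IP> rport 32100 generation 0',
    'a=candidate:1 1 udp 2130706431 <local_IP> 32101 typ host generation 0',
    'a=candidate:2 1 udp 1694498815 <public_JVB_IP> 32101 typ srflx raddr <local_IP> rport 32101 generation 0',
    'a=candidate:1 1 udp 2130706431 <local_IP> 32101 typ host generation 0',
    'a=candidate:2 1 udp 1694498815 <public_JVB_IP> 32101 typ srflx raddr <local_IP> rport 32101 generation 0'
].join('\r\n');

console.log(listCandidateEndpoints(failoverSdp));
```

Run against the failover SDP, this shows entries for both port 32100 and port 32101, which makes the leftover candidates of the dead bridge easy to spot.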

We tried to reinitialize the channel manually with this call:

conference.rtc.initializeBridgeChannel(conference.getActivePeerConnection(), null);

It fails with the same error. There is also no event we can listen to in order to know when the data channel was closed, which makes it hard to detect this situation in our application. The result is clients that cannot exchange data messages with the bridge.
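As a stopgap for the missing event, a polling watcher along these lines might at least let the application notice the closed channel. It is a sketch under assumptions, not a lib-jitsi-meet API: createChannelWatcher is a hypothetical name, and the commented wiring relies on the internal field conference.rtc._channel and its isOpen() method, which we have not confirmed as stable.

```javascript
// Hypothetical watcher (not a lib-jitsi-meet API): calls onChange(open)
// whenever the result of the caller-supplied isOpen() probe changes,
// including once for the first observed state.
function createChannelWatcher(isOpen, onChange) {
    let lastState = null;
    let timer = null;

    function check() {
        const open = Boolean(isOpen());

        if (open !== lastState) {
            lastState = open;
            onChange(open);
        }

        return open;
    }

    return {
        check,
        start(intervalMs = 1000) {
            if (timer === null) {
                check();
                timer = setInterval(check, intervalMs);
            }
        },
        stop() {
            if (timer !== null) {
                clearInterval(timer);
                timer = null;
            }
        }
    };
}

// Example wiring (assumption: conference.rtc._channel is an internal field
// exposing isOpen(); it may differ between lib-jitsi-meet versions):
// const watcher = createChannelWatcher(
//     () => Boolean(conference.rtc._channel && conference.rtc._channel.isOpen()),
//     open => console.log(open ? 'bridge channel open' : 'bridge channel closed'));
// watcher.start(2000);
```

Polling only detects the closed state; it does not fix the failing reinitialization itself.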

Do you know how we could overcome this problem?
