Network capacity planning with TURN-only connectivity

Hi,

I have been following this thread on Maximum number of participants to understand capacity planning for a higher number of concurrent users. I understand that we can achieve this by provisioning multiple videobridges. I also understand that we can use Octo’s SplitBridgeStrategy to split a conference call across multiple bridges even when they are in the same geographical region.

However, I’d like to understand capacity planning in a situation where all connections are essentially being relayed via a TURN server. Other than CPU, I understand JVB2 is also limited by network bandwidth. So assuming that:

  1. All participants are on Chrome web browsers (that support simulcast)
  2. All participants can have their audio and video on at the same time
  3. Each JVB is capable of supporting up to 100 participants with 8 vCPUs running at 70% at peak loads and 16Gbps network speed.
  4. Max participants per room can be effectively limited to <=35
  5. Total number of participants across multiple rooms is <=500, so I can provision 5 or 6 JVBs to support that capacity.

I have observed that in usual cases with max participants <100, my JVB uses around 20 MB/s per concurrent user, or 160 Mbps. So to provision for a max capacity of 100 concurrent users, I usually set up an instance with 16Gbps connectivity and that works very well.
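As a quick sanity check of the arithmetic above, here is a minimal sketch. It assumes the 20 MB figure means megabytes per second per user; the function names are just for illustration.

```python
def per_user_mbps(megabytes_per_second):
    """Convert a per-user rate from megabytes/s to megabits/s (1 byte = 8 bits)."""
    return megabytes_per_second * 8


def bridge_gbps(users, megabytes_per_second_per_user):
    """Peak bandwidth one bridge needs for `users` concurrent participants, in Gbps."""
    return users * per_user_mbps(megabytes_per_second_per_user) / 1000


print(per_user_mbps(20))     # 160
print(bridge_gbps(100, 20))  # 16.0
```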

However, I need to force all connections to use TURN, due to firewall restrictions. Thus, I have useStunTurn set to true for both the p2p and videobridge configurations in config.js. I have useTurnUdp set to false to force TCP-only connections to port 443 of my dedicated Coturn server. So I am assuming that in this setup every chunk of data between the clients and the JVB is relayed through my TURN server.

In this situation, if I added more JVBs for higher capacity, I’d end up adding support for 16x5 = 80 Gbps network bandwidth at peak loads. But since I’m only using one TURN server on a 16Gbps network, would this server become the bottleneck and prevent participants from connecting once 16Gbps was reached on TURN, even though I had capacity on the JVBs? If yes, then:

  1. Would provisioning multiple TURN servers solve the issue? Of course, I’ll need to update my prosody config’s turncredentials module to advertise all of my TURN server names to clients.
  2. Is there any additional configuration that would be needed?
  3. In case of auto-load-balanced or elastic setups, I’d either need to have all TURN servers running all the time, or add/remove them to prosody config anytime my TURN server count changed and reload prosody. Is there a way to reload Prosody config without restarting Prosody, as restarting would drop all ongoing calls?
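To make the bottleneck concern concrete, here is a rough sizing sketch. It assumes all 500 participants are relayed, that the 160 Mbps per-user figure holds, and that each TURN server has a 16 Gbps interface. Since a relay both receives and forwards every packet, the result is best read as a lower bound. The function name is hypothetical.

```python
import math


def turn_servers_needed(participants, mbps_per_user, turn_nic_gbps):
    """Return (total relayed Gbps, minimum TURN server count) for the given load."""
    total_gbps = participants * mbps_per_user / 1000
    return total_gbps, math.ceil(total_gbps / turn_nic_gbps)


total, servers = turn_servers_needed(500, 160, 16)
print(total, servers)  # 80.0 5
```

With these numbers a single 16 Gbps TURN server would saturate at roughly 100 relayed participants, the same point at which one JVB is full.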

My comments and questions are based on the assumptions mentioned above. Please feel free to counter any of those. I’d greatly appreciate it if someone could help me with accurate information on this topic.

Adding further to my findings on this topic.

Basis: We need all WebRTC communication to flow via a TURN relay because corporate firewalls won’t allow UDP connectivity to the JVB or STUN connectivity between p2p clients. Even if we had the conferences load balanced across multiple JVBs, insufficient bandwidth on our TURN relay would cause a bottleneck at the TURN server’s network interface.

Probable solution - Load-balance Coturn with multiple alternate servers behind a front-end TURN server

As per Coturn’s documentation on load-balancing, we can have multiple TURN servers set up and configured via the alternate-server option. When a client tries to connect to the TURN relay:

  1. They will receive a 300 ALTERNATE SERVER code with an alternate server’s IP address
  2. The client is then expected to connect to this alternate server as its TURN relay

This will spread the load from multiple clients requesting a TURN relay, over a pool of alternate TURN servers.
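The effect of the redirect steps above can be illustrated with a small simulation. This is only a sketch: it assumes the front-end server hands out alternate servers in simple rotation, which should be checked against Coturn’s documentation, and the addresses are made-up examples from a documentation IP range.

```python
from collections import Counter
from itertools import cycle

# Hypothetical pool of alternate TURN servers (example addresses only).
ALTERNATE_SERVERS = ["203.0.113.11:443", "203.0.113.12:443", "203.0.113.13:443"]


def assign_clients(num_clients, servers):
    """Simulate a front-end that redirects each new client to the next server in rotation."""
    rotation = cycle(servers)
    return Counter(next(rotation) for _ in range(num_clients))


load = assign_clients(500, ALTERNATE_SERVERS)
for server, clients in load.items():
    print(server, clients)
```

Under that assumption each alternate server ends up with about a third of the clients, so the front-end server itself only handles the initial allocation requests and not the media.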

Queries and concerns

  1. Is there any additional configuration required on Jitsi’s components (Prosody, JVB, Jicofo)?
  2. In our situation, lib-jitsi-meet is the TURN client which needs to handle and support a 300 ALTERNATE SERVER response from the TURN server. Is this something that’s already implemented?
  3. Is there someone who has tried load balancing multiple TURN servers with Jitsi? I wasn’t able to find any documentation or forum topics around it.

@damencho @saghul @emrah Requesting your attention here.

The usage of the turnserver is implemented in webrtc, so you need to check whether the browsers support it.

We were thinking to try this approach at some point, but we haven’t used it till now.

If you have set up Prosody to hand out the TURN servers with the shared secret, nothing more is needed: the TURN server is passed to the browser and that’s it.

Oh that’s great then! Thanks so much for confirming this. In our specific case, we can ensure that all our clients are on Chrome web browser. And as per this issue, Chrome added support for 300 ALTERNATE SERVER with v42, back in 2015.

I might need additional information on this one. I’m currently giving stun, turn, and turns hosts and ports along with shared secret in turncredentials. But I’m using just one TURN server at the moment. With multiple alternate servers, do I need Prosody to know the IP addresses of all alternate servers beforehand? I will of course configure alternate servers to use the same shared secret so that clients can connect to any alternate server with the same secret as received from Prosody.
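For context on why a common shared secret should be enough, here is a minimal sketch of the time-limited credential scheme commonly used with a TURN shared secret: the username is an expiry timestamp and the password is the base64-encoded HMAC-SHA1 of that username keyed with the secret. The function names and the sample secret are hypothetical, and the exact username format used by a given deployment is an assumption here.

```python
import base64
import hashlib
import hmac
import time


def make_credentials(secret, ttl_seconds=86400, now=None):
    """Issue (username, password) the way a signalling server with the secret would."""
    issued_at = time.time() if now is None else now
    username = str(int(issued_at + ttl_seconds))  # expiry timestamp as username
    digest = hmac.new(secret.encode(), username.encode(), hashlib.sha1).digest()
    return username, base64.b64encode(digest).decode()


def verify(secret, username, password):
    """Check a credential the way any TURN server holding the same secret could."""
    digest = hmac.new(secret.encode(), username.encode(), hashlib.sha1).digest()
    expected = base64.b64encode(digest).decode()
    return hmac.compare_digest(expected, password)


user, password = make_credentials("example-shared-secret", ttl_seconds=60, now=1000)
print(user)                                             # 1060
print(verify("example-shared-secret", user, password))  # True
print(verify("a-different-secret", user, password))     # False
```

Because verification needs only the secret and the username, a credential issued once can in principle be checked by any server in the pool, without the issuer knowing that server’s address.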

Okay, got it. I will test this out and share findings as and when things progress.

I suspect no, but I don’t know how that balancing works at all …

Thanks @damencho. I will try putting up a test strategy for this and report my findings here.
