Load Balancing Coturn in Oracle Cloud

I’m looking into moving our Jitsi deployment to Oracle Cloud starting with the coturn servers, and I’m hoping for some advice from others who’ve been down this road :slight_smile:

What’s the best way to load balance an autoscaled pool of coturn servers?

How does meet.jit.si/jaas do it?

I’m considering the following 2 approaches (but open to other suggestions):

  1. Using DNS Round Robin
    • How do you add/remove records as instances are scaled in/out?
  2. Fronting the instance pool with a layer-4 Network Load Balancer
    • This appeared to work when I tested it manually, but in OCI I can’t seem to attach an NLB to an instance pool (it only allows layer-7 Load Balancers). Programmatically adding instances to and removing them from the NLB backend sets does not seem like an appealing option.
    • Does anyone use an NLB for coturn? Are there any issues when it is used with Jitsi, considering the several caveats about running coturn behind a network load balancer?

All comments and suggestions greatly appreciated.

We are using 1, I think. But @Aaron_K_van_Meerten can confirm this :slight_smile:

Thanks @damencho (and welcome back!) :slight_smile:

@Aaron_K_van_Meerten any chance you could offer some guidance on how that’s set up?
Also, if it is not too much to ask, how are you handling graceful scale-in of JVBs considering there is no equivalent for Autoscaling Lifecycle Hooks in Oracle Cloud?

We have built some automation around advertising coturn servers once they come up. We’re still using Route53 in AWS for some DNS, so in this case the records also have health checks associated with them. When we scale down, we simply remove the DNS record well in advance of taking the server down. In practice we don’t autoscale the coturn servers very often, as we generally have a predictable load.
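As an illustration of the kind of automation described above, here is a minimal sketch that builds a Route 53 change batch to add or remove one coturn A record with a health check attached. It is not the actual Jitsi tooling: the use of multivalue answer records, the record name, instance ID, IP address, and health check ID are all assumptions made for the example.

```python
from typing import Dict


def build_change_batch(action: str, record_name: str, instance_id: str,
                       public_ip: str, health_check_id: str,
                       ttl: int = 60) -> Dict:
    """Build a Route 53 ChangeBatch that adds (UPSERT) or removes (DELETE)
    one coturn A record. Each instance gets its own multivalue answer record,
    so DNS answers only include instances whose health check passes."""
    if action not in ("UPSERT", "DELETE"):
        raise ValueError("action must be UPSERT or DELETE")
    return {
        "Comment": f"{action} coturn node {instance_id}",
        "Changes": [{
            "Action": action,
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "A",
                "SetIdentifier": instance_id,   # one record per instance
                "MultiValueAnswer": True,       # assumption: multivalue routing
                "TTL": ttl,                     # short TTL so removals take effect quickly
                "ResourceRecords": [{"Value": public_ip}],
                "HealthCheckId": health_check_id,
            },
        }],
    }


def apply_change(hosted_zone_id: str, change_batch: Dict) -> None:
    """Send the change to Route 53 (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above has no dependencies
    boto3.client("route53").change_resource_record_sets(
        HostedZoneId=hosted_zone_id, ChangeBatch=change_batch)
```

On scale-out, an instance's startup script could call `apply_change` with an `UPSERT` batch; on scale-in, the same batch with `DELETE` would be sent well before the instance is stopped, leaving time for cached DNS answers to expire.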

As far as JVBs go, we use our autoscaler service, jitsi-autoscaler (the Jitsi Autoscaler microservice, jitsi/jitsi-autoscaler on GitHub), to handle scale-up and termination of JVBs. When a JVB is selected for termination, it is also put into graceful shutdown mode, so no new traffic should go there. Once it’s empty, the instance self-terminates. There’s room to do the same thing in AWS autoscaling groups or OCI instance pools, with some additional logic to simply detach the instance when it is selected for termination. This should ensure the JVB is no longer considered in further scaling decisions while it drains.
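The drain-then-terminate flow described above could be sketched roughly as follows. This is not the jitsi-autoscaler implementation. It assumes the JVB private REST API is enabled on localhost:8080 with the `/colibri/shutdown` and `/colibri/stats` endpoints, and the `terminate` callback (detaching and deleting the instance) is a hypothetical placeholder for whatever the cloud provider requires.

```python
import json
import time
import urllib.request
from typing import Callable

JVB_REST = "http://localhost:8080"  # assumption: private REST API enabled here


def request_graceful_shutdown() -> None:
    """Ask the JVB to stop accepting new conferences."""
    body = json.dumps({"graceful-shutdown": "true"}).encode()
    req = urllib.request.Request(
        f"{JVB_REST}/colibri/shutdown", data=body,
        headers={"Content-Type": "application/json"}, method="POST")
    urllib.request.urlopen(req, timeout=10).close()


def conference_count() -> int:
    """Read the number of active conferences from the JVB stats endpoint."""
    with urllib.request.urlopen(f"{JVB_REST}/colibri/stats", timeout=10) as resp:
        return int(json.load(resp).get("conferences", 0))


def drain_then_terminate(start_shutdown: Callable[[], None],
                         count_conferences: Callable[[], int],
                         terminate: Callable[[], None],
                         poll_seconds: float = 30.0,
                         max_wait_seconds: float = 4 * 3600,
                         sleep: Callable[[float], None] = time.sleep) -> bool:
    """Enter graceful shutdown, wait until the JVB is empty, then terminate.

    Returns False without terminating if the JVB is still busy after
    max_wait_seconds, so the caller can decide what to do next."""
    start_shutdown()
    waited = 0.0
    while count_conferences() > 0:
        if waited >= max_wait_seconds:
            return False
        sleep(poll_seconds)
        waited += poll_seconds
    terminate()
    return True
```

A script on the instance would call `drain_then_terminate(request_graceful_shutdown, conference_count, terminate_instance)`, where `terminate_instance` is supplied by the deployment. The JVB may also exit on its own once a graceful shutdown completes, so a real script would need to treat an unreachable stats endpoint as "drained" rather than as an error.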

That’s helpful. Many thanks!

I’m evaluating both options right now (DNS Round Robin and Network Load Balancer). DNS Round Robin works but requires custom automation to set up, and from preliminary tests a layer-4 NLB appears to work too.

My uncertainty with NLB comes from these caveats in the coturn wiki, which influence how routing has to be configured (or could rule out NLB):

There are two cases when different TURN sessions must interact: RTP/RTCP connection pairs (from RFC 5766) and TCP relay (from RFC 6062).

and

Also, if you are using the mobile TURN (from the new MICE draft) then you cannot use the network load balancer option because client sessions from different IP addresses must interact

Is either of the statements above relevant to Jitsi?

@Aaron_K_van_Meerten - Thank you for sharing!

To ensure graceful termination of JVBs in OCI using Autoscaling, what I can currently think of is:

  • Add the event “terminate instance begin” to the Instance Pool.
  • Issue an Oracle Cloud command to execute a script in the instance which will detach the instance from Autoscaling Instance Pool and put the instance in “graceful shutdown” mode to ensure no new conferences are sent to the instance.
  • Finally terminate the instance after all the conferences in the JVB are over.

But one question I have not yet found an answer to is: how long does Oracle Cloud wait for the termination script to finish once the “terminate begin” event is triggered? Does it wait for a few hours? The existing conferences on the JVB might continue for a few more hours before it can be gracefully terminated.

Any help on this is highly appreciated! Thank you.

I use this script to scale down JVBs. In my implementation the JVB instances run in LXC containers and have a special SSH port, but it’s possible to use a similar method with some changes.

It checks the JVBs periodically, keeps some idle JVBs, and shuts down the other idle ones.
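The selection step of such a scale-down script might look like the sketch below. It is not the script referred to above: the `Jvb` record, the "zero conferences means idle" rule, and the `keep_idle` parameter are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Jvb:
    name: str          # hypothetical identifier, e.g. a container name
    conferences: int   # active conferences reported by this JVB


def idle_jvbs_to_stop(jvbs: List[Jvb], keep_idle: int) -> List[Jvb]:
    """Return the idle JVBs that can be shut down, keeping `keep_idle`
    idle ones running as spare capacity. Busy JVBs are never selected."""
    idle = sorted((j for j in jvbs if j.conferences == 0), key=lambda j: j.name)
    return idle[max(keep_idle, 0):]
```

Run periodically (for example from cron), the result is the list of JVBs to shut down on that pass; sorting by name just makes the choice of which idle JVBs to keep deterministic.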

Thank you @emrah
Are you using this script on top of Oracle Cloud Autoscaling? Or are you managing the whole scale up and scale down yourself?

We are planning to use OCI Autoscaling since we can avoid the overhead of managing scale up and scale down, replacing unhealthy instances, etc.

But we are not able to figure out how to implement graceful termination of JVBs while using OCI Autoscaling.

In AWS, we can add lifecycle hooks and use AWS SQS or Lambda to run a graceful termination script, wait for the conferences to be over, and then terminate the instance.

Is there any way to do the same in OCI Autoscaling?

My auto-scaled deployments are on AWS. I only use the scale-up feature of AWS; I manage scale-down using this script.

I’m really hoping I’m wrong, but when I last investigated this I didn’t see a way to sensibly handle scale in/out of JVBs using OCI autoscaler.

Key concerns:

  • Unlike AWS ASGs that support lifecycle hooks, with OCI autoscaler there’s currently no way to postpone termination and handle it gracefully. It is possible to trigger cloud functions on instance termination start events, but at that point termination has already started so it is too late.
  • OCI autoscaler only allows scaling based on CPU utilisation or memory utilisation. There is currently no option to use other performance metrics, let alone custom ones (e.g. JVB stress levels).
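To make the second concern concrete, a custom scaler driven by JVB stress levels could make its decision along these lines. This is only a sketch, not how jitsi-autoscaler or OCI works: the thresholds, the one-instance step size, and the pool limits are all hypothetical values.

```python
from typing import Sequence


def desired_jvb_count(current: int, stress_levels: Sequence[float],
                      scale_up_at: float = 0.8, scale_down_at: float = 0.3,
                      min_count: int = 1, max_count: int = 10) -> int:
    """Pick a new pool size from the average stress level of the JVBs.

    Scales by one instance per decision; all thresholds are illustrative."""
    if not stress_levels:
        # No metrics yet: keep the current size within the allowed range.
        return min(max(current, min_count), max_count)
    average = sum(stress_levels) / len(stress_levels)
    if average > scale_up_at:
        return min(current + 1, max_count)
    if average < scale_down_at:
        return max(current - 1, min_count)
    return current
```

A scale-down decision from a function like this would still need to go through a graceful drain of the chosen JVB rather than an immediate termination.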

It does seem like jitsi-autoscaler or a custom solution that does something similar is the way to go.

Happy to be proven otherwise.

Thank you, @shawn