Autoscaling of Jitsi-Videobridges

We are currently working on hosting Jitsi on Kubernetes.
However, we now have some questions regarding the autoscaling of the JVBs, each of which runs in its own pod:

  1. When the autoscaler decides to scale down, there may still be a conference running in the pod chosen to be stopped. As far as we understand, this is where the graceful_shutdown script comes into play: the JVB tells Jicofo not to allocate any more conferences to it, and once the last conference has ended, the JVB process exits. Is that correct? And how does the communication with Jicofo work in detail?

  2. Can Jicofo reallocate the conferences of exiting JVBs? This is indicated in Kroev's answer in the thread Scale Videobridge inside kubernetes. If so, is the conference interrupted?

  3. How do you deal with scaling down JVBs at meet.jit.si? Do you run the graceful_shutdown script and wait for the conferences on a JVB to disappear? If so, is your scaling mechanism blocked while a JVB that should be dropped still has conferences? What happens if another JVB has no conferences in the meantime? Does the scaler recognize that and shut down that one instead? And what happens if a conference runs so long that the system needs to scale up again?

Thank you!


I have the exact same doubts!

Hi Mvkert, were you successful in your implementation? I would be really keen to speak to you about your experience. Thanks

Hi domhacking,

Luckily, we are working on an open source project, which is now in its final stages.
Have a look at https://github.com/schul-cloud/jitsi-deployment. We welcome your questions and ideas for improvement.

Mvkert, thanks for your quick response. I will read through the repository and let you know if I have any questions or feedback. Thanks!

Hello @mvkert, do you have the Kubernetes YAMLs for scaling videobridges?

Thank you!

Did you manage to answer these questions? I have similar ones.

Hi @mvkert,

Thanks for taking the time to answer my previous question. I have a couple of follow-up questions:

  • Where do you pull the latest Jitsi code from in your repository?
  • Are we able to customise Jitsi?
  • We want to deploy to Google Cloud Platform. Do we need to set up any limits?

Thanks again for your help.

Best,

Dom

Ok, so I have found where the Jitsi code is pulled into your project.

@saghul @damencho @Aaron_K_van_Meerten Are these questions answered anywhere else? I have the same questions regarding scaling down JVBs with Kubernetes.

@swathikrishna_guru you might want to check out #10 of this post - [TIP] Fastest Way To Get Support In The Forum

These are all good questions. Keep in mind we aren't running JVBs as containers, but the same answers apply to JVBs running on autoscaling VMs as well. I'll try to give the answers I know, and the rest of the team can (hopefully) clarify if I get anything wrong.

  1. The JVB sets a graceful_shutdown flag in the stats it sends to Jicofo. Once Jicofo sees this flag set to true, it will no longer schedule new conferences on this bridge. Existing conferences are left in place, and even new participants joining those conferences are still routed to this bridge. Eventually no conferences are left on the bridge, at which point the JVB exits and Jicofo sees it leave and cleans up any references to it. (A sketch of wiring this into a pod's lifecycle follows after this list.)

  2. Jicofo reallocates conferences from JVBs when they fail, but we don't currently do so in the graceful-shutdown case. The conference is briefly interrupted while the users' media is moved to another bridge. There is definitely room to improve this flow, but it will take effort in several components.

  3. We are using several autoscaling mechanisms depending on the cloud we're in. In the AWS case we definitely hit the scaling mechanism blocking when a JVB was in graceful shutdown for long periods. We have developed our own autoscaling service which handles this better, simply by treating any JVB in graceful shutdown as logically 'detached' from any future autoscaling decisions (see the sketch after this list). In every cloud we have hard limits on how long we'll wait before forcing a shutdown anyway, but this is a matter of hours, so it definitely impacts the AWS version of the autoscaling.
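To make point 1 concrete for the Kubernetes case the original question asked about: a minimal pod spec sketch, assuming the container image ships the stock graceful_shutdown.sh at the path below and that the script blocks until the bridge has drained (both are assumptions; image names and script paths vary between images):

apiVersion: v1
kind: Pod
metadata:
  name: jvb
spec:
  # Kubernetes sends SIGKILL once this many seconds have elapsed, so it
  # acts as the hard shutdown limit mentioned in point 3.
  terminationGracePeriodSeconds: 7200
  containers:
    - name: jvb
      image: jitsi/jvb   # placeholder image name
      lifecycle:
        preStop:
          exec:
            # Sets the graceful_shutdown flag so Jicofo stops scheduling
            # new conferences here, then waits for the bridge to drain.
            command: ["/usr/share/jitsi-videobridge/graceful_shutdown.sh"]

And for point 3, a rough Python sketch (not our actual autoscaling service) of the 'detached' logic, assuming each JVB serves its colibri stats at the default private HTTP port with graceful_shutdown and stress_level fields, and with terminate() standing in for a hypothetical cloud API call:

import time
import requests

HARD_LIMIT_SECONDS = 4 * 3600  # force a shutdown after a few hours

def classify_bridges(bridges):
    """Split the fleet into bridges that count toward scaling decisions
    and bridges that are draining ('detached')."""
    active, detached = [], []
    for bridge in bridges:
        stats = requests.get(
            f"http://{bridge.host}:8080/colibri/stats", timeout=5).json()
        if stats.get("graceful_shutdown"):
            # Draining bridges are ignored by future scaling decisions,
            # so a long-running conference cannot block a scale-down.
            detached.append(bridge)
            if time.time() - bridge.shutdown_started_at > HARD_LIMIT_SECONDS:
                bridge.terminate()  # hypothetical cloud API call
        else:
            active.append((bridge, stats.get("stress_level", 0.0)))
    return active, detached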

Hope this answers your questions.


@Aaron_K_van_Meerten

Yes, thank you for the response.

Can you share the scaling method you use in the actual Jitsi environment, and the metrics used for it?

It is the stress level metric coming from the JVB, which I believe is currently based on jitsi-videobridge/reference.conf at b24f756cd48f224756bd34f8b09d63fc8f474f93 · jitsi/jitsi-videobridge · GitHub


@damencho is correct here: we use the JVB stress level metric, as reported via the colibri stats interface, to make autoscaling decisions. It is tuned via a threshold which is configurable on the JVB. We have currently tuned our JVBs (4 CPUs, 8 GB RAM) to 81250, up from the default of 50000.

This value is set in the configuration at:

videobridge {
  load-management {
    load-measurements {
      packet-rate {
        # The packet rate at which we'll consider the bridge overloaded
        load-threshold = 81250
      }
    }
  }
}
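(In a typical installation this block goes in the JVB's jvb.conf, e.g. /etc/jitsi/videobridge/jvb.conf on Debian-based installs; the reference.conf linked above documents the defaults.)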

Currently we have found it best to scale up fast and scale down slowly, so we scale up at 0.3 average stress sustained for 2 minutes, and scale down at 0.1 average stress sustained for 10 minutes. However, this really depends on the utilization patterns of your user base, and we are often tuning these values, so your mileage may vary.
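For illustration only (this is not the service we run), the policy above could look like the following sketch, assuming some poller feeds in a fleet-average stress_level sample every SAMPLE_PERIOD seconds; the thresholds and windows are the values quoted above:

import time
from collections import deque

UP_THRESHOLD, UP_WINDOW = 0.3, 2 * 60       # scale up fast
DOWN_THRESHOLD, DOWN_WINDOW = 0.1, 10 * 60  # scale down slowly
SAMPLE_PERIOD = 15  # hypothetical polling interval, in seconds

samples = deque()  # (timestamp, fleet-average stress_level)

def decide(stress):
    """Record one stress sample and return a scaling decision."""
    now = time.time()
    samples.append((now, stress))
    # Keep just enough history for the longest window we evaluate.
    while now - samples[0][0] > DOWN_WINDOW:
        samples.popleft()

    def sustained(window, over):
        points = [(t, s) for t, s in samples if now - t <= window]
        average = sum(s for _, s in points) / len(points)
        # Only act once samples cover (almost) the whole window.
        covered = now - points[0][0] >= window - SAMPLE_PERIOD
        return covered and (average >= UP_THRESHOLD if over
                            else average <= DOWN_THRESHOLD)

    if sustained(UP_WINDOW, over=True):
        return "scale-up"    # 0.3 average stress for 2 minutes
    if sustained(DOWN_WINDOW, over=False):
        return "scale-down"  # 0.1 average stress for 10 minutes
    return "hold"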
