Load balancing between JVBs with weighting coefficients

Dear all,
It seems to me that the JVBs share conferences using a round-robin algorithm. Can we divide the work between JVBs according to coefficients proportional to the performance of each JVB (such as number of CPU cores, RAM…)?
Thank you

There are already such settings. Do we have them documented somewhere @Boris_Grozev ?

It is great news to hear that such an algorithm is already implemented. @Boris_Grozev, would you please give me some hints on how to configure the JVBs?
Many thanks

You can play with these settings.

Dear Damencho,
Many thanks for the precious information.
Where can I get some more information on these values? For instance, the min and max values, the meaning of stress-threshold (CPU utilisation?) and of load-threshold (is it load-threshold x max UDP packet size, giving the bandwidth at which the JVB is considered saturated?)
Thank you again for your help

Those values depend on many factors, such as the machine size, the number of participants, the number of conferences, and the distribution of people across those conferences.
You need to start with some values and monitor and adjust …

Hey mstran,

We have the bridge communicate to jicofo two values:

  1. stress_level: indicates the current load on the machine, with 1.0 corresponding to the maximum load it can handle without affecting call quality.
  2. average_participant_stress: a pre-configured value for the estimated stress that one endpoint contributes (i.e. the 1/max-participants). This is just an estimation.

Jicofo uses those two values to calculate the stress of a bridge as follows:
stress = stress_level + average_participant_stress * number_of_recently_added_endpoints
The correction for recently added endpoints is necessary because jicofo’s view of the stress reported by a bridge is not always up to date (it is updated every few seconds).
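
As a rough illustration of the formula above, here is a minimal Python sketch. It is not jicofo's actual code; the function names are hypothetical, and the only facts taken from this post are the formula itself and the note that average_participant_stress is roughly 1/max-participants.

```python
def average_participant_stress_for(max_participants: int) -> float:
    """Estimate the stress one endpoint contributes (1 / max-participants)."""
    if max_participants <= 0:
        raise ValueError("max_participants must be positive")
    return 1.0 / max_participants


def effective_stress(stress_level: float,
                     average_participant_stress: float,
                     recently_added_endpoints: int) -> float:
    """Stress as described in the post: the last reported stress_level plus
    a correction for endpoints added since that report."""
    return stress_level + average_participant_stress * recently_added_endpoints


if __name__ == "__main__":
    # Hypothetical numbers: bridge reported 0.50, 5 endpoints added since.
    per_endpoint = average_participant_stress_for(100)
    print(effective_stress(0.50, per_endpoint, 5))
```

With these made-up inputs, five freshly added endpoints raise the bridge's apparent stress a little above the last reported value, which is the point of the correction.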

The stress-threshold comes into play with Octo (multiple bridges for one conference). When jicofo selects a bridge for a new conference, it just uses the bridge with the lowest stress. When it selects a bridge for a new participant in an existing conference, it favors the bridge(s) already in the conference. If one of them has stress < stress-threshold, it is selected (the one with the lowest stress among them). Otherwise, it uses a new bridge (if one with stress < stress-threshold is available).
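
The selection rules described above can be sketched in Python as follows. This is only an illustration of the logic as stated in this post, not jicofo's implementation. The `Bridge` class and `select_bridge` function are hypothetical names, and the final fallback (what happens when no bridge is below the threshold) is not specified in the thread, so it is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

STRESS_THRESHOLD = 0.8  # default value mentioned in this thread


@dataclass(frozen=True)
class Bridge:
    name: str
    stress: float


def select_bridge(all_bridges: Sequence[Bridge],
                  conference_bridges: Sequence[Bridge] = (),
                  threshold: float = STRESS_THRESHOLD) -> Optional[Bridge]:
    """Pick a bridge for a new participant, following the rules in the post."""
    def by_stress(bridge: Bridge) -> float:
        return bridge.stress

    # New conference: simply take the least stressed bridge.
    if not conference_bridges:
        return min(all_bridges, key=by_stress, default=None)

    # Existing conference: favor bridges already in it, if below threshold.
    in_conference = [b for b in conference_bridges if b.stress < threshold]
    if in_conference:
        return min(in_conference, key=by_stress)

    # Otherwise add a new bridge, if one below the threshold is available.
    used = {b.name for b in conference_bridges}
    candidates = [b for b in all_bridges
                  if b.name not in used and b.stress < threshold]
    if candidates:
        return min(candidates, key=by_stress)

    # ASSUMPTION (not stated in the thread): fall back to the least
    # stressed bridge overall when nothing is below the threshold.
    return min(all_bridges, key=by_stress, default=None)
```

Note that a conference keeps using its own bridge even when another bridge is less loaded, as long as its own bridge stays below the threshold.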

The aim is to keep the average stress across bridges below stress-threshold (by having a sufficient number of bridges), but also handle the case of one of them spiking, for example when a conference on it grows to a large number of participants. You can tweak the stress-threshold in jicofo’s config, but the default value of 0.8 should be reasonable.

We observe that the load on the bridge correlates well with packet rate (and less so with bitrate), so we use packet rate as a proxy for stress. The packet-rate.load-threshold setting in jvb.conf configures the maximum packet rate the machine can handle, i.e. what corresponds to stress=1. We determine this for different machines with load tests. For EC2 c5.xlarge instances we use 50000.
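
To make the relationship concrete, here is a small Python sketch. It assumes the stress scales linearly with packet rate, so that the load-threshold maps to stress=1; the post only states that endpoint of the scale, so the linear mapping and the function name are assumptions.

```python
def stress_from_packet_rate(packet_rate_pps: float,
                            load_threshold_pps: float) -> float:
    """Approximate stress_level as packet rate divided by the load-threshold.

    ASSUMPTION: a linear mapping; the thread only says that the
    load-threshold corresponds to stress=1.
    """
    if load_threshold_pps <= 0:
        raise ValueError("load_threshold_pps must be positive")
    return packet_rate_pps / load_threshold_pps


if __name__ == "__main__":
    # With the 50000 threshold quoted for c5.xlarge, a hypothetical
    # bridge handling 40000 packets per second:
    print(stress_from_packet_rate(40_000, 50_000))
```

Under this assumption, a machine with a lower load-threshold reports a higher stress for the same packet rate, which is how weaker machines end up receiving less work.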

The max number of endpoints is much harder to estimate as it changes with average conference size (many p2p calls?), user behavior (video enabled?), user network conditions, etc. But it’s also not as important to set average-participant-stress accurately, and the default value of 0.01 should work fine in most cases.

Boris

Edit: correction, we use 50000 for c5.xlarge, not c5.large


Dear @Boris_Grozev @damencho
Thank you very much for your detailed information. I need some time to read and fully understand it all.
However, I would like to ask which load test you use to determine that, say, a packet rate of 50k is optimal for a c5.xlarge configuration?
My problem is that whenever we access a room for a short while, everything seems to be OK. Only when more people join and stay for a longer period do audio/video problems appear.
Another question: among my JVBs there is one that hosts a lot of rooms (211 rooms), as in the attachment. Is it due to a wrong setting of mine somewhere?
I saw you have Jitsi as a Service now. Is it a more optimized version of open-source Jitsi?
Best wishes

JaaS is an option where we host the service for you on our infrastructure. It runs with all the open-source components; it just provides some hooks and settings so we can accommodate clients' needs.

That’s hard to say. Are all CMC bridges connected to the same jicofo? Do they have the same load-threshold setting? Can you look at their stress_level stat? Do they have statistics enabled?

Boris

Dear @Boris_Grozev ,
The JVBs are connected to the same jicofo. They use the default JVB settings, so I don't think I changed the load-threshold. I roughly understand what you mean by stress_level, but how do I check that stat?
My JVBs have statistics enabled. I enclose here a snapshot of one of our JVBs, which hosted a lot of conferences. Which values should I focus on to get some hints about the QoS on that date? The packet rate is still less than 50k (of course my server is not as powerful as a c5.xlarge).
Many thanks in advance for your help

Along with the packet rate there is a stat named “stress_level”.

The best indication of QoS we’ve found is the packet transit time. Unfortunately, these stats are exposed separately and you need some extra work to bring them to the dashboard. They are accessible under /debug/stats/jvb/transit-stats on the REST interface.

Boris

Dear @Boris_Grozev ,
Please guide me on how to activate the REST interface for transit-stats. I just tested with my stats-enabled JVBs (set up according to jitsi-videobridge/statistics.md at master · jitsi/jitsi-videobridge · GitHub), and the path you gave above returns Not Found.
many thanks

They are always enabled. If you are running an older version of the bridge they might be under /debug/stats/transit-stats

Boris

Dear Boris,
My JVB version is 2.1-202-g5f9377b9-1.
I tried the REST interface with both URLs, in vain:

sontm@jvb-cmc1:~$ curl http://localhost:8080/debug/stats/transit-stats

Error 404 Not Found

HTTP ERROR 404

Problem accessing /debug/stats/transit-stats. Reason:

    Not Found


Powered by Jetty:// 9.4.15.v20190215
sontm@jvb-cmc1:~$ curl http://localhost:8080/debug/stats/jvb/transit-stats
Error 404 Not Found

HTTP ERROR 404

Problem accessing /debug/stats/jvb/transit-stats. Reason:

    Not Found


Powered by Jetty:// 9.4.15.v20190215

Please give me some hint to check the right address
Best wishes

Dear @Boris_Grozev ,
Just by chance I found the correct address for my JVB version:
curl http://localhost:8080/colibri/debug/stats/transit-stats
Do you have any documentation explaining these parameters?
curl http://localhost:8080/colibri/debug/stats/transit-stats

{"e2e_packet_delay":{"rtp":{"average_delay_ms":1.6629079998570075,"max_delay_ms":312,"total_count":12252394,"buckets":{"<= 2 ms":10619015,"<= 5 ms":894801,"<= 20 ms":621797,"<= 50 ms":95880,"<= 200 ms":20670,"<= 500 ms":231,"<= 1000 ms":0,"> 1000 ms":0,"p99<=":20,"p999<=":200}},"rtcp":{"average_delay_ms":1.759233174677702,"max_delay_ms":241,"total_count":325863,"buckets":{"<= 2 ms":279243,"<= 5 ms":24832,"<= 20 ms":18270,"<= 50 ms":2901,"<= 200 ms":614,"<= 500 ms":3,"<= 1000 ms":0,"> 1000 ms":0,"p99<=":50,"p999<=":200}}},"overall_bridge_jitter":null}

Many thanks
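
Since the path differs between bridge versions, a small probe can save some guessing. The Python sketch below tries the three paths that came up in this thread, in order, and returns the first one that answers with JSON. The function names are hypothetical, the list of paths is only what was mentioned here (other versions may use something else), and `fetch` is injectable so the logic can be exercised without a running bridge.

```python
import json
import urllib.request
from typing import Callable, Optional, Tuple

# The three paths mentioned in this thread; not an exhaustive list.
CANDIDATE_PATHS = (
    "/debug/stats/jvb/transit-stats",
    "/debug/stats/transit-stats",
    "/colibri/debug/stats/transit-stats",
)


def http_get(url: str, timeout: float = 2.0) -> str:
    """Fetch a URL and return the body as text (raises on HTTP errors)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def find_transit_stats(base_url: str,
                       fetch: Callable[[str], str] = http_get
                       ) -> Optional[Tuple[str, dict]]:
    """Return (path, parsed_json) for the first path that works, else None."""
    for path in CANDIDATE_PATHS:
        try:
            body = fetch(base_url.rstrip("/") + path)
            return path, json.loads(body)
        except (OSError, ValueError):
            # OSError covers urllib's HTTP/URL errors and timeouts;
            # ValueError covers a body that is not valid JSON.
            continue
    return None
```

Calling `find_transit_stats("http://localhost:8080")` on the bridge host would return the working path together with the parsed stats, or `None` if none of the three paths responds.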

You may want to update, the latest version is 452, so you’re 250 commits behind.

The transit-stats count the number of packets in buckets by the amount of time they spent in the bridge. For example in your case 621797 packets out of 12252394 (5%) were delayed between 5 and 20 ms. Overall the delay seems high, and that’s the average – my guess is at the peaks the machine was overloaded.
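
The percentage above can be reproduced for every bucket with a few lines of Python. This is a minimal sketch using the RTP numbers quoted in this thread. Treating the "p99<=" and "p999<=" entries as percentile bounds rather than packet counts is an assumption based on their names, and `bucket_shares` is a hypothetical helper.

```python
# RTP section copied from the transit-stats output quoted in this thread.
RTP_STATS = {
    "total_count": 12252394,
    "buckets": {
        "<= 2 ms": 10619015,
        "<= 5 ms": 894801,
        "<= 20 ms": 621797,
        "<= 50 ms": 95880,
        "<= 200 ms": 20670,
        "<= 500 ms": 231,
        "<= 1000 ms": 0,
        "> 1000 ms": 0,
        "p99<=": 20,
        "p999<=": 200,
    },
}


def bucket_shares(stats: dict) -> dict:
    """Return each delay bucket's share of all packets (0.0 to 1.0)."""
    total = stats["total_count"]
    if total == 0:
        return {}
    return {
        label: count / total
        for label, count in stats["buckets"].items()
        # ASSUMPTION: "p99<=" / "p999<=" are percentile bounds, not counts.
        if not label.startswith("p9")
    }


if __name__ == "__main__":
    for label, share in bucket_shares(RTP_STATS).items():
        print(f"{label:>10}: {share:6.2%}")
```

Tracking these shares over time, rather than a single snapshot, is what shows whether delays grow at peak load.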

Boris

Dear @Boris_Grozev ,
I am trying to understand the threshold for the delay. Obviously, the lower it is, the better the QoS. Say for a video at 25 fps, we have 40 ms per frame. If the delay of most packets (which leads to the delay of a full frame) is less than 40 ms, then visual degradation is probably still not visible? So could the frame rate serve as a threshold for evaluation?
Best wishes

Frame rate is a good indication if you can measure it, but it will only start to drop when the problems are severe. Prior to that, the receivers will increase the size of their jitter buffers and handle the delayed packets without a drop in frame rate (but with higher latency).

Boris