Large Number of Participants & Prosody Performance

@damencho do you know how to calculate the tcp_backlog value based on the resources we have and the number of concurrent users we want?
E.g. Nginx + Prosody + Jicofo run on a 16GB, 4-core server and we want 2000-3000 concurrent users (if needed we can go to 32GB and 8 cores, but it would be good to reach 2000-3000 concurrent users as is). What's the ideal value for tcp_backlog in that case?

does this also increase the performance of prosody?

we should put

gc = {
    mode = "incremental";
    threshold = 105;
    speed = 250;
}

in prosody.cfg.lua, right?
What are your suggested numbers for threshold and speed?
If we use Lua 5.4, how do we configure

mode = "generational"

I understand tuning Nginx is also important but currently, I’m more interested in prosody

thanks

  • A single 16GB/4-core server won’t get you anywhere near 2000 concurrent users. You’ll need several, larger, servers.
  • tcp_backlog = 511 is quite fine even for large scale; it’s a connect queue and doesn’t have any impact once the connection is established. Connections to Prosody are generally long-running.
  • Tuning Lua’s GC should be seen as a micro-optimisation which, if done at all, should be done based on profiling data from your actual deployment. The settings that work best for one deployment may not be the best for another deployment.
  • At larger scale you should probably consider ditching nginx; it isn’t really a needed component except to make deployment easier on small setups. Even if you do keep it for serving the frontend files, there’s no reason everything else needs to be funneled through it.

Is there any magic in this number?
Why not 500 or 512?

On some systems, SOMAXCONN (which the TCP backlog corresponds to) is an 8-bit unsigned int. 512 truncated into a u8 is 0, whereas 511 truncated into a u8 is 255. So 511 is a safer recommendation to make when you don’t know details about the user’s system.
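A minimal sketch of that truncation, in case the arithmetic is not obvious. The helper name `truncate_to_u8` is made up for illustration; it only mimics what storing a value into an unsigned 8-bit field does (keep the low 8 bits) and is not any real kernel or Prosody API.

```python
def truncate_to_u8(value: int) -> int:
    """Keep only the low 8 bits, as storing into an unsigned 8-bit field would."""
    return value & 0xFF


# Compare the candidate backlog values discussed above.
for backlog in (500, 511, 512):
    print(f"tcp_backlog = {backlog} -> effective backlog {truncate_to_u8(backlog)}")
```

On such a system 512 wraps around to 0, while 511 ends up as 255, the largest value the field can hold.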

With the latest versions of prosody and the jitsi-meet components on an m5.xlarge, we are at times seeing close to 6000 participants. But it all depends on the distribution of participants per conference. That is with all the recommendations from @jbg applied.
To judge whether the instance copes with the traffic, we monitor the CPU usage of the prosody process.

Re-reading @pratik’s post, I now see only nginx + prosody + jicofo are on the 4-core/16GB server (i.e. JVBs are separate). Yes, that should be fine with proper tuning.

Hi to all,

After upgrading to stable 6173, we managed to create a conference with 321 participants in total, 29 of them with video enabled.

We could not increase the number further because the CPU usage of the torture server reached its limit :slight_smile:

Jicofo split the conference across 3 JVBs (Octo), and thanks to pagination, both server and client bandwidth allocation seem to be much more efficient than in older releases (5963 in our case).

The time it takes to join a crowded meeting is also much less annoying than in older releases.
Performance improvements in the client and in the prosody module (mod_jiconop.lua) probably account for that improvement.

Thank you all.

Hi again,

My latest load test is as follows.

I created 81 rooms with 40 participants each (3240 in total), and 4 of those have video.

I use one physical server with 6 JVB instances running on it. CPU usage of the server is about 50%.

The stress levels of the JVB instances are about 1.5, but no abnormality is observed.

I wonder what the risk is if the stress level of any JVB goes beyond 1.0?

This is very impressive. Means a JVB is capable of handling more than 500 participants. :+1:t5:
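
For reference, the arithmetic behind that figure, as a rough sketch. It assumes participants were spread evenly over the 6 JVB instances, which the load test description does not actually state.

```python
# Numbers taken from the load test described above.
rooms = 81
participants_per_room = 40
jvb_instances = 6

total_participants = rooms * participants_per_room

# Assumption: an even split across instances (not confirmed in the thread).
participants_per_jvb = total_participants / jvb_instances

print(f"total participants: {total_participants}")
print(f"participants per JVB (even split): {participants_per_jvb:.0f}")
```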

That's true for this load test, @Freddie.

But although I did not observe any problem with conference health, the stress level worries me :confused:

I vaguely remember something about the stress level being tied to 70% or something like that. It’s in the code. If I find the information, I’ll share.

The understanding I have (which could be inaccurate) is that the stress level is relative to whatever value you set as the packet-rate load-threshold, so the stress level is only as accurate as the value you have set there. In other words, you kinda need to tweak it to match the expected capacity of your host for it to be useful.

I don’t know the full implication of a high stress level, but from what I’ve gathered so far:

  • Jicofo would consider a bridge “under stress” if it reports a stress level of >0.8 (configurable). Not sure what this means in practice if octo not in use, but will definitely come into play with Octo (see link below)
  • JVB has a load-reducer feature (disabled by default) where it starts to lower the last-n value once a stress threshold is breached.

I admit this is all quite waffly. Hopefully someone who knows better will chip in :slight_smile:

Edit – Here’s a very insightful post with more details:

Yes, that would be the same thread. Here are the code references:

Yep, these are the parameters used to calculate the stress level, and they show how Jicofo makes decisions based on stress levels.

As a result, my understanding is that, although I did not run into any problem, call quality is not guaranteed once the stress level goes above 1.0…

But if you haven't touched load-threshold = 50000, your stress level is currently based on that number, and the machine that hosts that bridge may well be capable of doing more. So basically you need to figure out those numbers for your own VM.
A good metric to look at is the packet delay stat; I know the bridge developers use it to tell whether a bridge is overloaded or not :slight_smile: But I don't have values stored in the back of my head to give you about it :slight_smile:

From my analysis, when I send load traffic to a single JVB instance, I see packet loss when the stress level is at 4 and the packet rate is nearly 200000 (upload + download).

So these values are accurate for me; my environment has nearly 4 times better performance than a c5.xlarge.

load-threshold = 200000 is the right value for my environment :slight_smile:
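
A small sketch of the relationship being discussed, assuming (as described earlier in the thread) that the stress level is simply the current packet rate divided by the configured load-threshold. The function names are hypothetical and this is not the actual JVB code; the 0.8 figure is the configurable Jicofo default mentioned above.

```python
def stress_level(packet_rate: float, load_threshold: float) -> float:
    """Assumed model: stress = packet rate / configured load-threshold."""
    if load_threshold <= 0:
        raise ValueError("load_threshold must be positive")
    return packet_rate / load_threshold


def jicofo_considers_stressed(stress: float, limit: float = 0.8) -> bool:
    """Per the thread, Jicofo treats a bridge as stressed above 0.8 (configurable)."""
    return stress > limit


observed_packet_rate = 200_000  # packet rate at which packet loss was first seen

for threshold in (50_000, 200_000):
    s = stress_level(observed_packet_rate, threshold)
    print(f"load-threshold = {threshold}: stress = {s:.2f}, "
          f"stressed for Jicofo = {jicofo_considers_stressed(s)}")
```

With the default threshold of 50000 the same traffic reports a stress of 4.0, while with 200000 it reports 1.0, which matches the point at which quality started to degrade in this environment.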

With bare metal 8C/16T Xeon servers with 32 GiB RAM we use load-threshold = 250000. With that load threshold, a stress level of 1.0 is about where RTP/RTCP packet delay starts to ramp up which is when you start to see an impact on quality.

Tuning the load threshold so that stress < 1.0 in normal operation is important for Jicofo’s bridge selection logic as well as being important if you use the load reducers (which IMO should be a last resort protection if something has really gone wrong with your scaling).

For big rooms, I see that there is a new configuration option in Jicofo: source-signaling-delays

Is it enabled on meet.jit.si? Do you suggest enabling that config?

This is something we are currently working on … it has not yet reached meet.jit.si

Hi,

Is the following prosody config set in the Jitsi production environment?

Thank you.