Large Number of Participants & Prosody Performance

Hi community,

When we test with a large number of participants and the pagination feature enabled, we see heavy traffic at the moment a newcomer joins, as expected…

When a new participant joins, it receives a lot of presence and other messages, and the client needs some time to become fully operational.

Can you suggest any optimization, or have you noticed this as well?

Thank you

What version of Prosody are you running?

Hi,

We are using version 0.11.5 with the patches supplied by the jitsi-meet repo.

Has no one else faced this issue?

If not, I suspect our deployment :slight_smile:

Do you use WebSockets for XMPP? AFAIK it’s recommended for high-performance sites, but I have no idea if it’s related to your particular problem.

Hi @gpatel-fr

We use WebSockets for both XMPP and Colibri messaging.

When joining a room with 400 participants, it takes about 10 seconds for the newcomer to process all the presence messages.

Do you have the mod_limits Prosody module enabled? I think that at some point the default bandwidth limit was drastically lowered (11.2 maybe?), so you could try setting it to higher values to see if it gets better.

Just curious, did you use a single server for 400 people, or Octo?
I am planning to test the same, but I’m not sure whether a single server will be OK with pagination.
I am planning on 16 CPUs with 32 GB RAM.

@engesin For high-performance Prosody, some things to check:

  • if on Linux, ensure Prosody is using libevent+epoll. On 0.11 this means network_backend = "epoll" in Prosody’s config, and it should print “Prosody is using the libevent epoll backend for connection handling” on startup.
  • set tcp_backlog higher than default (511 is a good value)
  • make sure you’re using WebSockets, and make sure whatever is in front (nginx, cloud load balancer, etc) is tuned appropriately.
  • set prosody storage settings to memory for everything that doesn’t need to be persistent (generally everything except accounts & roster)
  • as mentioned by @gpatel-fr, make sure mod_limits is set high enough, and consider adding focus, jvb, etc to unlimited_jids if not already there
  • make sure Prosody is getting enough resources; for very large deployments it’s better to deploy and manage each component separately (e.g. using Kubernetes) and set/monitor the resource requirements of each component, rather than the default approach of shoving everything on a single server and hoping for the best.
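
To make the checklist above a bit more concrete, here is a rough sketch of how those settings might look in prosody.cfg.lua on 0.11. Treat it as a starting point under assumptions, not a drop-in config: the JIDs and the rate value are placeholders I made up, and as far as I know unlimited_jids is read by the limits_exception plugin that ships with jitsi-meet rather than by mod_limits itself. Both "limits" and "limits_exception" would also need to be listed in modules_enabled.

```lua
-- Sketch only; adapt every value to your own deployment.

-- Use the epoll backend on Linux (Prosody 0.11).
network_backend = "epoll"

-- Raise the listen backlog above the default.
network_settings = {
    tcp_backlog = 511;
}

-- Keep everything in memory except the stores that must persist.
default_storage = "memory"
storage = {
    accounts = "internal";
    roster = "internal";
}

-- mod_limits: raise the per-connection rate for client connections.
-- "512kb/s" is an illustrative value, not a recommendation.
limits = {
    c2s = {
        rate = "512kb/s";
    };
}

-- Exempt the backend components from rate limiting.
-- These JIDs are hypothetical; use the ones from your own install.
unlimited_jids = {
    "focus@auth.meet.example.com";
    "jvb@auth.meet.example.com";
}
```

After a restart, the startup log line about the epoll backend is the quickest way to check that the first setting took effect.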

Have you confirmed that Prosody is the bottleneck? Jicofo can also be a factor when you encounter slow performance joining large rooms.

@kkd if that’s a physical server, and it’s a 16-core Xeon or Epyc or similar, and we’re just talking about JVB, it should handle 400 participants if the network interface & upstream bandwidth is up to it. If it’s a VM, or you mean 16 threads rather than cores, or you’re putting other components on the same server, YMMV. You may consider running multiple JVBs per server; on large servers, JVB tends to hit software limitations before you manage to fully utilise the hardware.

hi @jbg,

Thank you for your detailed explanation. That was very helpful.

We fulfil these requirements on the backend side.

When I investigate from the client side, I see that presences arrive very quickly in the WebSocket messages section of the developer tools…

When I profile client performance, processing that many presences takes time…

I want to ask about your experience. Does your client take a while to join a crowded meeting at the beginning?

For example, when I compare the time it takes to join a meeting with 50 participants against one with 400 participants, I see a serious difference.

We are still using stable_5390 and waiting for a new stable version that covers the pagination features…

There have been many improvements since January. You’d better test with the latest unstable or wait for the new stable release; I believe it will come at some point next week.

@damencho do you know how to calculate the tcp_backlog value based on how many resources we have and how many concurrent users we want?
E.g. Nginx + Prosody + Jicofo run on a 16GB, 4-core server and we want 2000-3000 concurrent users (if needed we can go up to 32GB and 8 cores, but it would be good to reach 2000-3000 as is). What is the ideal value for tcp_backlog in that case?

Does this also increase the performance of Prosody?

Should we put

gc = {
    mode = "incremental";
    threshold = 105;
    speed = 250;
}

in prosody.cfg.lua, right?
What are your suggested numbers for threshold and speed?
If we use Lua 5.4, how do we configure

mode = "generational"

I understand tuning Nginx is also important, but currently I’m more interested in Prosody.

Thanks

  • A single 16GB/4-core server won’t get you anywhere near 2000 concurrent users. You’ll need several, larger, servers.
  • tcp_backlog = 511 is quite fine even for large scale; it’s a connect queue and doesn’t have any impact once the connection is established. Connections to Prosody are generally long-running.
  • Tuning Lua’s GC should be seen as a micro-optimisation which, if done at all, should be done based on profiling data from your actual deployment. The settings that work best for one deployment may not be the best for another deployment.
  • At larger scale you should probably consider ditching nginx; it isn’t really a needed component except to make deployment easier on small setups. Even if you do keep it for serving the frontend files, there’s no reason everything else needs to be funneled through it.
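
On the Lua 5.4 question: as far as I know, Prosody’s gc option also accepts a generational mode, which takes different tuning keys than the incremental one. The sketch below only shows the shape of the block; the key names are from my memory of Prosody’s GC settings and the numbers are placeholders, so check both against the documentation for your Prosody version and pick values from your own profiling data.

```lua
-- Sketch only: generational GC needs Prosody running on Lua 5.4.
-- The thresholds are placeholders, not suggested values.
gc = {
    mode = "generational";
    minor_threshold = 20;
    major_threshold = 50;
}
```

If Prosody is running on an older Lua, stay with mode = "incremental" as in the block quoted in the question.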

Is there any magic in this number?
Why not 500 or 512?

On some systems, SOMAXCONN (which the TCP backlog corresponds to) is an 8-bit unsigned int. Truncation keeps only the low 8 bits, so 512 (0x200) truncated into a u8 is 0, whereas 511 (0x1FF) truncated into a u8 is 255. So 511 is a safer recommendation to make when you don’t know details about the user’s system.

With the latest versions of Prosody and the jitsi-meet components on an m5.xlarge, and with all the recommendations from @jbg applied, we see close to 6000 participants at times. But it all depends on how the participants are distributed across conferences.
To judge whether the instance is coping with the traffic, we monitor the CPU usage of the prosody process.

Re-reading @pratik’s post, I now see only nginx + prosody + jicofo are on the 4-core/16GB server (i.e. JVBs are separate). Yes, that should be fine with proper tuning.

Hi to all,

After upgrading to stable 6173, we managed to create a conference with 321 participants in total, 29 of them with video enabled.

We could not increase the number because the CPU usage of the torture server reached its limit :slight_smile:

Jicofo divides the conference across 3 JVBs (Octo), and thanks to pagination both server and client bandwidth allocation seem very efficient compared to older releases (5963 in our case).

The time it takes to join a crowded meeting is far less annoying than in older releases.
Performance improvements in the client and in the Prosody module (mod_jiconop.lua) probably play a role in that.

Thank you all.

Hi again,

My latest load test is as follows:

I created 81 rooms with 40 participants each (3240 in total), and 4 of those have video.

I use one physical server with 6 JVB instances running on it. CPU usage of the server is about 50%.

Stress levels of the JVB instances are about 1.5, but no abnormality is observed.

I wonder what my risk is if the stress level of any JVB goes beyond 1.0?

This is very impressive. It means a JVB is capable of handling more than 500 participants. :+1:t5: