Has no one else faced this issue?
If so, I suspect it's something in our deployment.
Do you use WebSockets for XMPP? AFAIK it's recommended for high-performance sites, but I have no idea if it's related to your particular problem.
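For reference, XMPP over WebSocket on the Prosody side comes from mod_websocket; enabling it looks roughly like this (illustrative; jitsi-meet clients would also need the WebSocket URL in their config):
modules_enabled = {
  "websocket";  -- XMPP over WebSocket (RFC 7395), served at /xmpp-websocket
}
consider_websocket_secure = true;  -- when TLS is terminated by a reverse proxy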
Hi @gpatel-fr
We use WebSockets for both XMPP and Colibri messaging.
When joining a room with 400 participants, it takes about 10 seconds to process all the presence messages for newcomers.
Do you have the mod_limits Prosody module enabled? I think that at some point the bandwidth limit was drastically lowered by default (11.2 maybe?), so you could try setting it to higher values to see if things get better.
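If it helps, raising the limits looks something like this in prosody.cfg.lua (the rate below is an example, not a recommendation):
limits = {
  c2s = {
    rate = "512kb/s";  -- per-client bandwidth cap; raise it if presence floods are being throttled
  };
}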
Just curious: did you use a single server for 400 people, or did you use OCTO?
I am planning to test the same, but I'm not sure whether a single server will be OK with paging.
I am planning on 16 CPUs with 32 GB RAM.
@engesin For high-performance Prosody, some things to check:
- network_backend = "epoll" in prosody's config; it should print "Prosody is using the libevent epoll backend for connection handling" on startup.
- tcp_backlog higher than default (511 is a good value).
- memory storage for everything that doesn't need to be persistent (generally everything except accounts & roster).
- mod_limits is set high enough; consider adding focus, jvb, etc. to unlimited_jids if not already there.
- Have you confirmed that Prosody is the bottleneck? Jicofo can also be a factor when you encounter slow performance joining large rooms.
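Putting those together, a minimal prosody.cfg.lua sketch (the store names and JIDs below are illustrative and depend on your deployment):
network_backend = "epoll"  -- check the startup log for the epoll backend message
tcp_backlog = 511          -- raise the accept queue above the default

-- map volatile stores to the in-memory driver; accounts & rosters stay on the default backend
storage = {
  muc_log = "memory";
}

-- mod_limits: keep internal components exempt from rate limiting
unlimited_jids = { "focus@auth.meet.example.com"; "jvb@auth.meet.example.com" }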
@kkd if that’s a physical server, and it’s a 16-core Xeon or Epyc or similar, and we’re just talking about JVB, it should handle 400 participants if the network interface & upstream bandwidth is up to it. If it’s a VM, or you mean 16 threads rather than cores, or you’re putting other components on the same server, YMMV. You may consider running multiple JVBs per server; on large servers, JVB tends to hit software limitations before you manage to fully utilise the hardware.
Hi @jbg,
Thank you for your detailed explanation. That was very helpful.
We fulfil these requirements on the backend side.
When I investigate from the client side, I see that presences arrive very quickly in the WebSocket messages section of the developer tools.
When I profile client performance, processing that many presence messages takes time.
I want to ask about your experience: does your client take time to join a crowded meeting at the beginning?
For example, when I compare the time it takes to join a meeting with 50 participants versus 400 participants, I see a serious difference.
We are still using stable_5390 and waiting for the new stable release to cover the pagination features.
There have been many improvements since January. You'd better test with the latest from unstable, or wait for the new stable release; I believe it will be out at some point next week.
@damencho do you know how to calculate the tcp_backlog value based on how many resources we have and how many concurrent users we want?
E.g. Nginx + Prosody + Jicofo on a 4-core CPU with 16 GB RAM, targeting 2000-3000 concurrent users (if needed we can increase to 8 cores and 32 GB, but it would be good to achieve 2000-3000 concurrent users as-is): what's the ideal value for tcp_backlog?
Does this also increase the performance of Prosody?
We should put
gc = {
  mode = "incremental";
  threshold = 105; -- Lua GC "pause": start a new cycle at 105% of the memory in use after the previous one
  speed = 250;     -- Lua GC "step multiplier": collect ~2.5x faster than allocation
}
in prosody.cfg.lua, right?
What are your suggested numbers for threshold and speed?
If we use Lua 5.4, how do we configure mode = "generational"?
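Something like this is my guess, going by Prosody's advanced GC settings (minor_threshold and major_threshold should map to Lua 5.4's generational collector parameters; the values are illustrative):
gc = {
  mode = "generational";
  minor_threshold = 20;  -- minor collection once allocations grow 20% since the last one
  major_threshold = 50;  -- major collection once memory grows 50% since the last major one
}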
I understand tuning Nginx is also important, but currently I'm more interested in Prosody.
Thanks
tcp_backlog = 511 is quite fine even for large scale; it's a connect queue and doesn't have any impact once the connection is established. Connections to Prosody are generally long-running.
Is there any magic in this number? Why not 500 or 512?
On some systems, SOMAXCONN (which the TCP backlog corresponds to) is an 8-bit unsigned int. 512 truncated into a u8 is 0, whereas 511 truncated into a u8 is 255. So 511 is a safer recommendation to make when you don't know details about the user's system.
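The truncation is easy to check (plain Lua, just the arithmetic):
-- only the low 8 bits survive truncation into an unsigned 8-bit int
print(511 % 256)  -- prints 255, the largest value a u8 can hold
print(512 % 256)  -- prints 0; the backlog silently becomes zero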
With the latest versions of Prosody and jitsi-meet components on an m5.xlarge, we are seeing at moments close to 6000 participants. But it all depends on the distribution of participants per conference. This is with all the recommendations from @jbg applied.
To judge whether the instance copes with the traffic, we monitor the CPU usage of the Prosody process.
Re-reading @pratik’s post, I now see only nginx + prosody + jicofo are on the 4-core/16GB server (i.e. JVBs are separate). Yes, that should be fine with proper tuning.
Hi all,
After upgrading to stable 6173, we managed to create a conference with 321 participants in total, 29 of them with video enabled.
We could not increase the number further because the CPU usage of the torture server reached its limits.
Jicofo divides the conference across 3 JVBs (OCTO), and thanks to pagination, both server- and client-side bandwidth allocation seems very efficient compared with older releases (5963 for us).
The time it takes to join a crowded meeting is not as annoying as it was in older releases.
Performance improvements in the client and in the Prosody module (mod_jiconop.lua) probably play a role in that improvement.
Thank you all.
Hi again,
My latest load test is as follows:
I created 81 rooms with 40 participants each (3240 in total), 4 of whom have video enabled.
I used one physical server with 6 JVB instances running on it; CPU usage of the server is about 50%.
Stress levels of the JVB instances are about 1.5, but no abnormality is observed.
I wonder what my risk is if any of the JVB stress levels goes beyond 1.0?
This is very impressive. It means a JVB is capable of handling more than 500 participants.
That's true for this load test, @Freddie.
But although I haven't observed any problems with conference health, the stress level worries me.
I vaguely remember something about the stress level being tied to 70% or something like that. It’s in the code. If I find the information, I’ll share.
The understanding I have (which could be inaccurate) is that the stress level is relative to whatever value you set as the packet-rate load-threshold, so the stress level is only as accurate as the value you have set there. In other words, you kinda need to tweak it to match the expected capacity of your host for it to be useful.
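If that's right, the knob would be the packet-rate load threshold in jvb.conf; a sketch based on jitsi-videobridge's reference.conf (the 50000 default may differ between versions):
videobridge {
  load-management {
    load-measurements {
      packet-rate {
        // stress ≈ current packet rate / load-threshold,
        // so stress 1.5 against this value means ~75k packets/sec
        load-threshold = 50000
      }
    }
  }
}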
I don’t know the full implication of a high stress level, but from what I’ve gathered so far:
I admit this is all quite waffly. Hopefully someone who knows better will chip in.
Edit – Here’s a very insightful post with more details: