Asking for a developer's help since the system doesn't work as expected

Hello all

I have a very specific question.
Would it be possible to get one of the developers on a call with our developers and testers?
I’ll briefly describe what the problem is:

For reasons we can't explain, each system now supports only 300-350 participants in video conferences.
We have checked the hardware configuration several times, and it appears to be correct.
The server has 32 CPUs and 8 GB of RAM.
Our stress test ramps up to 600 participants within 15 minutes. This used to work, but since the update to stable-7648 it no longer does.
Two JVBs are configured.
Once the participant count reaches 300-350, Jitsi behaves as if it were overloaded: quality degrades, existing participants are kicked out, new participants time out, etc.

Rather than going back and forth in writing, it would be far easier to show one of you the problem directly and ask for help.

It would help us a lot, and I look forward to your feedback!

Kind regards
Raphael

You need to find the source of the problem. Is the machine overloaded (it seems so)? If so, which process?
Are the JVBs on the same machine? Is it possible that the network link of the host running the deployment has problems?
Bad quality suggests that the link to the JVBs is bad. Participants being kicked out or timing out when joining points to signalling, i.e. nginx or prosody. So it seems both network paths are having a problem. As a start, what errors do you see in the JS console logs when this happens?

Sorry, the Jitsi team does not offer this kind of support. You can try the paid section of the forum, where there are people/companies who do that, or you can use the public part of the forum (here) to get help from the community, so everyone can benefit from it.

Hello @damencho, thank you for your reply.

In our setup, the two JVBs are hosted on the same machine. We were previously able to test over 600 users in a 15-minute ramp-up phase.

Our test was run with the following configuration:

8 vCPU, 32 GB RAM

400 users

I can provide some of the generated logs, but as a new user I am not able to upload them.

As @damencho said, you should find the overloaded process/resource. Do you monitor the server during the load test?

Another common problem with load testing is overloading the Selenium nodes, which results in very poor upload from the nodes and therefore different results. Normally we monitor the stats to make sure that the nodes are sending at a 720p bitrate. A test with overloaded nodes is a different scenario from the same number of real users who are able to send proper 720p.

We monitored the entire server and did not find anything wrong.

We are testing with AWS VMs; each user is a VM.

And do those have enough resources to send 720p (4 cores, 8 GB of RAM)? And are the Chrome instances configured to send 720p?

When you see the problem, what are the download and upload rates to the server?
What is the CPU usage of the prosody process?
Overall CPU usage on the machine may be misleading. If you run everything on the same machine, you had better monitor the CPU of the two JVBs and of prosody separately.
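For anyone following along, here is a minimal sketch of per-process CPU monitoring, assuming a Linux host (it reads /proc directly and uses only the standard library). The command-line match patterns (`prosody`, `jicofo`, `videobridge`) are assumptions; check `ps -eo pid,args` on your server and adjust them. Every matching PID is reported on its own line, so two JVBs on the same machine show up separately. Values are a percentage of one core, so a busy process can exceed 100%.

```python
import os
import time


def find_pids(pattern: str) -> list[int]:
    """PIDs whose full command line contains `pattern` (Linux /proc only)."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/cmdline", "rb") as f:
                cmdline = f.read().replace(b"\0", b" ").decode(errors="replace")
        except OSError:
            continue  # process exited or is not readable
        if pattern in cmdline:
            pids.append(int(entry))
    return pids


def cpu_ticks(pid: int) -> int:
    """utime + stime of a process in clock ticks, read from /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # Field 2 (the command name) may contain spaces, so split after the last ')'.
    # The remaining fields start at field 3; utime/stime are fields 14 and 15.
    fields = data[data.rfind(")") + 2:].split()
    return int(fields[11]) + int(fields[12])


def cpu_percent(ticks_before: int, ticks_after: int,
                interval_s: float, clk_tck: int) -> float:
    """CPU used over the interval, as a percentage of ONE core."""
    return 100.0 * (ticks_after - ticks_before) / clk_tck / interval_s


def sample(patterns: dict[str, str],
           interval_s: float = 5.0) -> dict[tuple[str, int], float]:
    """Sample CPU for every PID matching each pattern; one entry per PID."""
    clk_tck = os.sysconf("SC_CLK_TCK")
    before = {}
    for name, pattern in patterns.items():
        for pid in find_pids(pattern):
            try:
                before[(name, pid)] = cpu_ticks(pid)
            except OSError:
                pass  # process went away between listing and reading
    time.sleep(interval_s)
    result = {}
    for (name, pid), t0 in before.items():
        try:
            result[(name, pid)] = cpu_percent(t0, cpu_ticks(pid), interval_s, clk_tck)
        except OSError:
            pass  # process exited during the interval
    return result


if __name__ == "__main__":
    # ASSUMED patterns; adjust after checking `ps -eo pid,args` on your host.
    PATTERNS = {"prosody": "prosody", "jicofo": "jicofo", "jvb": "videobridge"}
    for (name, pid), pct in sorted(sample(PATTERNS).items()):
        print(f"{name:8} pid={pid:<7} cpu={pct:6.1f}% of one core")
```

Running this repeatedly during the ramp-up (or logging its output every few seconds) shows which process saturates first, which the machine-wide CPU graph hides.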

And if this is your setup, you can really run into memory problems.
Jicofo and each JVB have a maximum usage of 3 GB of RAM, which means those three alone can use up to 9 GB. With a large number of users you also need memory for nginx (not much) and 3-4 GB or more for prosody. So maybe this is your problem. If all of them struggle for memory, you will see timeouts through prosody and people being kicked out. If the JVBs are struggling for memory, you get high GC times (you can also monitor this via the packet rate and packet delay times from the JVBs' REST endpoints), and high GC times mean low quality for participants.
Do your stats show the memory usage during the problem?
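The memory arithmetic above can be written down as a quick budget check. The 3 GB ceilings for Jicofo and each JVB and the 3-4 GB for prosody come from the post (the upper end is used here); the nginx figure and the 1 GB reserved for the OS are assumptions for illustration only.

```python
# Memory budget for an all-in-one Jitsi host, in GB.
# jicofo/jvb ceilings are from the forum post; prosody uses the upper end of
# its 3-4 GB estimate. The nginx value is an ASSUMED placeholder.
CEILINGS_GB = {
    "jicofo": 3.0,
    "jvb-1": 3.0,
    "jvb-2": 3.0,
    "prosody": 4.0,
    "nginx": 0.5,  # assumption: the post only says "not much"
}


def memory_headroom_gb(host_ram_gb: float, ceilings: dict[str, float],
                       os_reserve_gb: float = 1.0) -> float:
    """RAM left if every process reaches its ceiling; negative means overcommitted.

    os_reserve_gb is an ASSUMED allowance for the OS and other services.
    """
    return host_ram_gb - os_reserve_gb - sum(ceilings.values())


if __name__ == "__main__":
    for ram in (8, 32):
        headroom = memory_headroom_gb(ram, CEILINGS_GB)
        status = "OK" if headroom >= 0 else "OVERCOMMITTED"
        print(f"{ram:>3} GB host: headroom {headroom:+.1f} GB -> {status}")
```

Under these assumptions the processes can claim 13.5 GB in total, so the 8 GB host from the first post would be 6.5 GB short at peak, while the 32 GB test configuration would leave 17.5 GB spare.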

The VMs have enough resources; the instances are configured to meet those minimum requirements. Since this issue started, we have seen a decrease in CPU usage (related to a Jitsi update we deployed). Each JVB is monitored, and we did not see any direct issue.

We do not see any performance issue related to video quality or the like.

I will run a new test on Monday with more RAM for the server. I hope this is the issue, as it would be an easy fix.

Still, we can't understand how it worked previously with the same setup; the only thing that limited us in reaching 1000 users with video was the CPU.