Site slowing down on concurrent connections

We have set up Torture on servers with a total of 320 cores and almost 1000 GB of memory, and have also increased the total number of threads the operating system may use (the default was 7072; we set it to 100000).
No exceptions occur in the Torture instances with this setup.
The problem is that it takes 10 minutes to bring in 300 participants (~100 conferences, 3 participants per conference).


This chart is built from JVB data (the colibri/stats endpoint).
As it shows, it took 10 minutes to reach 300 participants in 100 conferences.
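
To put those figures in perspective, here is a minimal sketch of the arithmetic. The function name `join_rate` is only illustrative; the inputs are the numbers quoted in this post.

```python
def join_rate(participants: int, minutes: float) -> float:
    """Average number of participants joining per second."""
    return participants / (minutes * 60)


# Figures from the post: 300 participants in 10 minutes.
rate = join_rate(300, 10)
print(f"{rate:.2f} joins per second, i.e. one join every {1 / rate:.0f} s")
```

In other words, the ramp-up averages roughly one new participant every two seconds across the whole deployment.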

Also, in production, we have seen that with more than about 150 participants and 6 JVB instances, Prosody can’t handle the videobridge health check responses and does not deliver them to Jicofo. Jicofo then suddenly assumes that all videobridges are gone (crashed), cancels all of the conferences, and kicks out all of the participants at once.


As the chart shows, at 09:31:00 all of the conferences suddenly failed. We checked the logs: all of the videobridges were available, and there was no disconnection error for the videobridges in the Prosody logs at that time. Jicofo, however, logged that all videobridges had failed and were no longer connected, so it kicked everyone out and terminated all of the conferences.
We see this behaviour once a day. We switched to JVB websocket and Prosody websocket, increased the kernel threads limit, and so on, but the problem still occurs.
We appreciate any help.
Thanks

Are you running Prosody 0.11.7? When the problem occurs, is Jicofo able to join a room in Prosody?

* util.indexedbheap: Fix heap data structure corruption, causing some timers to fail after a reschedule (fixes [#1572](https://issues.prosody.im/1572))

There was a bug in Prosody where certain packets could get delayed, which results in timeouts in Jicofo, and it can then think the bridges have gone away.

Thanks for the reply.
No, we are using the dockerized version of Prosody that is available in the docker-jitsi-meet project. docker-jitsi-meet installs Prosody 0.11.5.
I checked the Prosody log and there was no such error. (The log level is set to info, not debug.)
Must I upgrade Prosody?

No error occurs in Prosody when this happens. You will notice it as timeouts in the Jicofo logs and as Jicofo not being able to join rooms due to timeouts, and if health checks are running they will be failing.
From your description of the problem, this is what comes to mind: that you are hitting this bug.

Thank you. I will upgrade Prosody and see what happens.
But for now, there is no Prosody 0.11.7 in


How can I find a Debian package of Prosody 0.11.7? I just found nightly builds at

thanks

Here: https://prosody.im/download/package_repository

I followed the instructions but it installs
http://packages.prosody.im/debian stretch/main amd64 prosody amd64 0.11.5-1~stretch6
(installs version 0.11.5 not 0.11.7)
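
A quick way to check whether an installed release package already contains the fix is to compare its upstream version with 0.11.7, the release asked about earlier in this thread. This is a minimal sketch; the helper names `upstream_version` and `has_fix` are illustrative, and it only handles release version strings like the one above, not nightly build names.

```python
import re


def upstream_version(deb_version: str) -> tuple:
    """Return the numeric upstream part of a Debian version string.

    An optional epoch ("1:") is skipped and everything after the
    dotted number (Debian revision, "~stretch6", ...) is ignored.
    """
    match = re.match(r"(?:\d+:)?(\d+(?:\.\d+)*)", deb_version)
    if match is None:
        raise ValueError(f"unrecognised version string: {deb_version!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def has_fix(deb_version: str, fixed_in: tuple = (0, 11, 7)) -> bool:
    """True if the package is at least the release assumed to carry the fix."""
    return upstream_version(deb_version) >= fixed_in


print(has_fix("0.11.5-1~stretch6"))
```

For the stretch package shown above this reports that the fix is not included, which matches what the repository offers.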

Yeah, I see there is no 0.11.6 or 0.11.7 for stretch … Not sure what you can do in this case…
We will be working on updating the Docker image to a newer Debian, but it involves other changes that need to be done first, and it will take time.

Alright, thanks for the reply.
For now, I think I have two solutions:
1- Use nightly builds.
2- Change Dockerfile and use Ubuntu (groovy, bionic, …) instead of Debian Stretch, but it is a little bit difficult (I have tested it before) since some Lua packages are incompatible with some Ubuntu versions and some packages (packages used in Dockerfile of docker-jitsi-meet for Prosody) are not available in some Ubuntu versions.
So I prefer to use nightly builds. Do you know which nightly build has fixed the problem?
I think I have to use one of the latest files of

Is there any other solution to fix the problem temporarily?

Well you can apply the fix by hand or with a diff … it is very simple. https://hg.prosody.im/trunk/rev/7d4c292f178e
Skip the tests change and do only the one in util/indexedbheap.lua
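
As a rough illustration of "skip the tests change", the sketch below keeps only the section for one file from a multi-file unified diff. It assumes each per-file section starts with a `diff ` header line whose last token is the file path; the function name `keep_file_sections`, the spec file name, and the sample text are placeholders, not the real changeset.

```python
def _header_path(header: str) -> str:
    """Last token of a 'diff ...' header, without a leading a/ or b/."""
    last = header.split()[-1]
    return last[2:] if last.startswith(("a/", "b/")) else last


def keep_file_sections(diff_text: str, wanted_path: str) -> str:
    """Return only the per-file sections of a diff that touch wanted_path."""
    sections, current = [], []
    for line in diff_text.splitlines(keepends=True):
        # A new 'diff ' header closes the section collected so far.
        if line.startswith("diff ") and current:
            sections.append(current)
            current = []
        current.append(line)
    if current:
        sections.append(current)
    kept = [
        section for section in sections
        if section[0].startswith("diff ") and _header_path(section[0]) == wanted_path
    ]
    return "".join("".join(section) for section in kept)


# Placeholder diff: the hunks are made up and do not reproduce the real fix.
SAMPLE_DIFF = """\
# HG changeset patch (placeholder header)
diff -r 000000000000 -r 111111111111 spec/example_spec.lua
--- a/spec/example_spec.lua
+++ b/spec/example_spec.lua
@@ -1,1 +1,1 @@
-old test line
+new test line
diff -r 000000000000 -r 111111111111 util/indexedbheap.lua
--- a/util/indexedbheap.lua
+++ b/util/indexedbheap.lua
@@ -1,1 +1,1 @@
-old code line
+new code line
"""

print(keep_file_sections(SAMPLE_DIFF, "util/indexedbheap.lua"))
```

The filtered text could then be applied with the standard `patch` tool against the Prosody source directory, or the single change could simply be made by hand as suggested.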

Very nice, thank you.
I can patch the file with a diff. This is much easier than the other solutions I suggested.

Have you guys scaled Prosody for meet.jit.si as well as the JVBs? It seems like the initial connection speed to a meeting on your platform is consistent, independent of the number of users on the platform.

@damencho
Do you have any idea? Thank you.

If you are asking about meet.jit.si: yes, of course it is scaled. We have multiple shards in each of the 7 regions where we run meet.jit.si, with hundreds of bridges to serve it. There are thousands of concurrent sessions, and a single shard cannot handle that load.

We dug deeper into the issue and started running a MUC load test against our Prosody. We added around 750 users in 50 conferences (15 participants per conference) to our Prosody using chat clients (the xmppjs library) and had the bots start sending messages.

The very odd thing is that although the overall CPU usage of our 32-core server was low, one of the cores was busy (80% to 100% usage) all the time. Since the bots were just connecting to Prosody and sending chat messages, neither Jicofo nor JVB was involved in the test.
So we can conclude that our issue comes from Prosody getting very busy while using only 1 core to handle the load, and it will obviously hit the maximum of that core very soon. Since Prosody is the heart of message passing in our system, users joining rooms and their overall experience are affected a lot. When the load test is not running, we can join our rooms in less than 5 seconds, but when we run the tests, it takes more than 2 minutes to join the same room.
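
The pattern described here (one saturated core on an otherwise idle machine) can be checked mechanically. This is a minimal sketch under stated assumptions: the function name and both thresholds are illustrative choices (80% is the lower bound quoted above, 25% is arbitrary), and the sample readings are made up.

```python
def looks_single_core_bound(per_core_percent, busy_threshold=80.0, idle_average=25.0):
    """True when one core is saturated while the machine overall is mostly idle."""
    if not per_core_percent:
        return False
    hottest = max(per_core_percent)
    average = sum(per_core_percent) / len(per_core_percent)
    return hottest >= busy_threshold and average <= idle_average


# Made-up readings for a 32-core server: one core pinned, the rest nearly idle.
sample = [97.0] + [3.0] * 31
print(looks_single_core_bound(sample))
```

Real per-core readings could come, for example, from `psutil.cpu_percent(interval=1, percpu=True)` if the third-party psutil package is installed.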

Is there a way to force Prosody to use more than one core?




P.S1: We are using docker and docker-compose to spin up our stack.
P.S2: Here is the code we used to generate this load.

Prosody by its nature is single threaded

Can you try the same test with this version of Prosody: prosody-0.11_1nightly111-1?
You can take it from here: https://packages.prosody.im/debian/pool/main/p/prosody-0.11/
Do you see any difference?

We use the Prosody Docker image provided in docker-jitsi-meet (debian:stretch-slim). Should we use this Debian package? That’s the last Debian stretch package; the others are for Ubuntu.

Yeah, I’m not sure how to test it there; I don’t have much experience with Docker.
The numbers seem pretty low, though. What is the maximum RAM and CPU that Prosody can get?
We serve a lot more users on one Prosody instance on meet.jit.si, with Jicofo connecting to it.

Installed this version and verified installation by:

prosodyctl --config /config/prosody.cfg.lua about

output:

Prosody 0.11 nightly build 111 (2020-10-15, 37dc2a6144d1)

Prosody directories

Data directory: /config/data
Config directory: /config
Source directory: /usr/lib/prosody
Plugin directories:
/prosody-plugins/
/prosody-plugins-custom/
/usr/lib/prosody/modules/

Lua environment

Lua version: Lua 5.2

Lua module search paths:
/usr/lib/prosody/?.lua
/usr/local/share/lua/5.2/?.lua
/usr/local/share/lua/5.2/?/init.lua
/usr/local/lib/lua/5.2/?.lua
/usr/local/lib/lua/5.2/?/init.lua
/usr/share/lua/5.2/?.lua
/usr/share/lua/5.2/?/init.lua
/root/.luarocks/share/lua/5.2/?.lua
/root/.luarocks/share/lua/5.2/?/init.lua

Lua C module search paths:
/usr/lib/prosody/?.so
/usr/local/lib/lua/5.2/?.so
/usr/lib/x86_64-linux-gnu/lua/5.2/?.so
/usr/lib/lua/5.2/?.so
/usr/local/lib/lua/5.2/loadall.so
/root/.luarocks/lib/lua/5.2/?.so

LuaRocks: Installed (2.4.2)

Network

Backend: epoll

Lua module versions

lfs: LuaFileSystem 1.6.3
lxp: LuaExpat 1.3.0
socket: LuaSocket 3.0-rc1
ssl: 0.7

The same problem remains. Also, this version requires libssl1.0.0 (>= 1.0.0), which can be downloaded from here and installed.

htop results:


One of the cores is always fully used (100% usage).