Jitsi load question (from 60 users to more than 500…)

Hello,

Actually, I have absolutely no idea of my Jitsi server's load… I am currently running an 8-core VM with 8 GB of RAM on a KVM hypervisor.
I regularly have at least 10 rooms with ~60/70 users spread across them. Theoretically, I could have up to 500 simultaneous users (my organisation has 15000+ users… everybody has an account), but that has never happened.

Sometimes my users are randomly disconnected, or they lose video or audio. I don't know why, and I have no way to track down these problems; I have never seen them myself.

My first question is: how can I tell how loaded the Jitsi server is?

Currently, with 60 users, I rarely use more than 1 core (so 100%/120% max out of 800% CPU), and bandwidth/RAM are not a problem either. So it does not seem to be a hardware limitation.

How can I be sure Jitsi is using the full capacity of my server? Which error messages indicate that we are overloaded?

I use the basic configuration, everything on one server, with these packages (Debian Buster):

jicofo 1.0-644-1
jitsi-meet 2.0.5142-1
jitsi-meet-prosody 1.0.4466-1
jitsi-meet-turnserver 1.0.4466-1
jitsi-meet-web 1.0.4466-1
jitsi-meet-web-config 1.0.4466-1
jitsi-videobridge2 2.1-376-g9f12bfe2-1

If these are HTTP disconnects, check the nginx logs; do you see some 504s there?
Make sure you have these on your system: https://github.com/jitsi/jitsi-videobridge/blob/master/config/20-jvb-udp-buffers.conf
Otherwise you may see UDP packets discarded by the kernel, which will lead to bad quality and even video being turned off for people…
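
A rough way to check this (the sysctl names come from the linked 20-jvb-udp-buffers.conf; the drop counters are standard kernel UDP statistics):

# Verify the UDP buffer sysctls the linked file is supposed to set
sysctl net.core.rmem_max net.core.netdev_max_backlog

# Kernel-level UDP drops: if "receive buffer errors" / "packet receive errors"
# keep growing over time, packets are discarded before the bridge can read them
netstat -su | grep -iE 'receive (buffer )?errors'

# Same counters straight from the kernel (RcvbufErrors / InErrors)
cat /proc/net/snmp | grep '^Udp'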

Thanks for your answer.

I get some 504 errors on some POST requests, like this:

“POST /http-bind?room=cac8906b28d0a77e1eb28a1c10444084 HTTP/2.0” 504
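
To get a rough idea of how often this happens, something like this against the nginx access log can help (assuming the default Debian log path and the combined log format):

# Count 504s on the BOSH endpoint, bucketed by hour
awk '$9 == 504 && $7 ~ /^\/http-bind/ {split($4, t, ":"); print t[1] ":" t[2]}' \
    /var/log/nginx/access.log | sort | uniq -c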

My UDP tuning is exactly the same as yours.

No UDP is dropped by my firewall, but just in case, here is part of my ruleset (nftables):

table inet filter {
        set f2b-false-services {
                type ipv4_addr
                elements = { ip1,ip2,ip3
                           }
        }

        chain input {
                type filter hook input priority 0; policy accept;
                tcp dport { http, https } ip saddr @f2b-false-services counter packets 0 bytes 0 log prefix "nft/input [f2b DROP]" group 2 drop
                ct state { established, related } accept
                tcp flags & (fin | syn) == fin | syn log prefix "DROP : Port scan possible (1)" drop
                tcp flags & (syn | rst) == syn | rst log prefix "DROP : Port scan possible (2)" drop
                tcp flags & (fin | syn | rst | psh | ack | urg) < fin log prefix "DROP : Port scan possible (3)" drop
                tcp flags & (fin | syn | rst | psh | ack | urg) == fin | psh | urg log prefix " DROP : Port scan possible (4)" drop
                iifname "lo" accept
                ip protocol icmp accept
                ip6 nexthdr ipv6-icmp accept
                tcp dport { ftp, 2278, munin, 40000-40010 } ip saddr { adminip1, adminip2 } accept
                udp dport { ftp, https, 4443, 4669, xmpp-client, xmpp-server, 10000-20000 } ip saddr { adminip1, adminip2 } accept
                tcp dport { http, https, 4443, 4669, xmpp-client, xmpp-server } accept
                udp dport { https, 4443, 4669, 10000-20000 } accept
                counter packets 1621 bytes 99522 drop
        }

        chain forward {
                type filter hook forward priority 0; policy accept;
                drop
        }

        chain output {
                type filter hook output priority 0; policy accept;
                accept
        }
}
table inet fail2ban {
        chain INPUT {
                type filter hook input priority 100; policy accept;
        }
}

No, it is not the firewall, it is the kernel. There is a buffer, and if nobody reads from it (usually because of load), those packets are dropped to make space for new ones.

This means the connection to Prosody times out. Are you using Prosody 0.11? What is the exact version? Make sure you monitor the Prosody process CPU usage. Prosody is single-threaded, and if you are hitting 100% on one core because of that, you may be getting those 504s.
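
A quick way to keep an eye on that (assuming pgrep finds the right process; pidstat comes from the sysstat package):

# Per-second CPU usage of the Prosody process
pidstat -u -p "$(pgrep -o -f prosody)" 1

# Or simply watch it in top
top -p "$(pgrep -o -f prosody)"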

Thanks for your valuable information:

Prosody 0.11.2

# Prosody directories
Data directory:     /var/lib/prosody
Config directory:   /etc/prosody
Source directory:   /usr/lib/prosody
Plugin directories:
  /usr/local/lib/prosody/modules - not a directory!
  /usr/lib/prosody/modules/


# Lua environment
Lua version:                    Lua 5.2

Lua module search paths:
  /usr/lib/prosody/?.lua
  /usr/local/share/lua/5.2/?.lua
  /usr/local/share/lua/5.2/?/init.lua
  /usr/local/lib/lua/5.2/?.lua
  /usr/local/lib/lua/5.2/?/init.lua
  /usr/share/lua/5.2/?.lua
  /usr/share/lua/5.2/?/init.lua

Lua C module search paths:
  /usr/lib/prosody/?.so
  /usr/local/lib/lua/5.2/?.so
  /usr/lib/x86_64-linux-gnu/lua/5.2/?.so
  /usr/lib/lua/5.2/?.so
  /usr/local/lib/lua/5.2/loadall.so

LuaRocks:               Not installed

# Lua module versions
lfs:            LuaFileSystem 1.6.3
libevent:       2.1.8-stable
luaevent:       0.4.6
lxp:            LuaExpat 1.3.0
socket:         LuaSocket 3.0-rc1
ssl:            0.7

For the kernel, I will check that part.

Regarding Prosody, I just realised I had no monitoring on this process! That's fixed now; let's wait and see what happens today.
How could I get Prosody to use all my cores? Should I try the epoll network backend?

Thanks again

We are using epoll, yes. It cannot use more than one core.
There are a few module optimizations we did in the latest packages, which haven't hit stable yet.
You can also try the latest 0.11 from the Prosody repo.
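
For reference, and assuming the stock jitsi-meet Prosody setup, the backend is selected in the global section of /etc/prosody/prosody.cfg.lua:

-- Prosody 0.11: use the epoll-based network backend
-- (faster than "select", but still single-threaded / single-core)
network_backend = "epoll"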

As I am in a production environment, I will stay on the stable packages.

Otherwise, is there a way to use Jitsi with a cluster of Prosody instances, to get load balancing at the Prosody level rather than of the whole Jitsi Meet setup?

Nope, we just use multiple signaling shards. https://jitsi.org/blog/new-tutorial-video-scaling-jitsi-meet-in-the-cloud/
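
Very roughly, the idea from the linked tutorial is an HAProxy in front of several complete jitsi-meet shards, with each room pinned to one shard. The names, addresses and certificate path below are made up for illustration; see the tutorial for a real setup:

# haproxy.cfg sketch, illustrative only
frontend meet_https
    mode http
    bind *:443 ssl crt /etc/ssl/private/meet.pem
    default_backend meet_shards

backend meet_shards
    mode http
    balance roundrobin
    # keep every participant of a given room on the same shard
    stick-table type string len 128 size 200k expire 12h
    stick on url_param(room)
    server shard1 192.0.2.11:443 ssl verify none check
    server shard2 192.0.2.12:443 ssl verify none check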

OK, I found some GRAVE errors in jvb.log; could they be related?

2020-11-20 17:51:01.541 GRAVE: [778] [confId=93431a97a0d3bdc epId=a553ef61 gid=278549 stats_id=Marques-2NF conf_name=eaf956f6a5cc6a187591336633093b57@conference.videoconference.mysite.org] DataChannelStack.onIncomingDataChannelPacket#81: Could not find data channel for sid 0
2020-11-20 17:51:01.541 GRAVE: [778] [confId=93431a97a0d3bdc epId=a553ef61 gid=278549 stats_id=Marques-2NF conf_name=eaf956f6a5cc6a187591336633093b57@conference.videoconference.mysite.org] DataChannelStack.onIncomingDataChannelPacket#81: Could not find data channel for sid 0
2020-11-20 17:51:01.541 GRAVE: [778] [confId=93431a97a0d3bdc epId=a553ef61 gid=278549 stats_id=Marques-2NF conf_name=eaf956f6a5cc6a187591336633093b57@conference.videoconference.mysite.org] DataChannelStack.onIncomingDataChannelPacket#81: Could not find data channel for sid 0
[…] and some interesting warnings (AVERTISSEMENT):
2020-11-20 17:51:01.541 GRAVE: [778] [confId=93431a97a0d3bdc epId=a553ef61 gid=278549 stats_id=Marques-2NF conf_name=eaf956f6a5cc6a187591336633093b57@conference.videoconference.mysite.org] DataChannelStack.onIncomingDataChannelPacket#81: Could not find data channel for sid 0
LocalCandidate=candidate:1 1 udp 2130706431 myip 10000 typ host
RemoteCandidate=candidate:10000 1 udp 1845501695 90.21.54.64 53611 typ prflx
2020-11-20 17:58:35.842 AVERTISSEMENT: [76] [confId=7c57322f9f579bf1 epId=f6b5cd1e gid=33328 stats_id=Shirley-oJD conf_name=reunion_de_1417@conference.videoconference.mysite.org] SendSideBandwidthEstimation.getRttMs#598: RTT suspiciously high (1130ms), capping to 1000ms.
2020-11-20 17:58:36.243 AVERTISSEMENT: [75] [confId=7c57322f9f579bf1 epId=f6b5cd1e gid=33328 stats_id=Shirley-oJD conf_name=reunion_de_1417@conference.videoconference.mysite.org] SendSideBandwidthEstimation.getRttMs#598: RTT suspiciously high (1744ms), capping to 1000ms.
2020-11-20 17:58:36.378 AVERTISSEMENT: [80] [confId=7c57322f9f579bf1 epId=f6b5cd1e gid=33328 stats_id=Shirley-oJD conf_name=reunion_de_1417@conference.videoconference.mysite.org] SendSideBandwidthEstimation.getRttMs#598: RTT suspiciously high (1009ms), capping to 1000ms.
2020-11-20 17:58:36.387 AVERTISSEMENT: [71] [confId=7c57322f9f579bf1 epId=f6b5cd1e gid=33328 stats_id=Shirley-oJD conf_name=reunion_de_1417@conference.videoconference.mysite.org] SendSideBandwidthEstimation.getRttMs#598: RTT suspiciously high (1560ms), capping to 1000ms.
LocalCandidate=candidate:1 1 udp 2130706431 46.105.53.108 10000 typ host
RemoteCandidate=candidate:10000 1 udp 1853824767 91.169.201.42 65299 typ prflx
2020-11-20 17:59:25.744 AVERTISSEMENT: [887] [confId=7c57322f9f579bf1 gid=33328 stats_id=Else-r5T componentId=1 conf_name=reunion_de_1417@conference.videoconference.mysite.org ufrag=eofjd1enj9h4cn name=stream-64575b91 epId=64575b91 local_ufrag=eofjd1enj9h4cn] MergingDatagramSocket.initializeActive#599: Active socket already initialized.
Got sctp association state update: 1
sctp is now up. was ready? false
LocalCandidate=candidate:1 1 udp 2130706431 46.105.53.108 10000 typ host
RemoteCandidate=candidate:10000 1 udp 1853693695 2.3.88.201 49513 typ prflx

By the way, last night I saw these stats in Munin; does everything look fine regarding the number of users?