Server load for many simultaneous meetings

Thank you again for sharing your logging settings. Time series logging is now disabled by default in future jitsi-videobridge builds: https://github.com/jitsi/jitsi-videobridge/commit/357ee2c3019d58f4c539e0d9b60cd42b81eee2c9


Hi, thank you. I figured this out earlier, when I had problems with JDK 11 and needed to install JDK 8. I solved it with:

cd /usr/lib/jvm/jdk1.8.0_241/jre/lib/security
mv cacerts cacerts_ORIG
ln -s /etc/ssl/certs/java/cacerts cacerts

Milan
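The same cacerts swap can be scripted so that it is safe to re-run, for example after a JDK update. This is a minimal sketch, assuming the JDK 8 layout above (jre/lib/security under the JDK directory) and the keystore that Debian's ca-certificates-java package maintains at /etc/ssl/certs/java/cacerts; the function name link_system_cacerts is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: replace a JDK 8 cacerts file with a symlink to the system keystore,
# keeping the bundled file as cacerts_ORIG. Safe to run more than once.
link_system_cacerts() {
  local jdk_home="$1"
  local system_store="${2:-/etc/ssl/certs/java/cacerts}"
  local sec_dir="$jdk_home/jre/lib/security"

  [ -d "$sec_dir" ] || { echo "no security dir: $sec_dir" >&2; return 1; }
  [ -f "$system_store" ] || { echo "missing system keystore: $system_store" >&2; return 1; }

  # Back up the bundled keystore only if it is a regular file and no backup exists yet.
  if [ -f "$sec_dir/cacerts" ] && [ ! -L "$sec_dir/cacerts" ] && [ ! -e "$sec_dir/cacerts_ORIG" ]; then
    mv "$sec_dir/cacerts" "$sec_dir/cacerts_ORIG"
  fi
  # -f replaces an existing link or file; -n does not descend into an existing symlink.
  ln -sfn "$system_store" "$sec_dir/cacerts"
}
```

Usage, with the JDK from the commands above: `link_system_cacerts /usr/lib/jvm/jdk1.8.0_241`.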


Great information!
I think it is worth considering updating this in the JVB Docker image.
@saghul what do you think about that?

Hi @xranby, I’ve installed AdoptOpenJDK 8 HotSpot, but no luck for us:

root@virt1:/usr/share/jitsi-videobridge# tail -f /var/log/jitsi/jvb.log
2020-05-08 16:03:23.670 INFO: [60] Videobridge.createConference#320: create_conf, id=cfc9d67a29913075 gid=null logging=false
2020-05-08 16:03:23.679 INFO: [60] AbstractHealthCheckService.run#171: Performed a successful health check in PT0.009S. Sticky failure: false
2020-05-08 16:03:33.671 INFO: [60] Videobridge.createConference#320: create_conf, id=ab42382c01eabd3c gid=null logging=false
2020-05-08 16:03:33.679 INFO: [60] AbstractHealthCheckService.run#171: Performed a successful health check in PT0.009S. Sticky failure: false
Unrecognized VM option 'UseShenandoahGC'
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
Unrecognized VM option 'UseShenandoahGC'
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
^C
root@virt1:/usr/share/jitsi-videobridge# java -version
openjdk version "1.8.0_252"
OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_252-b09)
OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.252-b09, mixed mode)
root@virt1:/usr/share/jitsi-videobridge#
root@virt1:/usr/share/jitsi-videobridge# cat jvb.sh
#!/bin/bash

if [[ "$1" == "--help" || $# -lt 1 ]]; then
    echo -e "Usage:"
    echo -e "$0 [OPTIONS], where options can be:"
    echo -e "\t--secret=SECRET\t sets the shared secret used to authenticate to the XMPP server"
    echo -e "\t--domain=DOMAIN\t sets the XMPP domain (default: none)"
    echo -e "\t--host=HOST\t sets the hostname of the XMPP server (default: domain, if domain is set, \"localhost\" otherwise)"
    echo -e "\t--port=PORT\t sets the port of the XMPP server (default: 5275)"
    echo -e "\t--subdomain=SUBDOMAIN\t sets the sub-domain used to bind JVB XMPP component (default: jitsi-videobridge)"
    echo -e "\t--apis=APIS where APIS is a comma separated list of APIs to enable. Currently supported APIs are 'xmpp' and 'rest'. The default is 'xmpp'."
    echo
    exit 1
fi

SCRIPT_DIR="$(dirname "$(readlink -f "$0")")"

mainClass="org.jitsi.videobridge.Main"
cp=$SCRIPT_DIR/jitsi-videobridge.jar:$SCRIPT_DIR/lib/*
logging_config="$SCRIPT_DIR/lib/logging.properties"
videobridge_rc="$SCRIPT_DIR/lib/videobridge.rc"

if [ -f "$logging_config" ]; then
    LOGGING_CONFIG_PARAM="-Djava.util.logging.config.file=$logging_config"
fi

if [ -f "$videobridge_rc" ]; then
    source "$videobridge_rc"
fi

if [ -z "$VIDEOBRIDGE_MAX_MEMORY" ]; then VIDEOBRIDGE_MAX_MEMORY=3072m; fi

exec java -Xmx$VIDEOBRIDGE_MAX_MEMORY $VIDEOBRIDGE_DEBUG_OPTIONS -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp $LOGGING_CONFIG_PARAM $JAVA_SYS_PROPS -cp $cp $mainClass "$@"
root@virt1:/usr/share/jitsi-videobridge#

Do I need a specially patched version of AdoptOpenJDK?

Thank you,

Milan

EDIT: I’ve downloaded: https://builds.shipilev.net/openjdk-shenandoah-jdk8/openjdk-shenandoah-jdk8-latest-linux-x86_64-release.tar.xz

and now it runs fine with:

root@virt1:/usr/lib/jvm/adoptopenjdk-8-hotspot-amd64/jre/lib/security# java -version
openjdk version "1.8.0-builds.shipilev.net-openjdk-shenandoah-jdk8-b654-20200507"
OpenJDK Runtime Environment (build 1.8.0-builds.shipilev.net-openjdk-shenandoah-jdk8-b654-20200507-b654)
OpenJDK 64-Bit Server VM (build 25.71-b654, mixed mode)

Is it secure to run this version?

THX!

Milan
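The startup failure above comes from passing -XX:+UseShenandoahGC to a JVM that does not know the option; the AdoptOpenJDK 8 HotSpot build rejects it, while the shipilev.net Shenandoah build accepts it. One way to keep a launcher like jvb.sh working on both kinds of JVM is to probe the flag before using it. This is a minimal sketch, assuming a fallback to CMS (which exists in JDK 8 through 13 but was removed in JDK 14); pick_gc_opts and JAVA_GC_OPTS are hypothetical names, not part of the stock script.

```bash
#!/usr/bin/env bash
# Hypothetical helper for jvb.sh: choose GC flags the given JVM actually accepts.
# "java <flags> -version" exits non-zero ("Could not create the Java Virtual Machine")
# when a flag is unrecognized, so the exit status is enough to detect support.
pick_gc_opts() {
  local java_bin="${1:-java}"
  local shenandoah="-XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC"
  # $shenandoah is intentionally unquoted: it holds two separate JVM options.
  if "$java_bin" $shenandoah -version >/dev/null 2>&1; then
    echo "$shenandoah"
  else
    # Assumed fallback: CMS, available in JDK 8-13 only.
    echo "-XX:+UseConcMarkSweepGC"
  fi
}
```

In jvb.sh this would be used as `JAVA_GC_OPTS=$(pick_gc_opts java)`, with the two Shenandoah flags on the exec line replaced by an unquoted `$JAVA_GC_OPTS`.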

I got this error when I ran stats.sh:

./stats.sh: line 6: syntax error near unexpected token `('
./stats.sh: line 6: `   echo -e "\033[1;34mjitsi_{STAT}: \e[0;32m"(echo $XSTATS | jq ".$STAT")'

What went wrong?
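In line 6 as bash echoes it back, the command substitution appears as (echo ...) without the leading $ (and the variable as {STAT} without its $); bash treats a bare ( directly after a word as a syntax error. Below is a minimal sketch of such a stats loop with both expansions written out. It assumes jq is installed and that the JVB REST API (enabled with --apis=rest) serves JSON stats at http://localhost:8080/colibri/stats; print_stats and STATS_URL are hypothetical names, and this is not the original stats.sh.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a stats loop: print selected JVB colibri stats in colour.
# Assumes jq is installed and the REST API (--apis=rest) serves stats at this URL.
STATS_URL="${STATS_URL:-http://localhost:8080/colibri/stats}"

print_stats() {
  local json="$1"; shift
  local stat
  for stat in "$@"; do
    # ${stat} expands the variable and $( ... ) substitutes jq's output.
    # Writing "(jq ...)" without the leading '$' triggers
    # "syntax error near unexpected token `('".
    echo -e "\033[1;34mjitsi_${stat}: \e[0;32m$(jq -r ".${stat}" <<<"$json")\e[0m"
  done
}
```

Usage: `print_stats "$(curl -s "$STATS_URL")" participants conferences threads`.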

Hi @xranby, today I stressed our server (and our users :slight_smile:) a bit. I disabled the second JVB so that all our users were on a single JVB. I’m running Java with these GC parameters: -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC -XX:+UseNUMA -XX:+AlwaysPreTouch -XX:+DisableExplicitGC

Here are my perf probes at different load levels:

  1. Totally overloaded system :slight_smile:

root@virt1:~/scripts# ./stats.sh
jitsi_bit_rate_upload: 320165
jitsi_bit_rate_download: 36406
jitsi_participants: 269
jitsi_conferences: 18
jitsi_largest_conference: 36
jitsi_endpoints_sending_video: 75
jitsi_endpoints_sending_audio: 37
jitsi_receive_only_endpoints: 181
jitsi_threads: 1297
jitsi_total_failed_conferences: 0
jitsi_total_partially_failed_conferences: 3

Load: 10:22:59 up 38 days, 12:28, 3 users, load average: 26.23, 20.26, 15.05
Netstat: 155661 receive buffer errors 41 send buffer errors

  2. I enabled the second JVB server and waited for the load to come down a bit; the first JVB still had HIGH load:

root@virt1:~/scripts# ./stats.sh
jitsi_bit_rate_upload: 164841
jitsi_bit_rate_download: 32085
jitsi_participants: 186
jitsi_conferences: 12
jitsi_largest_conference: 36
jitsi_endpoints_sending_video: 36
jitsi_endpoints_sending_audio: 20
jitsi_receive_only_endpoints: 143
jitsi_threads: 891
jitsi_total_failed_conferences: 0
jitsi_total_partially_failed_conferences: 3

Load: 10:41:40 up 38 days, 12:47, 3 users, load average: 9.79, 10.25, 11.89
Netstat: 155661 receive buffer errors 41 send buffer errors

  3. Later on, as the load dropped, my probes became more comparable with the previous GC graphs; moderate/high load on the server:

root@virt1:~/scripts# ./stats.sh
jitsi_bit_rate_upload: 29496
jitsi_bit_rate_download: 8325
jitsi_participants: 127
jitsi_conferences: 9
jitsi_largest_conference: 36
jitsi_endpoints_sending_video: 8
jitsi_endpoints_sending_audio: 14
jitsi_receive_only_endpoints: 112
jitsi_threads: 582
jitsi_total_failed_conferences: 0
jitsi_total_partially_failed_conferences: 4

Load: 10:57:54 up 38 days, 13:03, 3 users, load average: 3.32, 3.82, 6.80
Netstat: 155661 receive buffer errors 41 send buffer errors

  4. Moderate/low load on the server:

jitsi_bit_rate_upload: 39585
jitsi_bit_rate_download: 5976
jitsi_participants: 88
jitsi_conferences: 7
jitsi_largest_conference: 35
jitsi_endpoints_sending_video: 7
jitsi_endpoints_sending_audio: 9
jitsi_receive_only_endpoints: 78
jitsi_threads: 449
jitsi_total_failed_conferences: 0
jitsi_total_partially_failed_conferences: 4

Load: 11:10:23 up 38 days, 13:16, 3 users, load average: 2.09, 2.50, 4.33
Netstat: 155661 receive buffer errors 41 send buffer errors

I hope it will be useful.

Can you please draw some conclusions from it? If you are interested in more details/graphs, please let me know.

Thank you,

Milan


Here’s my graph. Does anyone have any insight into why it’s using so much CPU?

Settings:
UpCloud, 6-core Xeon Gold 6136 CPU @ 3.00 GHz
16 GB RAM
CPU usage: 45-50%
RAM: less than 5%

7 users, all have video and audio on.

For this graph I ran: sudo perf record -F 99 -a -g -p 25404 -- sleep 30
where 25404 is my Java (JVB) process.

https://www.mediafire.com/view/r84ov3qy3akxtlr/flamegraph2_java_based.svg/file

Here’s another graph, from running sudo perf record -F 99 -a -g -- sleep 30

https://www.mediafire.com/view/m02g7l1f8veb52m/flamegraph_all.svg/file
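For anyone who wants to reproduce graphs like these, here is a minimal sketch of the usual perf + FlameGraph pipeline for a single process. It assumes perf is installed and Brendan Gregg's FlameGraph scripts (stackcollapse-perf.pl, flamegraph.pl) are checked out in FLAMEGRAPH_DIR; readable Java frames additionally need the JVM to run with -XX:+PreserveFramePointer plus a perf symbol map (for example from perf-map-agent). record_flamegraph is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: record on-CPU stacks of one process and render an SVG flame graph.
# Assumes perf and Brendan Gregg's FlameGraph scripts (in FLAMEGRAPH_DIR) are available.
FLAMEGRAPH_DIR="${FLAMEGRAPH_DIR:-$HOME/FlameGraph}"

record_flamegraph() {
  local pid="$1" seconds="${2:-30}" out="${3:-jvb-flamegraph.svg}"
  # Sample call stacks (-g) of this PID at 99 Hz; the trailing "sleep" only sets the duration.
  sudo perf record -F 99 -g -p "$pid" -o perf.data -- sleep "$seconds"
  # Fold identical stacks and draw them; wider boxes mean more samples.
  sudo perf script -i perf.data \
    | "$FLAMEGRAPH_DIR/stackcollapse-perf.pl" \
    | "$FLAMEGRAPH_DIR/flamegraph.pl" >"$out"
}
```

Usage: `record_flamegraph "$(pgrep -o -f org.jitsi.videobridge.Main)" 30`.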

@acakid Are you using jitsi-videobridge JVB 1, or are you using the desktop client?

I see libjitsi function names in your graphs; those functions are no longer used in the current jitsi-videobridge, JVB 2.

@xranby Yes, we’re using JVB 1. The funny thing is that our staging server also runs JVB 1, with 1 GB of RAM and a single-core CPU, and it performs flawlessly.

@migo your graphs using JVB 2 in combination with ShenandoahGC are interesting.

users  load avg  threads  observation
88     2.09      449      ~3% of CPU time spent in GC; the JSON-generation peak is thin
127    3.32      582      ~3% of CPU time spent in GC; the JSON-generation peak is thin
186    9.79      891      CPU time spent in GC jumps from ~3% to ~30%; the JSON-generation peak is thick!
269    26.23     1297     CPU time spent in GC is also ~30%; the JSON-generation peak is thick!

GC overhead bottleneck
I am interested to know: if you give the JVM running jitsi-videobridge more heap memory (-Xmx), would that allow you to have 128+ users with the same low ~3% of CPU time spent in GC?

stack RAM wall
The thread count is starting to become alarming: just allocating a stack for each thread now starts to consume gigabytes of RAM. Assuming a standard glibc stack size of 8 MB, your threads consume 10 GB of RAM! If the thread count goes up further, the application will stop with an out-of-memory error because it cannot allocate stack RAM for each new thread.
Two ways to fix this:
workaround: add more RAM
the proper way: re-architect JVB 2 to use a fixed pool of worker threads that consume tasks. JVB 2 already does this partially, so more of the same medicine!
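The arithmetic behind the 10 GB figure can be checked quickly. This is a minimal sketch, assuming every thread gets the same fixed-size stack; the helper name is hypothetical. Two caveats: HotSpot gives Java threads a 1 MiB stack by default on 64-bit Linux (-Xss), while the 8 MiB glibc default applies to native threads created with default attributes, and in both cases this is reserved address space, with physical RAM committed only as stack pages are touched. The result is therefore an upper bound, not measured usage.

```bash
#!/usr/bin/env bash
# Hypothetical helper: worst-case stack reservation in MiB for N threads of a fixed stack size (KiB).
stack_reservation_mib() {
  local threads="$1" stack_kib="$2"
  echo $(( threads * stack_kib / 1024 ))
}

# 1297 threads from the most loaded probe:
stack_reservation_mib 1297 8192   # glibc default 8 MiB stacks: prints 10376
stack_reservation_mib 1297 1024   # HotSpot default -Xss on 64-bit Linux: prints 1297
```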

The one-thread-per-user issue (in the JVB 2 case, ~4.5 threads per user) is known as the C10k problem.
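The per-user ratio can be recomputed from the jitsi_threads and jitsi_participants values in the four probes above. This is a minimal sketch using awk for the division; the helper name is hypothetical. With these numbers the ratio comes out between roughly 4.6 and 5.1 threads per participant.

```bash
#!/usr/bin/env bash
# Hypothetical helper: threads per participant, rounded to two decimals.
threads_per_participant() {
  awk -v t="$1" -v p="$2" 'BEGIN { printf "%.2f\n", t / p }'
}

# (threads, participants) pairs from the four stats.sh probes in this thread.
for pair in "1297 269" "891 186" "582 127" "449 88"; do
  read -r threads participants <<<"$pair"
  echo "$participants participants: $(threads_per_participant "$threads" "$participants") threads each"
done
```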

@bbaldino


@acakid For JVB 1, you can try using taskset as a workaround to lock JVB 1 to a single CPU core; there is likely a spinlock or something similar in JVB 1 that makes code running on CPU A wait for CPU B.

The only way to solve it properly is to re-architect the code, as has been done in JVB 2.
JVB 2 is considered stable.
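The taskset workaround can be applied to an already running JVB 1 or built into its start command. This is a minimal sketch, assuming util-linux taskset; the -a option applies the CPU mask to every thread of the process, which matters for a JVM because each Java thread is a separate kernel task. pin_all_threads is a hypothetical name, and pgrep is just one way to find the PID.

```bash
#!/usr/bin/env bash
# Hypothetical helper: pin every thread of a running process to the given CPU list (e.g. "0").
# util-linux taskset: -a = all tasks (threads), -p = operate on a PID, -c = CPU list format.
pin_all_threads() {
  local cpus="$1" pid="$2"
  taskset -a -p -c "$cpus" "$pid"
}

# Alternative at start-up (not executed here): prefix the java command in the launcher, e.g.
#   exec taskset -c 0 java ... org.jitsi.videobridge.Main "$@"
```

Usage: `pin_all_threads 0 "$(pgrep -o -f org.jitsi.videobridge.Main)"`.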

An off-CPU graph may reveal why things block in multi-CPU scenarios:

Hi @xranby, thank you for your analysis!

No problem, there is plenty of free RAM here:

root@virt1:~/scripts# free
              total        used        free      shared  buff/cache   available
Mem:       57701004     5344592    48984844      123932     3371568    51596728
Swap:      15624188           0    15624188

and

root@BackupStorage:~/scripts# free
              total        used        free      shared  buff/cache   available
Mem:       74218752     3276576    68191868       58276     2750308    70201120
Swap:      15624188           0    15624188

The heap memory is currently set to 3072m; what amount do you suggest? Tomorrow will be a good chance to test it.

Are you interested in a comparison between ShenandoahGC and ConcMarkSweepGC on the same Java version? One JVB is running ConcMarkSweepGC and the other ShenandoahGC, and I made some graphs today.

Thank you,

Milan

@xranby Thank you for your insight. I spawned a new server with the same hardware configuration and did a fresh installation of JVB 2 on it. Here’s what I got:

4 users
CPU usage: ~20%

I have already set the logging level to WARNING. This is my logging.properties:
handlers= java.util.logging.ConsoleHandler
#handlers= java.util.logging.ConsoleHandler, com.agafua.syslog.SyslogHandler

java.util.logging.ConsoleHandler.level = ALL
java.util.logging.ConsoleHandler.formatter = org.jitsi.utils.logging2.JitsiLogFormatter

net.java.sip.communicator.util.ScLogFormatter.programname=JVB

.level=WARNING

org.jitsi.videobridge.xmpp.ComponentImpl.level=FINE

# All of the INFO level logs from MediaStreamImpl are unnecessary in the context of jitsi-videobridge.
org.jitsi.impl.neomedia.MediaStreamImpl.level=WARNING

# Syslog(uncomment handler to use)
com.agafua.syslog.SyslogHandler.transport = udp
com.agafua.syslog.SyslogHandler.facility = local0
com.agafua.syslog.SyslogHandler.port = 514
com.agafua.syslog.SyslogHandler.hostname = localhost
com.agafua.syslog.SyslogHandler.formatter = org.jitsi.utils.logging2.JitsiLogFormatter
com.agafua.syslog.SyslogHandler.escapeNewlines = false

# to disable double timestamps in syslog uncomment next line
#net.java.sip.communicator.util.ScLogFormatter.disableTimestamp=true

# time series logging
java.util.logging.SimpleFormatter.format= %5$s%n
java.util.logging.FileHandler.level = ALL
java.util.logging.FileHandler.formatter = java.util.logging.SimpleFormatter
java.util.logging.FileHandler.pattern = /tmp/jvb-series.log
java.util.logging.FileHandler.limit = 200000000
java.util.logging.FileHandler.count = 1
java.util.logging.FileHandler.append = false

timeseries.level=OFF
timeseries.org.jitsi.videobridge.cc.vp8.level=ALL
timeseries.useParentHandlers = false
timeseries.handlers = java.util.logging.FileHandler

This is my htop:


I don’t think that’s normal, right? I will get a flame graph today.

Remove all of the following lines from logging.properties; that should lower your CPU usage. You do not need /tmp/jvb-series.log.
These lines are removed in future JVB 2 releases.

# time series logging
java.util.logging.SimpleFormatter.format= %5$s%n
java.util.logging.FileHandler.level = ALL
java.util.logging.FileHandler.formatter = java.util.logging.SimpleFormatter
java.util.logging.FileHandler.pattern = /tmp/jvb-series.log
java.util.logging.FileHandler.limit = 200000000
java.util.logging.FileHandler.count = 1
java.util.logging.FileHandler.append = false
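If you prefer to keep these lines for reference, they can be commented out instead of deleted. This is a minimal sketch, assuming GNU sed (for in-place editing with a backup suffix); disable_timeseries_file_logging is a hypothetical name. The bridge reads logging.properties at startup, so restart it afterwards.

```bash
#!/usr/bin/env bash
# Hypothetical helper: comment out the time series SimpleFormatter/FileHandler settings listed above.
# GNU sed: -i.bak edits in place and keeps the previous version as <file>.bak.
# Lines that already start with '#' no longer match ^java, so re-running changes nothing.
disable_timeseries_file_logging() {
  local props="$1"
  sed -i.bak \
    -e 's/^java\.util\.logging\.SimpleFormatter\.format/#&/' \
    -e 's/^java\.util\.logging\.FileHandler\./#&/' \
    "$props"
}
```

Usage, with the config path from the JVB command line in this thread: `disable_timeseries_file_logging /etc/jitsi/videobridge/logging.properties`.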

Hi @xranby, I’ve made some probes today. I’ve modified the heap settings as you requested, and JVB is now running with:
jvb 5860 1 62 May12 ? 14:26:11 java -Xmx8192m -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC -XX:+UseNUMA -XX:+AlwaysPreTouch -XX:+DisableExplicitGC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp -XX:+PreserveFramePointer -Dnet.java.sip.communicator.SC_HOME_DIR_LOCATION=/etc/jitsi -Dnet.java.sip.communicator.SC_HOME_DIR_NAME=videobridge -Dnet.java.sip.communicator.SC_LOG_DIR_LOCATION=/var/log/jitsi -Djava.util.logging.config.file=/etc/jitsi/videobridge/logging.properties -cp /usr/share/jitsi-videobridge/jitsi-videobridge.jar:/usr/share/jitsi-videobridge/lib/* org.jitsi.videobridge.Main --host=mydomain.net --domain=mydomain.net --port=5347 --secret=2ddddVO9 --apis=rest,
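To confirm that a restarted bridge really picked up the new heap size, the -Xmx value can be read back from the process command line, as in the ps output above. This is a minimal sketch, assuming Linux /proc; xmx_of_cmdline and xmx_of_pid are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: report the -Xmx value a JVM was started with.

# Print the last -Xmx value found in a command line string (empty if none).
xmx_of_cmdline() {
  local -a args
  local arg xmx=""
  # read -ra splits on whitespace without glob expansion, so "lib/*" stays literal.
  read -ra args <<<"$1"
  for arg in "${args[@]}"; do
    case "$arg" in
      -Xmx*) xmx="${arg#-Xmx}" ;;
    esac
  done
  echo "$xmx"
}

# Same, for a running process: /proc/<pid>/cmdline is NUL-separated.
xmx_of_pid() {
  xmx_of_cmdline "$(tr '\0' ' ' <"/proc/$1/cmdline")"
}
```

Usage: `xmx_of_pid "$(pgrep -o -f org.jitsi.videobridge.Main)"` should print 8192m for the process shown above.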

There was no high load on our server today, so this may not be very useful:

root@virt1:~/scripts# ./stats.sh
jitsi_bit_rate_upload: 182841
jitsi_bit_rate_download: 34206
jitsi_participants: 124
jitsi_conferences: 8
jitsi_largest_conference: 46
jitsi_endpoints_sending_video: 32
jitsi_endpoints_sending_audio: 13
jitsi_receive_only_endpoints: 86
jitsi_threads: 542
jitsi_total_failed_conferences: 0
jitsi_total_partially_failed_conferences: 1

Load: 12:36:21 up 40 days, 14:42, 3 users, load average: 3.59, 3.07, 2.64
Netstat: 155661 receive buffer errors 41 send buffer errors

Thank you in advance for your review; if you need more info, please let me know.

This afternoon I enabled Octo to see how it works, but I see some dropped packets on both JVBs, and non-zero jitsi_total_failed_conferences, which I had never seen before :frowning:

root@virt1:~/scripts# ./stats.sh
JVB stats:
jitsi_bit_rate_upload: 395
jitsi_bit_rate_download: 1077
jitsi_participants: 9
jitsi_conferences: 2
jitsi_largest_conference: 6
jitsi_endpoints_sending_video: 1
jitsi_endpoints_sending_audio: 3
jitsi_receive_only_endpoints: 1
jitsi_threads: 115
jitsi_total_ice_failed: 69
jitsi_total_failed_conferences: 15
jitsi_total_partially_failed_conferences: 8
Octo stats:
jitsi_octo_send_bitrate: 1067228
jitsi_octo_receive_bitrate: 231260
jitsi_octo_send_packet_rate: 249
jitsi_octo_receive_packet_rate: 103
jitsi_octo_conferences: 2
jitsi_octo_endpoints: 5
jitsi_total_packets_sent_octo: 15369346
jitsi_total_packets_received_octo: 17547064
jitsi_total_packets_dropped_octo: 59322

Load: 21:02:07 up 40 days, 23:07, 3 users, load average: 0.23, 0.13, 0.10
Netstat: 155661 receive buffer errors 41 send buffer errors

and

root@BackupStorage:~/scripts# ./stats.sh
JVB stats:
jitsi_bit_rate_upload: 1209
jitsi_bit_rate_download: 440
jitsi_participants: 9
jitsi_conferences: 2
jitsi_largest_conference: 6
jitsi_endpoints_sending_video: 1
jitsi_endpoints_sending_audio: 2
jitsi_receive_only_endpoints: 3
jitsi_threads: 101
jitsi_total_ice_failed: 52
jitsi_total_failed_conferences: 13
jitsi_total_partially_failed_conferences: 4
Octo stats:
jitsi_octo_send_bitrate: 214851
jitsi_octo_receive_bitrate: 1072621
jitsi_octo_send_packet_rate: 97
jitsi_octo_receive_packet_rate: 251
jitsi_octo_conferences: 2
jitsi_octo_endpoints: 4
jitsi_total_packets_sent_octo: 17546332
jitsi_total_packets_received_octo: 15367772
jitsi_total_packets_dropped_octo: 32273

Load: 21:02:01 up 32 days, 9:11, 3 users, load average: 0.17, 0.17, 0.18
Netstat: 153656 receive buffer errors 19 send buffer errors

Any ideas?

Thank you,

Milan

When comparing system load numbers, you have to take the number of CPU threads into consideration.

100% CPU usage on a 32-thread CPU corresponds to a system load of 32, so on a beefy machine like that I would not be alarmed unless the load is above 32.

On a high-end 32-thread machine, 15% total CPU usage equals a system load of 4.8, so a load average of around 5 is normal for such systems under ordinary load. A high system load alone should not be a reason to think something is wrong.

A system is only overloaded when the load is greater than the number of CPU threads.
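The rule of thumb above amounts to dividing the load average by the number of CPU threads. This is a minimal sketch, assuming Linux (/proc/loadavg) and coreutils nproc; load_per_cpu and current_load_per_cpu are hypothetical names. A value below 1.00 means the load average is lower than the number of CPU threads.

```bash
#!/usr/bin/env bash
# Hypothetical helper: load average divided by the number of CPU threads.
# Below 1.00 means the load average is lower than the thread count.
load_per_cpu() {
  awk -v l="$1" -v n="$2" 'BEGIN { printf "%.2f\n", l / n }'
}

# Live variant for Linux: 1-minute average from /proc/loadavg, thread count from nproc.
current_load_per_cpu() {
  local load1 _rest
  read -r load1 _rest </proc/loadavg
  load_per_cpu "$load1" "$(nproc)"
}

# Worked example from the post: a load of 4.8 on a 32-thread machine.
load_per_cpu 4.8 32   # prints 0.15
```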


Hi @xranby, here is a recent probe under higher server load with Shenandoah GC; I think it shows a clear advantage over CMS GC.

root@BackupStorage:~/scripts# ./stats.sh
JVB stats:
jitsi_bit_rate_upload: 369145
jitsi_bit_rate_download: 30787
jitsi_participants: 72
jitsi_total_participants: 125
jitsi_conferences: 4
jitsi_largest_conference: 36
jitsi_endpoints_sending_video: 46
jitsi_endpoints_sending_audio: 8
jitsi_receive_only_endpoints: 22
jitsi_threads: 355
jitsi_total_ice_failed: 0
jitsi_total_failed_conferences: 0
jitsi_total_partially_failed_conferences: 0
Octo stats:
jitsi_octo_send_bitrate: 0
jitsi_octo_receive_bitrate: 0
jitsi_octo_send_packet_rate: 0
jitsi_octo_receive_packet_rate: 0
jitsi_octo_conferences: 0
jitsi_octo_endpoints: 0
jitsi_total_packets_sent_octo: 0
jitsi_total_packets_received_octo: 0
jitsi_total_packets_dropped_octo: 0

Dropped packets in JVB: 0 0
Unable to find encoding matching packet: 0
Negative rtt: 0
Resource temporarily unavailable: 0
Couldn’t find packet detail for the seq nums: 0
Suspiciously high rtt value (Client CPU problems): 0
Unsupported media type: 0
Invalid Octo packet: 0
SEVERE messages: 5
Load: 09:35:11 up 47 days, 21:44, 3 users, load average: 5.26, 5.01, 4.72
Netstat: 153656 receive buffer errors 19 send buffer errors

Thank you,

Milan


@migo @xranby Amazing research work you are doing. As a performance engineer, I find this thread a sheer goldmine.

It looks like we can scale up to 269 participants with some issues, but 150 seems to be a safe target per JVB.

@xranby I read the whole thread; could you please summarize your configuration again?

I will probably try some, if not all, of the modifications and get my server up and running.


@migo @xranby Hey guys, thanks for your phenomenal help.
