Hi @xranby, what message exactly should I look for in jvb.log? I can't find disconnect/reconnect messages. I was in a 24-member room and enabling all cameras ended badly. Can I PM you our jvb.log so you can see if you find something? Or part of it, like: grep "affected conference name" /var/log/jitsi/jvb.log
In the JVB log, this is what I see when a connection is lost because a 13-year-old Dell client is overloaded (due to multitasking on the client side): 2020-05-04 06:21:30.852 INFO:  [confId=9a6a956fc0a1ee71 gid=ff191b stats_id=Crawford-w8e conf_name=live ufrag=5417c1e7dr6qui epId=b6c56086 local_ufrag=5417c1e7dr6qui] ConnectivityCheckClient.processTimeout#857: timeout for pair: 192.168.1.1:10000/udp/host -> 192.168.1.123:59972/udp/prflx (stream-b6c56086.RTP), failing.
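A minimal sketch of pulling the timestamp and the remote side of each failing ICE pair out of such lines (the embedded sample is the line quoted above; on a live server you would instead pipe in `grep processTimeout /var/log/jitsi/jvb.log`):

```shell
#!/bin/sh
# Sample jvb.log line copied from the excerpt above.
line='2020-05-04 06:21:30.852 INFO:  [confId=9a6a956fc0a1ee71 gid=ff191b stats_id=Crawford-w8e conf_name=live ufrag=5417c1e7dr6qui epId=b6c56086 local_ufrag=5417c1e7dr6qui] ConnectivityCheckClient.processTimeout#857: timeout for pair: 192.168.1.1:10000/udp/host -> 192.168.1.123:59972/udp/prflx (stream-b6c56086.RTP), failing.'

# Print date, time, and the remote candidate (the field after "->").
echo "$line" | awk '{for (i=1; i<=NF; i++) if ($i == "->") print $1, $2, $(i+1)}'
```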
and then in the nginx log I see the reconnect, the operating system in use, and the browser version: 192.168.1.123 - - [04/May/2020:06:21:55 +0200] "POST /http-bind?room= HTTP/2.0" 200 243 "" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36"
By combining the JVB log and the nginx log it should be possible to tell if some client OS + browser combinations are having more issues than others. When the JVB runs stable and clients disconnect, then the issues are likely on the client end.
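A sketch of that correlation, joining the two sample lines quoted above on the client IP to recover the OS + browser of a failing endpoint (real deployments would read /var/log/jitsi/jvb.log and the nginx access log; those paths and formats are assumptions about your setup):

```shell
#!/bin/sh
# The two sample log lines quoted in this thread.
jvb_line='2020-05-04 06:21:30.852 INFO:  [conf_name=live epId=b6c56086] ConnectivityCheckClient.processTimeout#857: timeout for pair: 192.168.1.1:10000/udp/host -> 192.168.1.123:59972/udp/prflx (stream-b6c56086.RTP), failing.'
nginx_line='192.168.1.123 - - [04/May/2020:06:21:55 +0200] "POST /http-bind?room= HTTP/2.0" 200 243 "" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36"'

# 1. Remote IP of the failing ICE pair (strip port and candidate type).
ip=$(echo "$jvb_line" | awk -F'-> ' '{split($2, a, ":"); print a[1]}')

# 2. Look that IP up in the nginx log and print the User-Agent
#    (the last quoted field of the combined log format).
echo "$nginx_line" | grep "^$ip " | awk -F'"' '{print $(NF-1)}'
```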
With 20-30 people all with video enabled in one conference room, you are quickly approaching the limit of what is possible with SFU WebRTC videoconferencing.
When users are using tile view:
23 users all sending 160p video in tile view would, using a Selective Forwarding Unit (SFU, the videobridge), require
0.2 Mbit/s × 23 = 4.6 Mbit/s of download bandwidth for every user.
23 users all sending 480p video would require
0.7 Mbit/s × 23 ≈ 16 Mbit/s of download bandwidth for every user; people on a slow link may start to see reduced video resolution.
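The arithmetic above can be sketched as a tiny helper (the `calc` function name and the per-stream bitrates are just the rough figures used in this post, not measured values):

```shell
#!/bin/sh
# With an SFU, each viewer downloads one stream per other participant,
# so per-viewer download = streams * per-stream bitrate.
calc() { awk -v n="$1" -v mbit="$2" 'BEGIN { printf "%.1f\n", n * mbit }'; }

calc 23 0.2   # 23 tiles at 160p -> 4.6 Mbit/s download per viewer
calc 23 0.7   # 23 tiles at 480p -> 16.1 Mbit/s download per viewer
```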
If a user connects with an HD webcam and that user's browser only sends HD frames to the videobridge, then the videobridge forwards those HD frames to all users. Thus, for each user whose browser only sends HD, the download requirement increases by roughly 2 Mbit/s per such user. Users who can't receive this much data will start to drop out.
Using full-screen mode is ideal for an SFU. The videobridge then only needs to send the active speaker's video to all viewers, so each user watching in full-screen view only needs to receive 0.7 Mbit/s for 480p.
The server will likely still run stably regardless of what happens.
If Jitsi extended the videobridge with an MCU (Multipoint Control Unit), server requirements would increase while bandwidth to the end users would drop for large conference rooms. Adding MCU support is not trivial; it may be possible to do with low latency by using GStreamer or OpenGL rendering on the server.
The upsides of using an SFU are low latency and reduced server requirements. It:
works great for up to 20 users with audio + video enabled,
works great for 70 users using audio only,
works great for 100 users with one speaker sending video to many listeners and viewers.
Typical SFU latency stays below 50 ms, hence real-time!
Using an MCU adds latency and requires the server to mix all the video streams.
The upside of an MCU is lower end-user download bandwidth.
The “unable to find encoding matching packet” message does not seem to be linked to a specific browser or platform.
My feeling about “Suspiciously high rtt value” is also that it is related to high CPU on the end-user side. But it was just a feeling, so thanks for confirming that.
I have already set 480p resolution, start muted, and disableAudioLevels.
I do not understand what DISABLE_VIDEO_BACKGROUND concretely does.
About disableH264: I fear it could have a negative impact on small hardware configs where H264 hardware decoding currently works well…
Today I discovered that jvb, with the default deb package config, does some really intensive logging in /tmp/jvb-series.log.
I understand it could be useful for debugging, but as a default config I am a bit surprised!
… and it seems to be linked with the logging we see eating CPU in the flame graphs!
In /etc/jitsi/videobridge/logging.properties I have changed FileHandler.level to OFF:
java.util.logging.FileHandler.level = OFF
And here is a new flamegraph, under decent load (though not huge; sadly the load was correctly balanced between the different jvb instances!)
Hi, thank you for the clarification. It seems the Jitsi Meet experience is strongly dependent on the link quality of the participants.
Is there more info somewhere about the WebRTC limits you mentioned? Is a video conference with 50 video participants possible with Jitsi? There is no info about such limitations in the Jitsi software. @bbaldino @damencho?
In general the flamegraph looks OK, good profiling.
I have two main comments.
org/jitsi/videobridge/cc/vp8/VP8FrameProjection:::rewriteRtp spends quite a lot of time generating JSON: 2.38% of total CPU time on the machine, and a huge percentage of endpoint send is slowed down by this. The JVB engineers should look at this and check whether the JSON generation can be optimized here.
The top of the big green peak is zoomed in here to illustrate how large a portion of Endpoint:::Send is spent generating JSON.
The second observation: now that the server runs more optimally with logging disabled, the garbage collector's CPU usage becomes a viable new target for improving performance. The CMS garbage collector uses 3.15% of total CPU time (the yellow peak in your graph quoted below). You are in a position to start evaluating whether switching to a newer generation of garbage collector, such as G1 or the very latest concurrent pauseless collectors ZGC and Shenandoah, can remove the 3.15% of CPU time currently spent collecting with the ConcurrentMarkSweep (CMS) collector.
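As a sketch of how such a switch might look: the collector is chosen with JVM flags passed to the videobridge service. The file path and variable name below are assumptions; check how your jitsi-videobridge package actually passes JVM options. Pick one line only:

```shell
# Hypothetical fragment of the JVB service config (e.g. /etc/jitsi/videobridge/config).
# On the JDK generations where CMS was still the default, ZGC and Shenandoah
# were experimental and needed the unlock flag.
JAVA_OPTS="-XX:+UseG1GC"                                            # G1 collector
# JAVA_OPTS="-XX:+UnlockExperimentalVMOptions -XX:+UseZGC"          # ZGC (JDK 11+)
# JAVA_OPTS="-XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC" # Shenandoah, where the JDK build includes it
```

After restarting the bridge, comparing a fresh flame graph shows whether the GC peak shrank.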
Page 53 of this slide deck shows some interesting comparisons of the different new JVM GCs: Choosing Right Garbage Collector to Increase Efficiency of Java Memor…
You can explore whether generating a "memory" flame graph gives clues on how to lower the JVB's overall memory usage or remove the need to perform garbage collections.
An off-CPU flame graph may also be interesting, to see if a service request is blocked.
Flame Graphs: here is the latest flame graph research describing the different types of CPU, memory, and off-CPU flame graphs.
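For reference, the usual on-CPU flame graph recipe with Linux perf and Brendan Gregg's FlameGraph scripts might look like the sketch below. The script paths and the way the JVB pid is found are assumptions, and Java frames additionally need a perf map (e.g. via perf-map-agent, as used later in this thread). DRY_RUN=1 only prints the commands so the recipe can be reviewed before running it as root:

```shell
#!/bin/sh
# Sketch: sample the JVB for 30 s and fold the stacks into a flame graph SVG.
DRY_RUN=${DRY_RUN:-1}
PID=$(pgrep -f jitsi-videobridge | head -n1)   # hypothetical way to find the JVB pid

run() { if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi; }

run perf record -F 99 -g -p "$PID" -- sleep 30   # sample stacks at 99 Hz for 30 s
run sh -c 'perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > jvb.svg'
```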
I still do not have any clues as to why your system shows so many dropped packets.
Can you share your dmesg? Have you installed the latest firmware for your network card?
On my system with a Realtek NIC I had to install the firmware-realtek Debian package from the Debian contrib/non-free repositories.
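To see where dropped packets are counted, the UDP error counters that `netstat -su` reports come from /proc/net/snmp on Linux. A sketch of parsing them (the embedded sample and its numbers are purely illustrative; on a live box feed in `cat /proc/net/snmp` instead):

```shell
#!/bin/sh
# Illustrative sample in /proc/net/snmp format (not real measurements).
sample='Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors
Udp: 123456789 10 2054 123450000 2054 0'

# First Udp: line holds the column names, second one the values.
echo "$sample" | awk '/^Udp:/ {
  if (!header) { for (i=1; i<=NF; i++) col[$i]=i; header=1 }
  else printf "UDP InErrors: %s  RcvbufErrors: %s\n", $col["InErrors"], $col["RcvbufErrors"]
}'

# If RcvbufErrors keeps growing under load, raising net.core.rmem_max and
# net.core.rmem_default with sysctl is a common mitigation (tune per host).
```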
KUDOS @xranby! I'm learning a lot from you! Thank you! I wonder how well tuned your servers must be, with your skills!
Tomorrow I will have a lesson with that class of 25 camera freaks, and I'll try to capture that epic fail with a flame graph. What else should I look at?
I'll look at the given suggestions and study the materials.
Those netstat receive buffer errors are over 32 days of uptime; is that bad? I can confirm that the number of receive buffer errors increases when the server is under very high load. It is:
Hi @xranby, I'm trying to catch a perf trace under high load, but it produced two jvb crashes:
jvb@virt1:/tmp/test$ java -cp attach-main.jar:$JAVA_HOME/lib/tools.jar net.virtualvoid.perf.AttachOnce 23869
Exception in thread "main" java.io.IOException: Premature EOF
Have you had such an experience in the past?
All conferences were moved to the second JVB and could continue. While all rooms were moved to one jvb, a nice high-load situation occurred for us: