As promised, here is the result of our big event using our own Jitsi-Meet deployment.
We were expecting 2300 participants over the day, with around 250 connected simultaneously at all times. EVERYONE had to have their camera running, and around 15 rooms were open at all times, each with one person sharing their screen (presentation) and webcam at the same time.
There were “breakout rooms” for small workshops, around 4 per main room. This meant that most rooms spawned 4 other rooms for about 20 minutes per hour, and most users forgot to turn off their camera in the main room, so they had 2 streams going out at the same time.
All servers were Amazon EC2 t3a.xlarge instances:
- AMD EPYC 7000 series clocked at 2.5 GHz
- 4 vCores (the “V” is important here, see below)
- 16 GB RAM
- Up to 5 Gbps network bandwidth (burst)
Jitsi-meet-docker with custom setups:
- 720p forced (simulcast disabled)
- All last-n functions disabled (everyone should receive all streams)
(5x) extra JVBs:
- Jitsi-videobridge2 on bare metal, Ubuntu 20.04
- Communication between the bridges and the main instance used the internal Amazon network (private IPs instead of the website address… I was completely unable to do it otherwise)
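To give an idea of what "no simulcast, no last-n" means for the bridges, here is a rough sketch of the stream fan-out for one room. It assumes every participant sends exactly one video stream and receives everyone else's, and it ignores the extra screen-share stream. The per-stream bitrate is a placeholder parameter, not something I measured.

```python
def forwarded_streams(senders: int) -> int:
    """Streams the bridge forwards for one room when each participant
    sends one stream and receives all the others (last-n disabled)."""
    if senders < 2:
        return 0
    return senders * (senders - 1)


def room_egress_mbps(senders: int, mbps_per_stream: float) -> float:
    """Total bridge egress for one room, for an ASSUMED per-stream bitrate."""
    return forwarded_streams(senders) * mbps_per_stream


# Roughly 250 users spread over 15 rooms is about 17 per room.
print(forwarded_streams(17))      # 17 * 16 forwarded streams
print(room_egress_mbps(17, 1.0))  # 1.0 Mbps per stream is only a placeholder
```

The point is simply that the load grows with the square of the room size, which is why these two settings are expensive.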
Reasons for this particular configuration:
We chose jitsi-meet-docker for the main instance for its flexibility and ease of deployment.
We chose jitsi-videobridge2 on bare metal for the additional JVBs for the simplicity of the setup.
We disabled simulcast and last-n because they simply did not work in the docker version and everyone was getting 180p resolution otherwise! That being said, I am still working on the issue with some nice folks here on the forum: https://community.jitsi.org/t/urgent-low-image-quality-after-update
We forced 720p because some users are using virtual cameras and on-screen composition with green-screen for their presentations (instead of screen sharing, the webcam contains the presentation). We did not go up to 1080p because, for that particular event, the conferences were windowed inside an iframe on our website.
The result of this setup
Overall, the event went great, except for a particular moment in the morning when one “batch” of people were nearing the end of their conferences and another “batch” joined.
This was our major issue of the day:
At that moment, the quality dropped: the main server hit 100% CPU usage and could not keep up. I was able to react fast, and the solution was to shut down the JVB container on the main server to lower its load and let the extra JVBs handle video. Previous testing had shown that closing a JVB and automatically falling back on another one only causes about a one-second glitch on the user's end. The quality went back to normal.
Upon further inspection, here are the causes
1 - EC2 Virtual Cores:
While Amazon AWS EC2 servers are great general-purpose servers, they are not that good at constant workloads. In our experience, as soon as an EC2 instance reaches constant CPU usage levels over 40-50%, there is a runaway effect that makes it slowly creep up to 100% and become unresponsive.
My guess is that this is because these are virtual cores (shared with others) and have no Hyperthreading… so they take instructions as they come and do not pre-fetch anything. It is also worth noting that T3a is a burstable instance family that relies on CPU credits, so it is not really designed for sustained high CPU usage.
We are actually looking to go back to OVH for a dedicated server, as this was giving us a lot more performance for a low monthly fee. Anyway, 4 vCores are not enough for XMPP, Jicofo, etc. if you add the JVB to the same machine… The internet and internal network bandwidth was more than sufficient.
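If you want to catch that creep before it reaches 100%, a simple watchdog on sustained load could help. This is only a minimal sketch: the 50% threshold comes from what we observed above, the window size is arbitrary, and the class name is made up for the example. Where the CPU samples come from (your monitoring agent, CloudWatch, etc.) is left out on purpose.

```python
from collections import deque


class SustainedLoadDetector:
    """Flags when EVERY sample in the last `window` readings is at or
    above `threshold_pct` (sustained load, not a short spike)."""

    def __init__(self, threshold_pct: float = 50.0, window: int = 12):
        self.threshold_pct = threshold_pct
        self.samples = deque(maxlen=window)

    def add(self, cpu_pct: float) -> bool:
        """Record one CPU reading; return True if load is sustained."""
        self.samples.append(cpu_pct)
        window_full = len(self.samples) == self.samples.maxlen
        return window_full and min(self.samples) >= self.threshold_pct


detector = SustainedLoadDetector(threshold_pct=50.0, window=3)
for reading in [30, 55, 60, 70, 40, 80]:
    if detector.add(reading):
        print(f"sustained load at {reading}%: consider moving the JVB off this box")
```

In our case, the reaction to such an alert would have been the same as what I did by hand: stop the local JVB and let the extra bridges take over.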
1b - Extra JVBs not getting used:
I had to reset extra JVBs several times during the day because the main server seemed to have become unaware that it had them available to work with. The most I could get was load on 3 JVBs and the 2 others would not get anything, ever. Most times, the server would only use 2 and load one of them a lot more than the other… I am probably doing something wrong, but I can’t tell what!
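To spot this kind of imbalance faster, one option is to poll each bridge's stats and flag the ones sitting idle while others carry load. This is a sketch under assumptions: it supposes the JVB REST stats endpoint (`/colibri/stats`, port 8080 here) is enabled and reachable on the private network, and that the JSON contains a `participants` field. The bridge names and numbers in the example are made up.

```python
import json
from urllib.request import urlopen


def fetch_stats(host: str, port: int = 8080, timeout: float = 3.0) -> dict:
    """Fetch one bridge's stats JSON (ASSUMES the REST stats endpoint is enabled)."""
    with urlopen(f"http://{host}:{port}/colibri/stats", timeout=timeout) as resp:
        return json.load(resp)


def idle_bridges(stats_by_bridge: dict) -> list:
    """Names of bridges with zero participants while at least one other
    bridge is carrying load. Missing 'participants' counts as zero."""
    loads = {name: s.get("participants", 0) for name, s in stats_by_bridge.items()}
    if not any(loads.values()):
        return []  # nothing is loaded anywhere, so nothing is "ignored"
    return sorted(name for name, count in loads.items() if count == 0)


# Hypothetical snapshot, similar to what I saw during the event.
snapshot = {
    "jvb1": {"participants": 140},
    "jvb2": {"participants": 60},
    "jvb3": {"participants": 0},
}
print(idle_bridges(snapshot))
```

In practice you would build the snapshot with `fetch_stats()` on each bridge's private IP and run the check every minute or so.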
Other minor issues
- The “speaker stats” window shows “0” all the time; no stat is ever updated. From what I could read, this affects only the docker version of Jitsi-meet… I have yet to sort this out!
- The YouTube video share was unreliable. I could not confirm that problem on our side, but many users have pointed it out… We are still digging through data, but I suspect a browser / system issue… Some users were using macOS, and Catalina is a big pile of trouble for WebRTC. Some others were using Edge… Poor souls!
- I kept having errors related to Colibri… this will need further investigation. Something about ports.
- When in full screen, if you pull out the chat window, the video frame moves away instead of resizing and does not return to its place… The window needs to be reset (resize, change view mode) to revert to normal. This was reported here: https://github.com/jitsi/jitsi-meet/issues/7889 and is reported as fixed on the main project, but does not seem to be fixed in the docker repo.
- Bandwidth estimation only works on one side (upload, from what I understand)… That could be linked to the quality issue when using simulcast… We need to dig deeper into this! (see image below)
Hope this can help someone! I will post my detailed setup and config files later this week, as soon as I am finished “depersonalizing” them (removing sensitive server information).
@damencho, I am putting your name here if ever that can interest you… I don’t know who else in the dev team could be interested (I don’t even know who is in charge of the docker version nowadays!)