Yes, I have been working to refine the estimate calculations for a capacity-and-cost calculator matrix that so far covers 250 different combinations of rooms, participants, with and without recording, with and without closed captions, etc., across an all-in-one instance, a simple cluster (core + multiple JVBs), and more complex cluster setups. In the coming weeks I will be adding OCTO, HAProxy, containerization, Kubernetes, etc. as additional configuration considerations.
Each week I am still filling in the load-test results for the slots in each of these configurations. Here is a screenshot of just a small portion of this matrix:
As I continue to gather data, I modify the formulae accordingly to improve accuracy. Right now there is a lot of theoretical ballparking for the higher numbers, but the figures for the smaller setups (around ~1,000 participants) are starting to solidify. I currently have load testing working for the all-in-one server setup up to 2,500 users. Meanwhile I am building up infrastructure for simple and complex clusters so I can begin load testing those components individually, bit by bit. Anything specific you were needing @Johan66? Happy Jitsi-ing!
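To give a sense of the shape of these estimates, here is a minimal sketch of the kind of per-room calculation a cell in the matrix starts from. This is only an illustration, not my actual formulae: the 800 kbps per-stream figure is a placeholder assumption, and it ignores simulcast/last-N, which reduce the real numbers considerably.

```python
# Illustrative per-room SFU load estimate (placeholder bitrate, and it
# assumes the bridge forwards every sender's stream to every other
# participant -- simulcast/last-N reduce this in practice).

def room_load(participants: int, video_senders: int,
              kbps_per_stream: int = 800) -> dict:
    """Estimate stream counts and bandwidth for one SFU-hosted room."""
    # Each sender uploads one stream to the bridge.
    ingress_streams = video_senders
    # Each participant receives every sender's stream except their own,
    # so the bridge forwards senders * (participants - 1) streams total.
    egress_streams = video_senders * (participants - 1)
    return {
        "ingress_streams": ingress_streams,
        "egress_streams": egress_streams,
        "egress_mbps": egress_streams * kbps_per_stream / 1000,
    }

# A typical class from this project: 45 students, all cameras on.
print(room_load(participants=45, video_senders=45))
# -> 45 ingress streams, 1980 forwarded streams, ~1584 Mbps egress
```

Real cells in the matrix then layer on recording, captions, headroom, and per-configuration overheads on top of numbers like these.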
Finally AWS raised my Fargate Spot instance limit from 1,000. They refused to raise it to 5,000; they only raised it to 2,500. They want me to use multiple AZs instead, but that is also more expensive, so I am not sure whether I will be authorized to do so. We will see.
However, I managed to squeeze a stable 2 users per instance with very maxed-out “hardware” settings (which is getting expensive, around $140/day when running load tests all day at this level). 3 users per instance is still too unstable for reliable results, even at maximum hardware. In theory I should be able to start ratcheting up from the current number. I found previously that a 2,500-user Jitsi all-in-one really struggles; 2,000 is more usable with the current settings. Cranking the Jitsi hardware up to 24x made no difference. Now that I can push the simulated users higher, I will be posting logs to see whether any further tweaks can be made to the all-in-one setup to squeeze it any higher.
Meanwhile I have been setting up various cluster configurations and will begin load testing those this upcoming sprint (the two-week sprint begins this coming Wednesday).
I will be sure to keep sharing the numbers as I can.
@Lewiscowles1986 this is for a K-12 school system. That 2,500 (goal is 5,000) is the total number of participants for the server, not for a single room.
Participants per room being tested range from 10 to 200, with a variety of sender ratios (1 per room, 5 per room, 20 per room, no more than 75 video senders per room, but only one audio sender at a time). I also have to load test handling video/audio recording of each room, and closed captions for each room (later). The average class size they have (we can see from their logs) is 45-65 students per room.
The teachers mandate that ALL STUDENTS MUST HAVE THEIR CAMERAS ON AND SENDING at all times so the teachers can see if the students appear to be paying attention.
Only 1 audio speaker at a time.
In order to speak, a student should raise their hand, and then the teacher can unmute them to allow them to speak.
Does that help clarify?
So, this means a LOT of video senders, but not a lot of audio senders. The bottleneck is so many video senders.
The larger room sizes (200, 400, 1,000, 3,000) are for web events / conferences, where there will only be a few concurrent video senders (maybe a panel of 6-12 people at most). Everyone else needs to be in the room to be able to raise their hand, change their emojis, chat in the text, and, if raising a hand to speak, have the moderator unmute them to speak. So the very large 100+ participant rooms are more webinar-style and could use streaming options (though these would have to be self-hosted; they can’t use public services like YouTube, as that would violate privacy policies and laws regarding minors on video), and they do not currently have an in-house streaming service.
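The difference between these two shapes of room shows up in a quick back-of-the-envelope calculation. This assumes the bridge forwards each sender's stream to every other participant; in practice simulcast and last-N reduce these numbers, but the growth pattern is the point:

```python
# Streams the bridge must forward = senders * (participants - 1).
# (Simplifying assumption; simulcast/last-N reduce this in practice.)

def forwarded_streams(participants: int, senders: int) -> int:
    return senders * (participants - 1)

# Classroom: every camera on -> load grows roughly quadratically.
print(forwarded_streams(45, 45))    # 1980 streams for one 45-student class
# Webinar: a 6-person panel -> load grows only linearly with audience size.
print(forwarded_streams(1000, 6))   # 5994 streams for 1,000 attendees
```

So one 45-student all-cameras-on class generates about a third of the fan-out of a 1,000-person webinar, which is why the classroom use case dominates the capacity planning.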
I am evaluating options like Matrix-Synapse to address a lot more chat features, but they are not excited about “yet another platform” to set up and support, so we’re supposed to try to do as much as possible with what is there, though Matrix hasn’t been ruled out yet (hopefully we will know by August/September). Matrix would address a huge number of chat, file-sharing, bot, automation, and other feature needs. I’ve been using Matrix with Jitsi in my own communities for about a year now and very much like it personally.
Ultimately we have to be able to support a MINIMUM of 20,000 video senders during peak school hours, likely growing to 40-65k soon. Those numbers are more appropriate for clustered setups, but for some tasks a sufficiently tuned all-in-one server might be sufficient and far less expensive, so I need to know the full range of options and configurations to meet the many different use cases they need addressed. Currently this is all being done with Zoom and Google Hangouts, so the in-house Jitsi setup (it can’t be hosted elsewhere, only in their AWS or their on-prem) needs to be price-competitive, better quality, and more reliable. They would like to be completely moved to Jitsi by summer 2022. So I have a LOT of analysis, planning, R&D, testing, and release phases to work through.
That help clear things up? Cheers!
I’m very interested in the results for k8s clusters. I have a project of my own that will require an average of 50-100 users per conference, with room to scale in the future, but in a more webinar-like state (so only 1 user sending video, voice, and screen share, plus a Jibri instance). And I have the same problem as @rpgresearch regarding the laws for minors on video, so YouTube etc. are also out of the question.
And I must say that you’re doing the community a great favor by publishing your analysis and all. This can help a lot of people. Keep up the great work!
For large total numbers of participants you’re always going to need some form of horizontal scaling.
Due to synchronisation and overheads around context-switching and caches, the gains you get in achievable packet rate from adding CPUs to a single JVB instance get smaller as the number of CPUs gets higher (i.e. adding CPUs to a single instance exhibits diminishing returns, an effect common to a lot of multithreaded software). You can help with this by choosing a concurrent GC for the JVM, but there’s synchronisation in JVB itself so you’ll always have this effect to some extent.
As a result, a single JVB instance won’t be able to handle more than a certain packet rate no matter how much extra hardware you throw at it.
If you are dead set on using larger servers rather than adding more servers, you can scale horizontally within a single server, by running multiple JVB instances per server. If you’re doing this, make sure you have enough RAM, consider an IP address per JVB to make ICE configuration simpler, and pin specific CPUs for each JVB to reduce context-switching overhead and cache contention. If you’re already familiar with them, a container system like containerd, systemd-nspawn or docker helps with simplifying the needed config and providing some structure to make ongoing maintenance easier. If you end up needing more than one server anyway (and at 20,000 video senders, you will) you may consider an orchestration system like k8s for those containers. By using Fargate, you’re basically doing this but outsourcing the management of the containers and the underlying servers to AWS, which is perfectly sensible but comes with some extra cost (especially in their egress traffic charges).
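For what it's worth, the per-instance CPU pinning can be done with plain systemd. Here is a sketch; the unit names, paths, core ranges, and port numbers are hypothetical and need adapting to your packaging:

```shell
# Hypothetical systemd drop-in for a second JVB instance on the same host
# (e.g. /etc/systemd/system/jvb@.service.d/override.conf):
#
#   [Service]
#   CPUAffinity=8-15                 # pin this instance to its own cores
#   Environment=JVB_CONFIG_DIR=/etc/jitsi/videobridge-%i  # example path
#
# Each instance also needs its own ICE UDP port (and ideally its own IP),
# e.g. in its HOCON config: videobridge.ice.udp.port = 10001
#
# Verify the pinning after starting the instance:
taskset -cp "$(systemctl show -p MainPID --value jvb@2.service)"
```

The same CPUAffinity/port separation applies whether you manage the instances with systemd directly or wrap them in containers.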
With horizontal scaling 20k simultaneous users is not really a challenge; we use ~11k simultaneous users (150 conferences of 75 participants each, 7-10 conferences per JVB node depending on region) to test our automatic scale-out and scale-in and it works flawlessly. 20k would work exactly the same way, just with more servers.
I think perhaps even more important than CPU load and RAM is bandwidth availability when using a single large server as opposed to multiple smaller servers. The first limiting factor in an SFU is the available bandwidth, which is why hosting multiple JVB instances on the same server (even with separate IPs) makes little sense if they’re all still sharing the same bandwidth.
It depends. Most hosting providers oversell their bandwidth and market you a meaningless “port speed”, in which case you’re absolutely right, adding a second JVB on the same server is just going to contend the limited bandwidth more.
But, if you get proper connectivity (e.g. we use a bond of 2x 10GbE on our servers and are colocated at Internet exchange points, so plenty of upstream capacity is available), you will generally hit JVB’s limitations on how much CPU capacity it can make use of long before you’ll hit upstream bandwidth limitations. In this situation it absolutely makes sense to run multiple JVBs on the same server in order to make use of the resources available.
@jbg, thank you kindly for your feedback and suggestions. Very much appreciate you enumerating those options.
We are actively testing all of those configurations and comparing cost/performance for each configuration.
An example of some of the Jitsi architecture variants is listed here (the document is out of date compared to our internal doc, but it gives the general idea of the variance):
Some projects I’m scoping will only need a moderately scalable single instance, maybe with some extra manual JVB instances or containerized/auto-scaled JVBs.
Some of the other projects will be able to use the AWS utilities, while other phases of the project are on-prem and so ZERO AWS features will be available.
Where extra scaling is needed for the larger rooms with many senders, we will be testing out Octo.
For the main project (20k+ senders), we already plan on bringing in HAProxy, Octo, and Kubernetes for the on-prem version.
The all-in-one testing is just to see how far that can be taken: for the projects that don’t want to deal with a lot of server instances or containerization (smaller projects), for a range of mid-size to larger projects that are okay with a few core instances plus limited containerization, and for those happy to embrace a full k8s implementation.
Last year I ran a few conventions with around 20k simultaneous users on only 4 servers, but those were very large rooms with very few senders. The challenges I have to enumerate in detail are:
Dealing with some deployments being in AWS, some in a mix of Azure and AWS, and some on-prem with no AWS.
Some will only be emergency standby for when Zoom is down, idling or offline the rest of the time; some will have limited hours of use; and some will run 24x7.
Some are more soft-cost (staffing) sensitive than hard-cost (infrastructure) sensitive; others just want a simpler setup even if it costs more, and want to “deal with” fewer server instances in their environment. Some are not yet ready to embrace containerization. Some are interested in k8s but not ready for it yet either.
A wide range of use cases, and I have to try to have the answers for all of them in a fairly formulaic and well-documented way by August/September.
How does this factor in from a cost perspective, though? I’ve observed that most people who consider hosting multiple JVBs on a single server see it as a cost-saving venture (whether real or presumed), and it’s hardly likely that they would want to assume the steep cost of 10GbE. The average user hosts on AWS or similar, and those cloud providers tend to charge significantly more as bandwidth demands rise. That’s why it’s fiscally more reasonable for the average user to use multiple smaller JVB servers as opposed to a single large one. Agreed that where upstream capacity is not a concern a large server would make sense, but that’s hardly ever the case for most people.
10GbE is not that expensive any more. The goal is simply to make the most efficient use of your resources. If you have a server with 2x10GbE, and a single JVB instance can’t use more than 20% of your CPU capacity and 5% of your bandwidth, you should run multiple JVB instances or else you’re paying for resources you can’t use. Since in general a smaller number of larger servers is cheaper than a larger number of smaller servers, this way costs less.
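The resource-packing argument in numbers, using the illustrative 20% CPU / 5% bandwidth figures from above (assumptions for the example, not benchmarks):

```python
# Toy utilization check: how many JVB instances fit on one server
# before some resource saturates? (Fractions are illustrative.)

cpu_frac_per_jvb = 0.20   # one JVB can use ~20% of this server's CPU
bw_frac_per_jvb  = 0.05   # ...and ~5% of its bandwidth

# The first resource to saturate caps the instance count.
max_instances = int(min(1 / cpu_frac_per_jvb, 1 / bw_frac_per_jvb))
print(max_instances)  # 5 instances: CPU saturates first, bandwidth at 25%
```

With only one instance on that box, 80% of the CPU and 95% of the bandwidth you are paying for sit idle.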
AWS is generally one of the most expensive options for hosting JVBs, due to their high egress traffic charges.
I think we’re going to agree to disagree on this one. I’m not sure how much 10GbE costs right now, but as of 2019, that monster was going for anywhere between $4,000 and $9,000 per month! I’ve seen advertisements for about $900/month now (not sure of the service, reliability, provider, etc.). I think meet.jit.si is a very specific use case, and very few installations will rival its usage (or demand). Most firms don’t host 10,000 concurrent users consistently, so the price of 10GbE would be a questionable investment for them. A lot of our considerations with horizontal scaling are about saving costs: use resources only when they’re needed. So while a solution like AWS could prove more expensive when there’s a surge of participants, for the majority of the time, when there are fewer meetings (and fewer participants), it’s more cost-effective than a fixed, static $900-$9,000 a month.
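That trade-off comes down to a break-even calculation. A toy sketch (both prices are made-up placeholders for illustration, not real quotes):

```python
# Break-even between a fixed-cost dedicated pipe and usage-based
# cloud egress. Both prices below are placeholder assumptions.

dedicated_monthly = 900.0   # flat cost for the dedicated link ($/month)
cloud_per_tb      = 90.0    # usage-based egress price ($/TB)

# Below this much monthly egress, paying per TB is cheaper.
breakeven_tb = dedicated_monthly / cloud_per_tb
print(breakeven_tb)  # 10.0 TB/month
```

If your actual usage sits well below the break-even point most of the month, the usage-based option wins even though it looks expensive during peaks.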
Yeah, it absolutely depends on the situation. We’re providing IaaS for Jitsi installations, rather than being a single installation, so we get economy of scale by taking dedicated servers and we get further economy of scale by making those servers larger according to our baseline load. Many/most individual Jitsi installations may well be better off on AWS (or our platform!) than getting dedicated servers since the fixed cost of dedicated servers is higher.