Large # of participants - the unanswered question

Hi all,
I’ve been on this forum for quite some time now and the question about the max number of participants always comes up with no one able to provide a definite answer.

What I’ve learned so far is that the hosted service has a hard limit of 75 users per conference, while a self-hosted environment has no fixed limit and depends only on:

  1. Bridges bandwidth and resources
  2. Number of bridges
  3. Number of shards (1 Jicofo with multiple bridges)

We all know that the endpoints shouldn’t have issues handling the load when using features like LastN (sending only a certain number of streams to each endpoint).

Also, since Jitsi handles large conferences at 180p resolution, which equals ~100 Kbps per stream, a conference of 100 participants with video on means an endpoint downstream of 9.9 Mbps (99 remote streams × 100 Kbps), which is very common (if you disagree with me on this one, I suggest you join the rest of us in 2021; blunt, but that’s my take on this).
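The downstream estimate above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: the function name and the optional `last_n` parameter are made up for illustration, and the ~100 Kbps per 180p stream is the figure quoted in this post, not a measured value.

```python
from typing import Optional


def endpoint_downstream_kbps(participants: int,
                             per_stream_kbps: int = 100,
                             last_n: Optional[int] = None) -> int:
    """Estimate the video downstream of one endpoint, in Kbps.

    Assumes every remote participant sends one stream at per_stream_kbps
    and that the endpoint never receives its own video back.
    """
    remote_streams = max(participants - 1, 0)
    if last_n is not None:
        # With LastN, only the last_n most recent speakers are forwarded.
        remote_streams = min(remote_streams, last_n)
    return remote_streams * per_stream_kbps


print(endpoint_downstream_kbps(100))             # 9900 Kbps = 9.9 Mbps
print(endpoint_downstream_kbps(100, last_n=20))  # 2000 Kbps = 2 Mbps
```

With LastN set to 20 (an arbitrary example value), the same 100-person conference would need roughly 2 Mbps of downstream per endpoint under these assumptions.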

I estimate you can have about 300 users per shard when all the bridges have a 1 Gbps connection. I calculated this as 3 Mbps upstream per user × 300 users = ~900 Mbps upstream per bridge.
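The same kind of rough arithmetic works for the bridge side. The sketch below assumes, as in the estimate above, that a bridge spends about 3 Mbps per connected user and sits on a 1 Gbps link; the function names are hypothetical and real capacity also depends on CPU, packet rate and LastN settings.

```python
def bridge_upstream_mbps(users: int, per_user_mbps: int = 3) -> int:
    """Total bridge upstream if every user is sent per_user_mbps."""
    return users * per_user_mbps


def max_users_for_link(link_mbps: int = 1000, per_user_mbps: int = 3) -> int:
    """Upper bound on users that fit on the link, ignoring any headroom."""
    return link_mbps // per_user_mbps


print(bridge_upstream_mbps(300))  # 900 Mbps, leaving ~10% headroom on 1 Gbps
print(max_users_for_link())       # 333 users if the link were fully saturated
```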

What do you think of these numbers and estimation?
I don’t see a real issue having more than 100 participants on a conference.
How do you think mobile devices will manage with only a handful of participants having videos on?

@damencho @Freddie @Boris_Grozev would love to know your take on this.


Hey @rn1984!

The critical issue in this instance is not even the available bandwidth; it’s more about the ability of client machines to keep up. For large meetings, with the current implementation, they will struggle to render the UI. Even with LastN implemented, you still have the representative tiles/thumbnails.

That said, I know the team is actively working on making large meetings available in Jitsi.

P.S: People have reported success with over 100 participants in one meeting (I think the highest number I’ve seen reported is 150, if I recall clearly).

Before even reading I gotta say!!! The speed of you replying, WOW.

Thank you !

The UI?! Why is that an issue? Would love to learn more about this one.

That being said, then all we need to do is split the tiles into pages like the Z company does?

Thumbnails require rendering; rendering requires resources (CPU, RAM). Remember that Jitsi is built on the SFU model so all of the media processing is done on the client’s machine. This is one of the biggest distinctions from Zoom, which uses MCU. So, with Jitsi, the server itself doesn’t do as much work, it just forwards the streams to the clients. Which means there’s minimal load on the server, but then the work is offloaded to the clients. Invariably, the more streams the client has to process, the more work it needs to do. Most client machines have average specs and that has to be the target when planning and deploying a platform that’s spec-dependent.
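To make the SFU/MCU distinction concrete, here is a deliberately simplified sketch of how many video streams a single client has to decode under each model. The function and its `model` and `last_n` parameters are invented for illustration; it assumes an MCU mixes everything into one composite stream and an SFU forwards each remote stream individually (optionally capped by LastN).

```python
from typing import Optional


def streams_decoded_by_client(participants: int,
                              model: str,
                              last_n: Optional[int] = None) -> int:
    """Count the video streams one client decodes (simplified model)."""
    remote = max(participants - 1, 0)
    if model == "mcu":
        # The server mixes all video into a single composite stream.
        return min(remote, 1)
    if model == "sfu":
        # The server forwards individual streams; LastN caps how many.
        return remote if last_n is None else min(remote, last_n)
    raise ValueError(f"unknown model: {model!r}")


print(streams_decoded_by_client(100, "mcu"))             # 1
print(streams_decoded_by_client(100, "sfu"))             # 99
print(streams_decoded_by_client(100, "sfu", last_n=20))  # 20
```

This only counts decoded video; as the discussion here points out, the tiles themselves also cost the client something even when no video is shown in them.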

Wait, but with LastN the endpoint gets only a handful of streams to render and process, and all the other tiles are simply empty placeholders. So why would empty tiles, which are just images whose count matches the number of “silent” participants, matter?

Because they’re endpoints that still receive data, even if they’re receive-only endpoints. They still consume resources, they still process media; they can’t be infinite. There’s always a hard stop somewhere - whether it’s at 100 or 1,000, there’s a point where the client just cannot handle the relay anymore.

Hold on.
So from what you wrote here - because other endpoints receive data my computer will have a hard time processing the tiles. This doesn’t make much sense…
The only thing I can think of somehow relates to the signaling coming from Prosody via XMPP. But again, that’s only needed to show a number of tiles that are essentially images, which require no processing.

Still baffled about this. Doesn’t make sense.
What do you think?

It doesn’t make sense to you because you’re thinking of it as a one-to-many relationship; it’s actually a many-to-many relationship. Your own computer will still process those tiles. They’re not static objects, and they’re not merely empty placeholders. Tiles are dynamic: they’re constantly getting resized and have to adapt to the number of participants present, they display avatars, they receive connection signals, etc. And your computer has to process all that information for every single tile.


Got it. So pages it is. No tiles no issues, right?

Making them placeholders that merely correlate with the number of silent participants sounds like a better architecture. I wonder why the Jitsi team didn’t set it up like that :thinking:

Over the past years we have been slowly working toward supporting a high number of participants. Expect changes, including UI changes, in the following weeks/months.

Thanks will be on the lookout.

Will share major breakthroughs if I get to some myself.

Emil also talked about some changes to signaling for big conferences in the last community call.
Does it mean you will reduce signaling complexity with something like a viewer-only user?

Yep, the signalling part is ready, I think. It makes pagination possible: the client can tell the bridge “show me these videos” …

That’s my solution LOL

Is this the same as what I was thinking?

I was thinking that, for a 180-person conference, I would divide it into 6 pages of 30 people each (30 being the highest load a client can handle at a time in terms of both bandwidth and CPU; we could make this number configurable on the client side), so that at any time the client sees only those 30 people. The client can go to the next page with another 30 videos (it stops receiving the previous 30 videos and starts receiving the second page’s 30 videos from the video bridge), and so on. That way the client doesn’t need to constantly download all 179 streams from the video bridge, and the video bridge doesn’t need to constantly send 180 streams to everyone. The client just asks for the page or set of people (max 30) whose streams it wants, and the video bridge serves those. The client can also move a person’s video to another page, so the first page holds only the important people or active speakers while inactive or less significant speakers stay on the other pages. My only headache is: are the client and the video bridge capable of smooth page changes (suppressing the previous 30 streams and serving the new 30) so that it doesn’t hamper the user experience?
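A minimal sketch of that paging idea, just to make the bookkeeping concrete. None of this is Jitsi code: `paginate`, `streams_to_switch` and the `user000`-style IDs are hypothetical, and how the client would actually ask the bridge for a set of streams is left out entirely.

```python
from typing import List, Tuple


def paginate(participant_ids: List[str], page_size: int = 30) -> List[List[str]]:
    """Split the participant list into fixed-size pages (last may be shorter)."""
    return [participant_ids[i:i + page_size]
            for i in range(0, len(participant_ids), page_size)]


def streams_to_switch(pages: List[List[str]],
                      old_page: int,
                      new_page: int) -> Tuple[List[str], List[str]]:
    """Return (streams to stop, streams to start) when changing pages."""
    old, new = set(pages[old_page]), set(pages[new_page])
    return sorted(old - new), sorted(new - old)


participants = [f"user{i:03d}" for i in range(180)]
pages = paginate(participants)
print(len(pages))              # 6 pages of 30

stop, start = streams_to_switch(pages, old_page=0, new_page=1)
print(len(stop), len(start))   # 30 30
print(start[0])                # user030
```

Computing the switch as a set difference means that if pages ever overlap (for example, pinned speakers shown on every page), those streams are neither stopped nor restarted during a page change.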

I only came up with this solution for big meetings by considering the Jitsi architecture, and I think it would work quite well, but I’m not sure. Please enlighten me if I am wrong in any technical way here.
Thanks in advance :heart:
@Freddie @rn1984 @damencho

Sounds about right. According to Damencho they’re working on it

Are you gonna also apply the stream requests between the servers?
Otherwise the bandwidth bottleneck will be on the connections between the bridges and not between the client and the bridge.