Server scaling/tuning advice

We are trying to run our own Jitsi instance on an 8 vCPU, 8 GB RAM VM on an ESXi server we own. The server is running Ubuntu Bionic Beaver and should have plenty of bandwidth as far as our end goes. We do not know what the quality of the provider's uplink is; maybe there are issues there?

The install went smoothly and I added some monitoring to try to see what’s going on. Everything is on this single server. Yesterday we switched our Moodle install over from to our own Jitsi, and today our pupils started videoconferencing over our own install. As soon as this happened, people started complaining about audio and video not working properly for all participants.

I conclude that there is something wrong with my setup, but I do not know where to look. I still find it hard to know which Jitsi component does what and how I can scale or optimize. Can someone with more knowledge please advise?

Here are some stats, I can produce more:

How many people do you wish to use your setup? What is your effective bandwidth?

It would be nice if we could serve around 500 users, in rooms of maximum 20-25 users.

This is our speedtest:

Very interesting graphs! Do you have stats on RAM usage?


Have you checked your logs? How many people were connected when this started to happen? Does it already happen with a small number of users?

I suppose that graph is showing the server when idle. Even then, 8 GB is probably the minimum when going beyond 50 users. I don’t have enough experience to provide precise advice, but if you had data from the moment it started to misbehave, then maybe this could be narrowed down to a lack of memory.

Nothing in these graphs is jumping out. What sort of clients are your users using? Firefox? Chrome? Do you have a graph of the load on the JVB?

Most of our users are using Chrome.
CPU load in the graphs is that of the JVB server.
I don’t know exactly when the problems started. The logs are quite large (60MB) and I do not know what to look for.

I’m pasting in some snippets, maybe someone else finds something that stands out?

2020-04-30 07:41:54.284 WARNING: [8639] [confId=5d247f53f913a75a epId=4416dc5e gid=ff133c stats_id=Athena-Cbf] AbstractEndpointMessageTransport.onClientEndpointMessage#219: Unable to find endpoint b6deaf80 to send EndpointMessage

2020-04-30 07:41:54.363 INFO: [12407] [confId=b6ab4cfe46698bee gid=fff38c stats_id=Deanna-BrI conf_name=rmvpneu5mt.593.groepsgesprek ufrag=fl0p1e74vqeef epId=204aaaff local_ufrag=fl0p1e74vqeef] ConnectivityCheckClient.processTimeout#857: timeout for pair: -> (stream-204aaaff.RTP), failing.

2020-04-30 07:41:55.973 WARNING: [316] [confId=b6ab4cfe46698bee epId=204aaaff gid=fff38c stats_id=Deanna-BrI conf_name=rmvpneu5mt.593.groepsgesprek] TransportCcEngine.tccReceived#157: TCC packet contained received sequence numbers: 25941-25943. Couldn’t find packet detail for the seq nums: 25941-25943. Latest seqNum was 28024, size is 1000. Latest RTT is 23184.008301 ms.

2020-04-30 07:41:55.975 WARNING: [316] [confId=b6ab4cfe46698bee epId=204aaaff gid=fff38c stats_id=Deanna-BrI conf_name=rmvpneu5mt.593.groepsgesprek] EndpointConnectionStats.processReportBlock#115: Suspiciously high rtt value: 26109.996216 ms, remote processing delay was PT0.882003784S (57803), srSentTime was 2020-04-30T07:41:28.983Z, received time was 2020-04-30T07:41:55.975Z
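For a 60MB log, one way to find out what stands out is to count messages per level and source, and to pull out the worst "Suspiciously high rtt value" per endpoint. Below is a minimal sketch in Python that assumes every line follows the same layout as the snippets above (timestamp, level, thread id, optional bracketed context, `Class.method#line`, message). The function names `parse_line` and `summarize` are made up for this example, and the log path in the usage note is an assumption about a default package install.

```python
import re
import sys
from collections import Counter

# Layout assumed from the snippets above:
# "<date> <time> LEVEL: [thread] [key=value ...] Class.method#line: message"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<level>[A-Z]+): \[\d+\] "
    r"(?:\[(?P<ctx>[^\]]*)\] )?"
    r"(?P<source>\S+#\d+): (?P<msg>.*)$"
)
RTT_RE = re.compile(r"Suspiciously high rtt value: (?P<rtt>[\d.]+) ms")


def parse_line(line):
    """Return a dict for one log line, or None if it does not match the layout."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    ctx = rec.pop("ctx") or ""
    # Turn "confId=... epId=..." into {"confId": "...", "epId": "..."}.
    rec["ctx"] = dict(p.split("=", 1) for p in ctx.split() if "=" in p)
    return rec


def summarize(lines):
    """Count lines per (level, source) and track the highest RTT per endpoint."""
    by_source = Counter()
    worst_rtt_ms = {}
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue  # stack traces and other continuation lines are skipped
        by_source[(rec["level"], rec["source"])] += 1
        m = RTT_RE.search(rec["msg"])
        if m:
            ep = rec["ctx"].get("epId", "unknown")
            rtt = float(m.group("rtt"))
            worst_rtt_ms[ep] = max(worst_rtt_ms.get(ep, 0.0), rtt)
    return by_source, worst_rtt_ms


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        counts, rtts = summarize(fh)
    for (level, source), n in counts.most_common(15):
        print(f"{n:8d}  {level:8s} {source}")
    for ep, rtt in sorted(rtts.items(), key=lambda kv: -kv[1])[:15]:
        print(f"endpoint {ep}: worst reported RTT {rtt:.0f} ms")
```

Saved as, say, `jvb_log_summary.py`, it could be run as `python3 jvb_log_summary.py /var/log/jitsi/jvb.log` (adjust the path to wherever your bridge log actually lives). If a few endpoints account for nearly all of the high RTT warnings, that points at those clients' connections; if almost every endpoint shows them, the server or its uplink is the more likely suspect.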

I can put the logfile somewhere public, but I am unsure of the security implications, so I am hesitant to do this for now.

In the long run, I think I’ll let my users connect to because it seems that running a stable Jitsi setup is not a walk in the park. It would be nice to know how things scale and what kind of setup, in theory, we need for our user base (500 users spread over 20-50 conferences).

Just to be clear, we have at least 800/800 Mbit/s of bandwidth and are looking to serve 500 users spread over 20-50 conferences.
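To get a feel for whether 800 Mbit/s is in the right ballpark, a rough back-of-the-envelope estimate might help. The sketch below assumes the bridge forwards streams rather than mixing them, that every user sends video, that all rooms are the same size, and that each user receives one other participant in high quality and the rest as thumbnails. The bitrates are placeholder assumptions and not measured Jitsi values, so replace them with numbers from your own monitoring.

```python
def per_user_download_mbps(room_size, high_mbps, thumb_mbps):
    """Assumed download per user: one high-quality stream plus thumbnails."""
    others = room_size - 1
    if others <= 0:
        return 0.0
    return high_mbps + (others - 1) * thumb_mbps


def bridge_load_mbps(users, room_size, send_mbps, high_mbps, thumb_mbps):
    """Return (ingress, egress) in Mbit/s as seen by the bridge."""
    ingress = users * send_mbps
    egress = users * per_user_download_mbps(room_size, high_mbps, thumb_mbps)
    return ingress, egress


if __name__ == "__main__":
    # All bitrates below are illustrative guesses, not measurements.
    ingress, egress = bridge_load_mbps(
        users=500, room_size=25, send_mbps=2.0, high_mbps=1.5, thumb_mbps=0.25
    )
    uplink_mbps = 800  # the figure quoted above
    print(f"ingress ~{ingress:.0f} Mbit/s, egress ~{egress:.0f} Mbit/s")
    print("fits in uplink" if max(ingress, egress) <= uplink_mbps else "exceeds uplink")
```

Even if the real bitrates turn out much lower than these guesses, the egress term grows with both the number of users and the room size, so it is worth plugging in measured values before deciding whether a single bridge behind this uplink can carry 500 users.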