Hi all!
Recently, probably since updating jitsi-meet to version 2.0.8044-1 and jitsi-videobridge2 to version 2.2-61-g98c9f868-1, we have had a problem in calls with a large number of participants. So far it has happened in two calls, each with more than 200 participants.
We have 5 bridges and Octo is enabled. At the beginning of a call, when a large number of users have just connected, participants are kicked off one bridge and redirected to the other bridges. This can be seen in the graph.
On the s-rc-jvb-02 bridge the number of participants drops sharply, on s-rc-jvb-01 it rises sharply, and on s-rc-jvb-03 there are drops followed by a rebound.
When this happens, the reconnected participants come back with their microphones unmuted, even though the moderator had turned on the “Everyone starts muted” setting.
The rest of the call went without a hitch. I can't find any unusual errors in the logs, only timeout messages for participants, like these from s-rc-jvb-02:
JVB 2022-11-24 12:02:41.963 INFO: [34013] [confId=2da65b9f53922eed conf_name=sarzherbsqpolu33k8idthrgcq6pgnhuke@conference.jitsi.example.com meeting_id=197fad47 epId=318e4095 stats_id=Kevon-Myi local_ufrag=7k5dj1gikdr8su ufrag=7k5dj1gikdr8su] ConnectivityCheckClient.processTimeout#881: timeout for pair: ~External gate IP~:16001/udp/srflx → ~Internal gate IP~:57420/udp/prflx (stream-318e4095.RTP), failing.
JVB 2022-11-24 12:02:43.508 INFO: [34013] [confId=2da65b9f53922eed conf_name=sarzherbsqpolu33k8idthrgcq6pgnhuke@conference.jitsi.example.com meeting_id=197fad47 epId=ec04ad44 stats_id=Myrtle-Y5y local_ufrag=e1plt1gikdujl7 ufrag=e1plt1gikdujl7] ConnectivityCheckClient.processTimeout#881: timeout for pair: ~External gate IP~:16001/udp/srflx → ~Internal gate IP~:21740/udp/prflx (stream-ec04ad44.RTP), failing.
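To see whether these timeouts cluster around the moment the participant count drops, I could count them per minute. Below is a minimal sketch, assuming every line of the bridge log has the same layout as the two lines above (a `JVB <date> <time>` prefix and an `epId=` field). The function names are my own, and the log path in the usage note is only an assumption about a default Debian install.

```python
import re
import sys
from collections import defaultdict

# Matches lines such as:
# JVB 2022-11-24 12:02:41.963 INFO: [...] [... epId=318e4095 ...] ConnectivityCheckClient.processTimeout#881: ...
# The layout is inferred from the two sample lines only (assumption).
TIMEOUT_RE = re.compile(
    r"^JVB (?P<minute>\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}\.\d{3} .*?"
    r"\bepId=(?P<ep>\w+).*?ConnectivityCheckClient\.processTimeout"
)


def timeouts_per_minute(lines):
    """Map each minute to the set of endpoint IDs that logged a pair timeout."""
    buckets = defaultdict(set)
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            buckets[match.group("minute")].add(match.group("ep"))
    return dict(buckets)


def summarize(path):
    """Print, per minute, how many distinct endpoints hit a timeout."""
    with open(path, encoding="utf-8", errors="replace") as log_file:
        buckets = timeouts_per_minute(log_file)
    for minute in sorted(buckets):
        print(minute, len(buckets[minute]))


if __name__ == "__main__" and len(sys.argv) > 1:
    summarize(sys.argv[1])
```

Saved under a hypothetical name such as `count_timeouts.py`, it would be run as `python3 count_timeouts.py /var/log/jitsi/jvb.log` on each bridge, and the per-minute counts could then be compared with the drops in the graph.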
Here are the settings from jicofo.conf that might be important:
bridge {
  max-bridge-participants = 100
  max-bridge-packet-rate = 50000
  average-participant-stress = 0.01
  stress-threshold = 0.8
  failure-reset-threshold = 1 minute
  selection-strategy = RegionBasedBridgeSelectionStrategy
  health-checks {
    enabled = true
    interval = 10 seconds
    retry-delay = 5 seconds
  }
  brewery-jid = "JvbBrewery@internal.auth.jitsi.example.com"
}
local-region = "nsk"
octo {
  enabled = true
  id = 15
}
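For context, this is how I understand the two stress settings to interact. It is a rough sketch of my reading of the comments in jicofo's reference.conf, not jicofo's actual code. The assumption is that jicofo adds `average-participant-stress` to a bridge's last reported stress for every recently added participant, and stops preferring the bridge once the estimate reaches `stress-threshold`. The function names are hypothetical.

```python
AVERAGE_PARTICIPANT_STRESS = 0.01  # average-participant-stress from the config above
STRESS_THRESHOLD = 0.8             # stress-threshold from the config above


def estimated_stress(reported_stress, recently_added,
                     per_participant=AVERAGE_PARTICIPANT_STRESS):
    """Reported stress plus an assumed contribution from each new participant."""
    return reported_stress + recently_added * per_participant


def is_overstressed(reported_stress, recently_added,
                    threshold=STRESS_THRESHOLD):
    """True once the estimate reaches the configured threshold (assumed semantics)."""
    return estimated_stress(reported_stress, recently_added) >= threshold


if __name__ == "__main__":
    # A bridge reporting 0.5 with a burst of 40 new participants is estimated above 0.8.
    print(is_overstressed(0.5, 40))
    # The same bridge with only 10 new participants stays below the threshold.
    print(is_overstressed(0.5, 10))
```

If this reading is right, a burst of joins at the start of a call could push one bridge over the threshold purely on the estimate, so the stress values the bridges report at that moment would be worth checking.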
I understand that this data is not enough to diagnose the situation, but can you tell me what else I could check or collect to narrow down the problem?