Kicking out members on big calls

Hi all!
Recently, probably since updating jitsi-meet to version 2.0.8044-1 and jitsi-videobridge2 to version 2.2-61-g98c9f868-1, we have had a problem in calls with a large number of participants. So far it has happened in two calls with more than 200 participants.
We have 5 bridges with Octo enabled. At the beginning of the call, when a large number of users have just connected, participants are kicked off one bridge and redirected to other bridges. This can be seen in the graph.


On the s-rc-jvb-02 bridge the number of participants drops sharply, on s-rc-jvb-01 it rises sharply, and on s-rc-jvb-03 you can see a drop followed by a rebound.
At the moment of failure, the microphones of the reconnected participants are unmuted, although the “Everyone starts muted” setting was turned on by the moderator.
The rest of the call went through without a hitch. I can't find any unusual errors in the logs, only messages about participant timeouts, like these from s-rc-jvb-02:
JVB 2022-11-24 12:02:41.963 INFO: [34013] [confId=2da65b9f53922eed conf_name=sarzherbsqpolu33k8idthrgcq6pgnhuke@conference.jitsi.example.com meeting_id=197fad47 epId=318e4095 stats_id=Kevon-Myi local_ufrag=7k5dj1gikdr8su ufrag=7k5dj1gikdr8su] ConnectivityCheckClient.processTimeout#881: timeout for pair: ~External gate IP~:16001/udp/srflx -> ~Internal gate IP~:57420/udp/prflx (stream-318e4095.RTP), failing.
JVB 2022-11-24 12:02:43.508 INFO: [34013] [confId=2da65b9f53922eed conf_name=sarzherbsqpolu33k8idthrgcq6pgnhuke@conference.jitsi.example.com meeting_id=197fad47 epId=ec04ad44 stats_id=Myrtle-Y5y local_ufrag=e1plt1gikdujl7 ufrag=e1plt1gikdujl7] ConnectivityCheckClient.processTimeout#881: timeout for pair: ~External gate IP~:16001/udp/srflx -> ~Internal gate IP~:21740/udp/prflx (stream-ec04ad44.RTP), failing.

Here are the settings from jicofo.conf that might be important:
bridge {
  max-bridge-participants = 100
  max-bridge-packet-rate = 50000
  average-participant-stress = 0.01
  stress-threshold = 0.8
  failure-reset-threshold = 1 minute
  selection-strategy = RegionBasedBridgeSelectionStrategy
  health-checks {
    enabled = true
    interval = 10 seconds
    retry-delay = 5 seconds
  }
  brewery-jid = "JvbBrewery@internal.auth.jitsi.example.com"
}
local-region = "nsk"
octo {
  enabled = true
  id = 15
}

I understand that this data is not enough to diagnose the situation, but can you tell me what we can do to narrow down the problem?

Have you been monitoring the CPU usage of the prosody process?
What do you see in the jicofo logs? Why does it move participants?

Hey, @damencho

Yes, we are monitoring the whole Jitsi infrastructure.
Here is a CPU utilization graph of the server running the Jitsi core (prosody, jicofo, nginx, all jitsi-meet packages):


As you can see, at the time of the problem (12:08) CPU usage is normal and well below 20%.

Prosody is a single-threaded process, so you need to monitor that process specifically. Monitoring the overall CPU is very misleading: you can be hitting 100% on that one core and still be at 25% on a 4-core machine.

You need to verify whether that is the case … you can switch to running 2 prosodies, one for the clients and one for the JVBs …
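
To make the per-process check concrete, here is a minimal Python sketch, assuming a Linux host where the prosody PID is already known (for example from your process monitor). It samples `/proc/<pid>/stat` twice and reports the process's CPU use as a percentage of one core. The function names are made up for this illustration and are not part of any Jitsi or Prosody tooling.

```python
import os
import time


def parse_cpu_ticks(stat_line: str) -> int:
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may contain spaces,
    # so cut at the last ')' before splitting the remaining fields.
    fields = stat_line.rsplit(")", 1)[1].split()
    # fields[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(fields[11]) + int(fields[12])


def core_percent(ticks_before: int, ticks_after: int,
                 seconds: float, ticks_per_second: int) -> float:
    """CPU use of one process as a percentage of a single core."""
    return 100.0 * (ticks_after - ticks_before) / (ticks_per_second * seconds)


def machine_percent(single_core_percent: float, cores: int) -> float:
    """What the same load looks like on a whole-machine CPU graph."""
    return single_core_percent / cores


def sample_process(pid: int, seconds: float = 5.0) -> float:
    """Linux only: measure one process for `seconds` and return core_percent."""
    ticks_per_second = os.sysconf("SC_CLK_TCK")
    with open(f"/proc/{pid}/stat") as stat_file:
        before = parse_cpu_ticks(stat_file.read())
    time.sleep(seconds)
    with open(f"/proc/{pid}/stat") as stat_file:
        after = parse_cpu_ticks(stat_file.read())
    return core_percent(before, after, seconds, ticks_per_second)
```

A result close to 100 from `sample_process` would mean the single prosody thread is saturated, even though `machine_percent(100.0, 4)` shows up as only 25 on an overall graph of a 4-core server.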


Hey, @damencho
Thanks for the quick response.
Anton answered about the processor load. We will be watching this more closely.
In the Jicofo logs, I found a large number of similar messages:
Jicofo 2022-11-24 12:06:29.697 SEVERE: [612] [room=sarzherbsqpolu33k8idthrgcq6pgnhuke@conference.jitsi.example.com meeting_id=197fad47-751f-4a7c-8a69-be5a330bf9fd bridge=jvb-03] Colibri2Session$sendRequest$2.invoke#277: Received error response for updateParticipant, session failed: Unknown endpoint e4fe33fe
Jicofo 2022-11-24 12:06:29.705 SEVERE: [630] [room=sarzherbsqpolu33k8idthrgcq6pgnhuke@conference.jitsi.example.com meeting_id=197fad47-751f-4a7c-8a69-be5a330bf9fd] ColibriV2SessionManager.updateParticipant#463: No ParticipantInfo for e4fe33fe
Jicofo 2022-11-24 12:06:29.706 WARNING: [612] [room=sarzherbsqpolu33k8idthrgcq6pgnhuke@conference.jitsi.example.com meeting_id=197fad47-751f-4a7c-8a69-be5a330bf9fd participant=9bac745b] ParticipantInviteRunnable.lambda$doRun$0#193: Failed to convert ContentPacketExtension to Media:
Jicofo 2022-11-24 12:06:29.765 WARNING: [622] [room=sarzherbsqpolu33k8idthrgcq6pgnhuke@conference.jitsi.example.com meeting_id=197fad47-751f-4a7c-8a69-be5a330bf9fd participant=37e8f5ef] Participant.setInviteRunnable#221: Canceling ParticipantInviteRunnable[Participant[sarzherbsqpolu33k8idthrgcq6pgnhuke@conference.jitsi.example.com/37e8f5ef]@2011638787]@1970859924
Jicofo 2022-11-24 12:06:29.766 WARNING: [37] [room=sarzherbsqpolu33k8idthrgcq6pgnhuke@conference.jitsi.example.com meeting_id=197fad47-751f-4a7c-8a69-be5a330bf9fd participant=987dc829] Participant.sendQueuedRemoteSources#602: Can not signal remote sources, Jingle session not established.
Jicofo 2022-11-24 12:06:29.781 SEVERE: [622] [room=sarzherbsqpolu33k8idthrgcq6pgnhuke@conference.jitsi.example.com meeting_id=197fad47-751f-4a7c-8a69-be5a330bf9fd] ColibriV2SessionManager.updateParticipant#463: No ParticipantInfo for 14c22d7c
Jicofo 2022-11-24 12:06:36.065 WARNING: [618] [room=sarzherbsqpolu33k8idthrgcq6pgnhuke@conference.jitsi.example.com meeting_id=197fad47-751f-4a7c-8a69-be5a330bf9fd] JitsiMeetConferenceImpl.onSessionAcceptInternal#1275: No participant found for: sarzherbsqpolu33k8idthrgcq6pgnhuke@conference.jitsi.example.com/37e8f5ef
Jicofo 2022-11-24 12:06:36.246 WARNING: [618] [room=sarzherbsqpolu33k8idthrgcq6pgnhuke@conference.jitsi.example.com meeting_id=197fad47-751f-4a7c-8a69-be5a330bf9fd] JitsiMeetConferenceImpl.onTransportInfo#1122: Failed to process transport-info, no session for: sarzherbsqpolu33k8idthrgcq6pgnhuke@conference.jitsi.example.com/37e8f5ef
Jicofo 2022-11-24 12:06:57.821 WARNING: [617] [room=sarzherbsqpolu33k8idthrgcq6pgnhuke@conference.jitsi.example.com meeting_id=197fad47-751f-4a7c-8a69-be5a330bf9fd] JitsiMeetConferenceImpl.removeSources#1391: No sources or groups to be removed from e4fe33fe. The requested sources to remove: [audio=, video=, groups=]
Jicofo 2022-11-24 12:07:07.131 SEVERE: [631] [room=sarzherbsqpolu33k8idthrgcq6pgnhuke@conference.jitsi.example.com meeting_id=197fad47-751f-4a7c-8a69-be5a330bf9fd] JitsiMeetConferenceImpl.onSessionAcceptInternal#1282: Reassigning jingle session for participant: sarzherbsqpolu33k8idthrgcq6pgnhuke@conference.jitsi.example.com/e4fe33fe
Jicofo 2022-11-24 12:07:37.535 SEVERE: [665] [room=sarzherbsqpolu33k8idthrgcq6pgnhuke@conference.jitsi.example.com meeting_id=197fad47-751f-4a7c-8a69-be5a330bf9fd bridge=jvb-02] Colibri2Session$sendRequest$2.invoke#277: Received error response for updateParticipant, session failed: Unknown endpoint 2fc1c14d
Jicofo 2022-11-24 12:09:17.284 WARNING: [617] [room=sarzherbsqpolu33k8idthrgcq6pgnhuke@conference.jitsi.example.com meeting_id=197fad47-751f-4a7c-8a69-be5a330bf9fd] JitsiMeetConferenceImpl.onMemberLeft#826: Participant not found for 9e886086. Terminated already or never started?
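
With this many similar messages it can help to count them before reading them one by one. Below is a minimal Python sketch, assuming log lines in the format shown above (component name, timestamp, level, thread id, optional bracketed context, then `Class.method#line:`). The regular expression is derived only from these excerpts, so other line layouts would simply be skipped; `tally` is a name made up for this example.

```python
import re
from collections import Counter
from typing import Iterable

# Matches lines such as:
#   Jicofo 2022-11-24 12:06:29.705 SEVERE: [630] [room=...] Class.method#463: text
LINE_RE = re.compile(
    r"^(?:Jicofo|JVB) "
    r"(?P<minute>\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}\.\d{3} "
    r"(?P<level>[A-Z]+): "
    r"\[\d+\] "              # thread id
    r"(?:\[[^\]]*\] )?"      # optional context block (room=..., epId=...)
    r"(?P<site>[\w$.]+)#\d+:"  # Class.method, with the source line number dropped
)


def tally(lines: Iterable[str]) -> Counter:
    """Count log lines per (minute, level, Class.method)."""
    counts: Counter = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:  # lines in any other format are ignored
            counts[(match["minute"], match["level"], match["site"])] += 1
    return counts
```

Feeding the open log file to `tally` and printing `most_common(20)` shows which call sites spike in which minute, which can then be lined up against the participant graph.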

Do you see anything in the logs about the faulty JVB?

Hello @damencho
Today the situation repeated itself.
On one of the bridges, jvb-04, the call failed completely. There were 64 participants in the meeting. This happened at 2022-12-23 10:12:15. After that, the call moved to two other bridges, jvb-02 and jvb-05.
During the crash, there are many messages like these:

JVB 2022-12-23 10:15:59.763 INFO: [41858] [confId=1ee7dd34f646c811 conf_name=rc63a550d636be37ab0ea14f8a@conference.jitsi.example.com meeting_id=67ef1c4f epId=bf0375ae stats_id=Berta-gCW] RtpReceiverImpl.tearDown#347: Tearing down
JVB 2022-12-23 10:15:59.763 INFO: [41593] [confId=1ee7dd34f646c811 conf_name=rc63a550d636be37ab0ea14f8a@conference.jitsi.example.com meeting_id=67ef1c4f epId=21c9c6b2 stats_id=Burnice-HNx local_ufrag=ev1du1gkut450t ufrag=ev1du1gkut450t name=stream-21c9c6b2 componentId=1] MergingDatagramSocket$SocketContainer.runInReaderThread#770: Failed to receive: java.net.SocketException: Socket closed

Before that, there were messages like this:

JVB 2022-12-23 10:15:05.989 WARNING: [41877] [confId=1ee7dd34f646c811 conf_name=rc63a550d636be37ab0ea14f8a@conference.jitsi.example.com meeting_id=67ef1c4f epId=5d3961fd stats_id=Emmanuel-KPJ] EndpointMessageTransport.endpointMessage#544: Unable to find endpoint to send EndpointMessage to: 014c097c
JVB 2022-12-23 10:15:06.765 INFO: [41387] [confId=1ee7dd34f646c811 conf_name=rc63a550d636be37ab0ea14f8a@conference.jitsi.example.com meeting_id=67ef1c4f epId=fee8d3a2 stats_id=Corbin-Hd8 local_ufrag=dvepf1gkut45ai ufrag=dvepf1gkut45ai] ConnectivityCheckClient.processTimeout#881: timeout for pair: 10.10.106.183:16003/udp/host -> 10.22.31.126:53498/udp/prflx (stream-fee8d3a2.RTP), failing.
JVB 2022-12-23 10:15:06.914 INFO: [41387] [confId=1ee7dd34f646c811 conf_name=rc63a550d636be37ab0ea14f8a@conference.jitsi.example.com meeting_id=67ef1c4f epId=cef177ae stats_id=Lenore-Zgh local_ufrag=ec3dl1gkut1d5k ufrag=ec3dl1gkut1d5k] ConnectivityCheckClient.processTimeout#881: timeout for pair: 88.88.56.148:16003/udp/srflx -> 10.10.106.101:38162/udp/prflx (stream-cef177ae.RTP), failing.
JVB 2022-12-23 10:15:07.124 WARNING: [41886] [confId=1ee7dd34f646c811 conf_name=rc63a550d636be37ab0ea14f8a@conference.jitsi.example.com meeting_id=67ef1c4f epId=65ebdba9 stats_id=Marjolaine-noR] EndpointMessageTransport.endpointMessage#544: Unable to find endpoint to send EndpointMessage to: 014c097c
JVB 2022-12-23 10:15:07.996 INFO: [17] [confId=1ee7dd34f646c811 conf_name=rc63a550d636be37ab0ea14f8a@conference.jitsi.example.com meeting_id=67ef1c4f] EndpointConnectionStatusMonitor.monitorEndpointActivity#113: cef177ae has reconnected
JVB 2022-12-23 10:15:09.051 WARNING: [41891] [confId=1ee7dd34f646c811 conf_name=rc63a550d636be37ab0ea14f8a@conference.jitsi.example.com meeting_id=67ef1c4f epId=4dae2b53 stats_id=Willie-1Xz] EndpointMessageTransport.endpointMessage#544: Unable to find endpoint to send EndpointMessage to: 014c097c
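
One way to see whether the ICE timeouts cluster on a particular bridge address is to group them by the local candidate. Here is a minimal Python sketch, assuming the `timeout for pair:` message format shown in the excerpts above; the function name is made up for this example. Keep in mind that a timeout on a single candidate pair does not by itself mean the endpoint failed to connect, since ICE checks several pairs per endpoint.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set

# Pulls the endpoint id and both candidates out of lines such as:
#   ... epId=fee8d3a2 ...] ConnectivityCheckClient.processTimeout#881:
#   timeout for pair: <local> -> <remote> (stream-fee8d3a2.RTP), failing.
PAIR_RE = re.compile(
    r"epId=(?P<ep>\w+).*?"
    r"timeout for pair: (?P<local>.+?) (?:->|→) (?P<remote>.+?) \("
)


def ice_timeouts_by_local_candidate(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each local candidate to the set of endpoint ids that timed out on it."""
    by_local: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        match = PAIR_RE.search(line)
        if match:
            by_local[match["local"]].add(match["ep"])
    return dict(by_local)
```

If most affected endpoints share one local candidate, that at least suggests looking at the network path for that address first; if they are spread evenly, the cause is more likely on the client or signaling side.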

The full log file from the JVB-04 host is attached to the post. The problem is at the very end of the log file.
jvb-04.log (2.7 MB)

The issue may be related to participants who have connection issues. If any participant in the meeting shows no activity for the duration of videobridge.entity-expiration.timeout, that endpoint is expired. When the participant re-joins, everyone in that meeting is re-invited and a new JVB is selected for all participants.
With the PRs below, the issue has been solved and now the jvb is selected only for the participant who has connection issues.

issue:

merged PRs:
