Jitsi Video Bridge crash: "failed to allocate channels, will consider the bridge faulty"

Hello everyone,

Earlier today I was having a large important meeting and about 2 hours into the call Jitsi crashed.
Everyone was immediately disconnected from the room, and if they tried to join again, it would give an error popup saying it will try to reconnect in 25 seconds. It eventually fixed itself in a couple of minutes, but by that time most people had stopped trying to connect.

I checked the Jicofo logs and found this error at the time that the crash happened:

Jicofo 2020-06-12 09:12:51.746 SEVERE: [211] org.jitsi.jicofo.AbstractChannelAllocator.log() jvbbrewery@internal.auth.my.domain.com/ac3d2484-147d-4bd9-845b-a359d561e247 - failed to allocate channels, will consider the bridge faulty: XMPP error: <iq to='focus@auth.my.domain.com/focus632648033603' from='jvbbrewery@internal.auth.my.domain.com/ac3d2484-147d-4bd9-845b-a359d561e247' id='Bcl4c-92616' type='error'><error type='cancel'><internal-server-error xmlns='urn:ietf:params:xml:ns:xmpp-stanzas'/></error></iq>

Jicofo 2020-06-12 09:12:51.747 SEVERE: [211] org.jitsi.jicofo.JitsiMeetConferenceImpl.log() One of our bridges failed: jvbbrewery@internal.auth.my.domain.com/ac3d2484-147d-4bd9-845b-a359d561e247

After these 2 errors, the log just had this same error over and over again:

org.jitsi.jicofo.JitsiMeetConferenceImpl.log() Can not invite participant – no bridge available.

Does anyone have an idea of what might have caused the issue? And maybe suggestions on how to prevent the problem in the future?
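For anyone triaging the same symptom, a first step is to see how often each of the three messages quoted above appears in the Jicofo log. The following is only a minimal sketch: the function name `bridge_failures` is made up for illustration, and the default path `/var/log/jitsi/jicofo.log` is an assumption based on the Debian package layout, so adjust it to your installation.

```shell
#!/bin/sh
# Sketch: count the bridge-failure messages quoted in this thread in a Jicofo log.
# ASSUMPTION: the default path below matches a Debian package install.
JICOFO_LOG="${JICOFO_LOG:-/var/log/jitsi/jicofo.log}"

# bridge_failures is a hypothetical helper name, not part of Jitsi.
bridge_failures() {
    # $1: log file to scan (defaults to $JICOFO_LOG)
    log="${1:-$JICOFO_LOG}"
    for msg in \
        "failed to allocate channels" \
        "One of our bridges failed" \
        "no bridge available"
    do
        # grep -c prints the number of matching lines; -F matches msg literally.
        printf '%s\t%s\n' "$(grep -c -F -- "$msg" "$log")" "$msg"
    done
}
```

Calling `bridge_failures` with no argument scans the default log; passing a path (for example a rotated log file) scans that file instead. A single "failed to allocate channels" followed by many "no bridge available" lines matches the pattern described in this post.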

We had the same bad experience a few days ago.
In addition to the errors you mentioned, we got lots of these warnings in jvb.log before the crash:

TransportCcEngine.tccReceived#157: TCC packet contained received sequence numbers: 26499-26519. Couldn't find packet detail for the seq nums: 26499-26519. Latest seqNum was 27792, size is 1000. Latest RTT is 64.966918 ms.

and finally a fatal SIGSEGV error, after which the JVB crashed:

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f7987224207, pid=29882, tid=0x000074f38dddf700
#
# JRE version: OpenJDK Runtime Environment (8.0_252-b09) (build 1.8.0_252-8u252-b09-1~18.04-b09)
# Java VM: OpenJDK 64-Bit Server VM (25.252-b09 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libc.so.6+0x97207]  __libc_malloc+0x197
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /tmp/hs_err_pid29892.log

I’m curious what the cause of this problem is, and how we can prevent it from recurring.
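The JVM output above points at an error report file (`hs_err_pid<PID>.log`), which is usually the most useful thing to attach when asking about a crash like this. As a rough sketch, the key lines can be pulled out of such a report as shown below. The function name `hs_err_summary` is hypothetical, and the patterns are taken only from the excerpt above.

```shell
#!/bin/sh
# Sketch: print the signal, JRE version and problematic frame from a JVM
# fatal error report. hs_err_summary is a hypothetical helper name.
hs_err_summary() {
    # $1: path to an hs_err_pid<PID>.log file
    awk '
        # Signal line, e.g. "#  SIGSEGV (0xb) at pc=..."
        /SIG[A-Z]+ \(0x/ { print }
        # JRE version line
        /JRE version:/ { print }
        # The frame itself is on the line after the "Problematic frame:" header
        /Problematic frame:/ { if ((getline frame) > 0) print frame }
    ' "$1"
}
```

For example, `hs_err_summary /tmp/hs_err_pid29892.log` would print three lines for a report like the one above. As the JVM message itself notes, running `ulimit -c unlimited` before starting Java is needed if a core dump is wanted as well.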

We also experienced “failed to allocate channels” during a crowd test that ran for more than 2 hours. We configured 2 JVBs, each with 4 vCPUs and 16GB of memory (12GB assigned to the JVB). At 9:17 there’s a SEVERE log in Jicofo

and this is the only severe log in jvb

We noticed that all endpoints were re-assigned to the 2nd JVB after Jicofo detected that the 1st JVB was faulty.

@damencho @bbaldino
When does “failed to allocate channels” happen? And when do we allocate/de-allocate channels?
Why were the conference details on the 1st JVB not deleted when Jicofo transferred all clients to the 2nd JVB?

As far as I can tell, JVB1 still had enough memory when the issue occurred (check 9:17).

Our test scenario: all 28-30 participants share their screens while changing the layout (2x2, 3x3, 5x4, 7x6, full screen).
Application implementation: on every layout change we remove all the tracks and add the tracks of the necessary participants.

Do you think adding/removing tracks could cause a leak or something?

What is the output for

grep rate /etc/prosody/* -R

@shaunferns as others have pointed out, your bridge has experienced some error. To identify the exact cause we would need to check the Jicofo and JVB log files.

I wonder if you’re using SCTP? If not (i.e. if you’re using WebSockets), what are your Linux kernel version and JVB version? @Jonathan_Lennox recently fixed an issue where the JVM would crash due to GC of native objects.

It looks like either a network problem or maybe IQ rate limiting, I guess that’s why @emrah is asking you to check the prosody configuration.

@emrah

@gpolitis
what is this IQ rate limit? do we have some kind of document/reference about this for us to understand how it works (behavior)?

It’s a mechanism to prevent XMPP abuse from malicious parties. It doesn’t seem to be configured on your server. Maybe there really was a network outage between Jicofo and the bridge? This happens more often than you’d think… Are the two instances hosted in different clouds and/or regions?

@gpolitis
Both jicofo and jvb were deployed on GCP VM under asia-northeast1-a and asia-northeast1-b zone respectively.

Can you please send us a link about this IQ rate limit configuration?
We noticed that in every one of our crowd tests this issue occurred after around 2 hours. Is there some threshold limit?
Does the health check affect the behavior of Jicofo/JVB? Should we disable the health check?

If you don’t enable limits in your Prosody config, there is no need to configure anything. It seems that limits are not enabled on your system.

grep limits /etc/prosody -R

we didn’t configure it

Could you paste a bit more context? I remember I once saw an NPE causing this but the stacktrace is not visible from what you’ve pasted.

During this test we used the SEVERE log level… What is this NPE, and how do we get the stack trace?
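"NPE" is short for NullPointerException. In the Jicofo log quoted later in this thread, a stack trace shows up as continuation lines directly under the SEVERE entry that reported it. A minimal sketch for pulling SEVERE entries together with those continuation lines out of a log follows. It assumes each log entry starts with `Jicofo ` or `JVB ` followed by a date, as in the excerpts in this thread, and the function name `severe_entries` is made up for illustration.

```shell
#!/bin/sh
# Sketch: print SEVERE log entries together with the continuation lines
# (exception message, "at ..." frames) that follow them.
# ASSUMPTION: every entry starts with "Jicofo " or "JVB " and a 4-digit year.
# severe_entries is a hypothetical helper name.
severe_entries() {
    # $1: Jicofo or JVB log file
    awk '
        # A new entry begins here; remember whether it is a SEVERE one.
        /^(Jicofo|JVB) [0-9][0-9][0-9][0-9]-/ { show = ($0 ~ / SEVERE: /) }
        # Print the SEVERE entry and every line up to the next entry.
        show { print }
    ' "$1"
}
```

Running `severe_entries` on the Jicofo and JVB logs from the time of the failure and pasting the output gives others the context asked for here. If a SEVERE entry has no continuation lines, no stack trace was logged for it.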

Hello Guys,

I have pretty much the same issue: the JVB crashed and clients were stuck in a “try to reconnect” loop.
It was a meeting with about 14 participants.
In jicofo.log I see a few SEVERE entries, including a stack trace:

Jicofo 2021-06-17 09:00:52.828 SEVERE: [531] [room=testingpowermeeting@conference.meeting.myserver.at] AbstractChannelAllocator.allocateChannels#299: jvbbrewery@internal.auth.meeting.myserver.at/cf198570-090d-4be8-a509-a137dc4a545a - failed to allocate channels, will consider the bridge faulty: Timed out waiting for a response.
org.jitsi.protocol.xmpp.colibri.exception.TimeoutException: Timed out waiting for a response.
        at org.jitsi.impl.protocol.xmpp.colibri.ColibriConferenceImpl.maybeThrowOperationFailed(ColibriConferenceImpl.java:312)
        at org.jitsi.impl.protocol.xmpp.colibri.ColibriConferenceImpl.createColibriChannels(ColibriConferenceImpl.java:252)
        at org.jitsi.protocol.xmpp.colibri.ColibriConference.createColibriChannels(ColibriConference.java:97)
        at org.jitsi.jicofo.ParticipantChannelAllocator.doAllocateChannels(ParticipantChannelAllocator.java:100)
        at org.jitsi.jicofo.AbstractChannelAllocator.allocateChannels(AbstractChannelAllocator.java:253)
        at org.jitsi.jicofo.AbstractChannelAllocator.doRun(AbstractChannelAllocator.java:172)
        at org.jitsi.jicofo.AbstractChannelAllocator.run(AbstractChannelAllocator.java:133)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
Jicofo 2021-06-17 09:00:52.837 SEVERE: [531] [room=testingpowermeeting@conference.meeting.myserver.at] JitsiMeetConferenceImpl.lambda$onMultipleBridgesDown$14#2235: One of our bridges failed: jvbbrewery@internal.auth.meeting.oebv.at/cf198570-090d-4be8-a509-a137dc4a545a

Does anyone have an idea how to prevent JVB crashes? This was an important meeting and the customers switched to a Zoom meeting… -_-

What version are you running?

root@meeting:~# dpkg -l | grep -e "jitsi" -e "prosody"
ii jitsi-meet 2.0.5870-1 all WebRTC JavaScript video conferences
ii jitsi-meet-prosody 1.0.4985-1 all Prosody configuration for Jitsi Meet
ii jitsi-meet-turnserver 1.0.4985-1 all Configures coturn to be used with Jitsi Meet
ii jitsi-meet-web 1.0.4985-1 all WebRTC JavaScript video conferences
ii jitsi-meet-web-config 1.0.4985-1 all Configuration for web serving of Jitsi Meet
ii jitsi-videobridge2 2.1-492-g5edaf7dd-1 all WebRTC compatible Selective Forwarding Unit (SFU)
ii prosody 0.11.2-1+deb10u2 amd64 Lightweight Jabber/XMPP server
ii prosody-modules 0.0~hg20190203.b54e98d5c4a1+dfsg-1+deb10u1 all Selection of community modules for Prosody

You might want to upgrade to the latest stable version. There was a fix on this issue recently, if I remember correctly.

Thank you, I updated the server. Let’s see if the error occurs again.

We had the same issue as described in this thread (our JVB was about 4 weeks old): the conference crashed after 20 or so participants joined, but never consistently.
Upgrading to the latest release, as suggested, has resolved it, thanks :slight_smile:

Glad to hear. :+1:t5: