Shutting down a JVB crashes a conference on another JVB

On a clean Jitsi installation with additional JVBs connected, whenever I kill a JVB that is serving a conference, conferences on other JVBs seem to crash as well.

I have reproduced this issue on a vanilla Jitsi installation, installed on a clean AWS instance from the Jitsi APT repo, with these package versions:

jitsi-videobridge2=2.1-416-g2f43d1b4-1
jicofo=1.0-692-hf-1
jitsi-meet-prosody=1.0.4628-1
jitsi-meet-turnserver=1.0.4628-1
jitsi-meet-web-config=1.0.4628-1
jitsi-meet-web=1.0.4628-1

Installation steps I took:

  • Installed all packages above except JVB on one AWS instance
  • Installed JVB on 3 other instances
  • Set up a domain name for this deployment: vanilla.OUR-DOMAIN
  • Configured that domain name on all nodes, set up JVBs to talk with the main node (settings in sip-communicator.properties on JVB nodes)
  • Disabled P2P in config.js on main node

Reproducing the issue:

  • Opened 4 browser windows
  • Navigated to vanilla.OUR-DOMAIN/conf-01 on 2 of them (conf-01 got scheduled on jvb-03)
  • Navigated to vanilla.OUR-DOMAIN/conf-02 on the other 2 (conf-02 got scheduled on jvb-02)
  • Stopped the machine hosting jvb-03 (which was serving conf-01)

Result:

  • conf-01 was rescheduled on jvb-01, both browser windows reconnected quickly and the conference continued normally
  • conf-02, an unrelated conference on an unrelated JVB, died: a box with a loading bar showed up
  • after a while, conf-02 “reconnected”, rescheduled on jvb-01: browsers showed the conference tiles, but there was no video
  • jvb-01 started logging this warning repeatedly: 2021-03-18 11:13:53.617 WARNING: [136] [confId=6ebde3a70e676229 epId=fc267c55 gid=64394 stats_id=Verner-Fgl conf_name=conf-01@conference.vanilla.OUR-DOMAIN] TransportCcEngine.tccReceived#170: TCC packet contained received sequence numbers: 2940-2949. Couldn't find packet detail for the seq nums: 2940-2949. Latest seqNum was 2935, size is 1000. Latest RTT is 35.062247 ms.

I’m also attaching logs from prosody, jvb-01 and jvb-02.

jicofo.log (27.5 KB)
jvb-01.log (1.6 MB)
jvb-02.log (172.6 KB)

The logs mention Octo which I haven’t enabled or set up in any way.

I’m now testing other scenarios, which are less reproducible. Whenever I kill an unused JVB, all conferences end up on the loading bar screen; one of them then returns to a normal state, while the other has broken video. Every time I restart a JVB, the logs still show this: Using jvbbrewery@internal.auth.vanilla.OUR-DOMAIN/jvb-01 to allocate channels for: OctoParticipant[relays=[]]@1649640909
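
To see how often Jicofo attempts such Octo allocations, and against which bridge, the log can be filtered for that message. This is only a minimal shell sketch, not an official Jitsi tool: the function name octo_allocations is hypothetical, and the path in the usage comment assumes the default log location of the Debian packages.

```shell
# Hypothetical helper: count "allocate channels for: OctoParticipant" lines
# per bridge JID in a Jicofo log. Prints "<count> <bridge JID>" per bridge.
octo_allocations() {
    grep -F 'to allocate channels for: OctoParticipant' "$1" \
        | sed -E 's/.*Using ([^ ]+) to allocate channels.*/\1/' \
        | sort | uniq -c | awk '{ print $1, $2 }'
}

# Usage (the path is an assumption, adjust it to your setup):
#   octo_allocations /var/log/jitsi/jicofo.log
```

No output means the log contains no such lines.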

I can reproduce the same issue

I don’t know much about Octo, but the reference.conf file shows this parameter as enabled by default. Maybe you could try setting it explicitly to false?

octo {
    enabled = false
}

jicofo.log

Jicofo 2021-03-18 18:44:46.672 INFO: [86] org.jitsi.jicofo.xmpp.BaseBrewery.log() Removed brewery instance: jvbbrewery@internal.auth.meet.mydomain.com/202e1d2d-d54c-44cb-b250-9f16b9912bc1    
Jicofo 2021-03-18 18:44:46.673 INFO: [86] org.jitsi.jicofo.xmpp.BaseBrewery.log() A bridge left the MUC: jvbbrewery@internal.auth.meet.mydomain.com/202e1d2d-d54c-44cb-b250-9f16b9912bc1       
Jicofo 2021-03-18 18:44:46.673 INFO: [86] org.jitsi.jicofo.bridge.BridgeSelector.log() Removing JVB: jvbbrewery@internal.auth.meet.mydomain.com/202e1d2d-d54c-44cb-b250-9f16b9912bc1           
Jicofo 2021-03-18 18:44:46.674 INFO: [86] org.jitsi.jicofo.bridge.JvbDoctor.log() Stopping health-check task for: jvbbrewery@internal.auth.meet.mydomain.com/202e1d2d-d54c-44cb-b250-9f16b9912bc1
Jicofo 2021-03-18 18:44:46.712 INFO: [87] org.jitsi.jicofo.JitsiMeetConferenceImpl.log() Creating an Octo participant for Bridge[jid=jvbbrewery@internal.auth.meet.mydomain.com/06c4f8a8-99e6-40aa-b99b-c1324c5f692e, relayId=null, region=null, stress=0.00] in JitsiMeetConferenceImpl[gid=37778, name=aaa-999@conference.meet.mydomain.com]
Jicofo 2021-03-18 18:44:46.726 INFO: [197] org.jitsi.jicofo.AbstractChannelAllocator.log() Using jvbbrewery@internal.auth.meet.mydomain.com/06c4f8a8-99e6-40aa-b99b-c1324c5f692e to allocate channels for: OctoParticipant[relays=[]]@1883266402
Jicofo 2021-03-18 18:44:46.755 SEVERE: [197] org.jitsi.jicofo.AbstractChannelAllocator.log() jvbbrewery@internal.auth.meet.mydomain.com/06c4f8a8-99e6-40aa-b99b-c1324c5f692e - failed to allocate channels, will consider the bridge faulty: XMPP error: <iq to='focus@auth.meet.mydomain.com/focus32404388774018' from='jvbbrewery@internal.auth.meet.mydomain.com/06c4f8a8-99e6-40aa-b99b-c1324c5f692e' id='NwUl3-722' type='error'><error type='cancel'><internal-server-error xmlns='urn:ietf:params:xml:ns:xmpp-stanzas'/><text xmlns='urn:ietf:params:xml:ns:xmpp-stanzas' xml:lang='en'>Couldn&apos;t get OctoRelayService</text></error></iq>
org.jitsi.protocol.xmpp.colibri.exception.ColibriException: XMPP error: <iq to='focus@auth.meet.mydomain.com/focus32404388774018' from='jvbbrewery@internal.auth.meet.mydomain.com/06c4f8a8-99e6-40aa-b99b-c1324c5f692e' id='NwUl3-722' type='error'><error type='cancel'><internal-server-error xmlns='urn:ietf:params:xml:ns:xmpp-stanzas'/><text xmlns='urn:ietf:params:xml:ns:xmpp-stanzas' xml:lang='en'>Couldn&apos;t get OctoRelayService</text></error></iq>
        at org.jitsi.impl.protocol.xmpp.colibri.ColibriConferenceImpl.maybeThrowOperationFailed(ColibriConferenceImpl.java:364)                                                                
        at org.jitsi.impl.protocol.xmpp.colibri.ColibriConferenceImpl.createColibriChannels(ColibriConferenceImpl.java:268)                                                                   
        at org.jitsi.jicofo.OctoChannelAllocator.doAllocateChannels(OctoChannelAllocator.java:95)                                                                                              
        at org.jitsi.jicofo.AbstractChannelAllocator.allocateChannels(AbstractChannelAllocator.java:271)         
        at org.jitsi.jicofo.AbstractChannelAllocator.doRun(AbstractChannelAllocator.java:190)                                                                                                  
        at org.jitsi.jicofo.AbstractChannelAllocator.run(AbstractChannelAllocator.java:150)                                                                                                    
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)                                                                                                   
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)                                                                                                                 
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)                                                                                          
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)                                                                                           
        at java.base/java.lang.Thread.run(Thread.java:834)                                                                                                                                    
Jicofo 2021-03-18 18:44:46.756 SEVERE: [197] org.jitsi.jicofo.JitsiMeetConferenceImpl.log() One of our bridges failed: jvbbrewery@internal.auth.meet.mydomain.com/06c4f8a8-99e6-40aa-b99b-c1324c5f692e
Jicofo 2021-03-18 18:44:46.756 INFO: [197] org.jitsi.jicofo.JitsiMeetConferenceImpl.log() Region info, conference=37778: [[null, null, null]]                                                  
Jicofo 2021-03-18 18:44:46.757 INFO: [197] org.jitsi.jicofo.JitsiMeetConferenceImpl.log() Region info, conference=37778: [[null, null]]
Jicofo 2021-03-18 18:44:46.758 INFO: [197] org.jitsi.jicofo.JitsiMeetConferenceImpl.log() Region info, conference=37778: [[null]]                                                              
Jicofo 2021-03-18 18:44:46.761 INFO: [197] org.jitsi.jicofo.JitsiMeetConferenceImpl.log() Region info, conference=37778: [[null]]                                   
Jicofo 2021-03-18 18:44:46.764 WARNING: [197] org.jitsi.jicofo.AbstractParticipant.log() Canceling OctoChannelAllocator[BridgeSession[id=37778_70fb2a, bridge=Bridge[jid=jvbbrewery@internal.auth.meet.mydomain.com/06c4f8a8-99e6-40aa-b99b-c1324c5f692e, relayId=null, region=null, stress=0.00]]@1828023164, OctoParticipant[relays=[]]@1883266402]@1533526376
Jicofo 2021-03-18 18:44:46.770 WARNING: [197] org.jitsi.jicofo.bridge.BridgeSelectionStrategy.log() Failed to select initial bridge for participantRegion=null                                
Jicofo 2021-03-18 18:44:46.771 SEVERE: [197] org.jitsi.jicofo.JitsiMeetConferenceImpl.log() Can not invite participant -- no bridge available.                                                 
Jicofo 2021-03-18 18:44:46.772 SEVERE: [197] org.jitsi.jicofo.JitsiMeetConferenceImpl.log() Failed to select a bridge for Participant[aaa-999@conference.meet.mydomain.com/062acaf5]@155244390$
Jicofo 2021-03-18 18:44:46.781 WARNING: [197] org.jitsi.jicofo.bridge.BridgeSelectionStrategy.log() Failed to select initial bridge for participantRegion=null                                 
Jicofo 2021-03-18 18:44:46.782 SEVERE: [197] org.jitsi.jicofo.JitsiMeetConferenceImpl.log() Can not invite participant -- no bridge available.
Jicofo 2021-03-18 18:44:46.782 SEVERE: [197] org.jitsi.jicofo.JitsiMeetConferenceImpl.log() Failed to select a bridge for Participant[aaa-999@conference.meet.mydomain.com/d6d6f07e]@274738738
Jicofo 2021-03-18 18:44:46.783 WARNING: [197] org.jitsi.jicofo.bridge.BridgeSelectionStrategy.log() Failed to select initial bridge for participantRegion=null
Jicofo 2021-03-18 18:44:46.783 SEVERE: [197] org.jitsi.jicofo.JitsiMeetConferenceImpl.log() Can not invite participant -- no bridge available.
Jicofo 2021-03-18 18:44:46.784 SEVERE: [197] org.jitsi.jicofo.JitsiMeetConferenceImpl.log() Failed to select a bridge for Participant[aaa-999@conference.meet.mydomain.com/b07c67fd]@246261953
Jicofo 2021-03-18 18:44:47.014 INFO: [86] org.jitsi.jicofo.ChatRoomRoleAndPresence.log() Chat room event ChatRoomMemberPresenceChangeEvent[type=MemberLeft sourceRoom=org.jitsi.impl.protocol.xmpp.ChatRoomImpl@59a7804f member=ChatMember[aaa-999@conference.meet.mydomain.com/d6d6f07e, jid: zqjo9ppvmrlx6vqq@meet.mydomain.com/IpaiP2oi]@866479197]
Jicofo 2021-03-18 18:44:47.014 INFO: [86] org.jitsi.jicofo.ChatRoomRoleAndPresence.log() Owner has left the room !
Jicofo 2021-03-18 18:44:47.026 INFO: [86] org.jitsi.jicofo.ChatRoomRoleAndPresence.log() Granted owner to aaa-999@conference.meet.mydomain.com/062acaf5

Log from the JVB hosting the meeting

2021-03-18 18:44:30.966 INFO: [24] HealthChecker.run#170: Performed a successful health check in PT0.000014S. Sticky failure: false                                                           
2021-03-18 18:44:40.966 INFO: [24] HealthChecker.run#170: Performed a successful health check in PT0.000015S. Sticky failure: false                                                           
2021-03-18 18:44:46.740 WARNING: [36] [hostname=meet.mydomain.com id=shard] MucClient.handleIq#505: Exception processing IQ, returning internal server error. Request: IQ Stanza (conference http://jitsi.org/protocol/colibri) [to=jvb@auth.meet.mydomain.com/oM57allf,from=jvbbrewery@internal.auth.meet.mydomain.com/focus,id=anZiQGF1dGgubWVldC5teWRvbWFpbi5jb20vb001N2FsbGYATndVbDMtNzIyAAYWrQggcTf+nO4rLTRLkEw=,type=set,]
java.lang.IllegalStateException: Couldn't get OctoRelayService                                                                                                                                
        at org.jitsi.videobridge.octo.ConfOctoTransport.<init>(ConfOctoTransport.java:150)                                                                                                    
        at org.jitsi.videobridge.octo.ConfOctoTransport.<init>(ConfOctoTransport.java:137)                                                                                                    
        at org.jitsi.videobridge.Conference.getTentacle(Conference.java:999)                                                                                                                  
        at org.jitsi.videobridge.shim.ConferenceShim.processOctoChannels(ConferenceShim.java:209)                                                                                             
        at org.jitsi.videobridge.shim.VideobridgeShim.handleColibriConferenceIQ(VideobridgeShim.java:361)                                                                                     
        at org.jitsi.videobridge.Videobridge.handleColibriConferenceIQ(Videobridge.java:363)                                                                                                  
        at org.jitsi.videobridge.Videobridge$XmppConnectionEventHandler.colibriConferenceIqReceived(Videobridge.java:627)                                                                     
        at org.jitsi.videobridge.xmpp.XmppConnection$handleIqRequest$4.invoke(XmppConnection.kt:179)                                                                                          
        at org.jitsi.videobridge.xmpp.XmppConnection$handleIqRequest$4.invoke(XmppConnection.kt:41)                                                                                           
        at org.jitsi.videobridge.xmpp.XmppConnection.measureDelay(XmppConnection.kt:198)                                                                                                      
        at org.jitsi.videobridge.xmpp.XmppConnection.handleIqRequest(XmppConnection.kt:178)                                                                                                   
        at org.jitsi.videobridge.xmpp.XmppConnection.handleIq(XmppConnection.kt:163)                                                                                                          
        at org.jitsi.xmpp.mucclient.IQListener.handleIq(IQListener.java:50)                                                                                                                   
        at org.jitsi.xmpp.mucclient.MucClient.handleIq(MucClient.java:501)                                                                                                                    
        at org.jitsi.xmpp.mucclient.MucClient.access$300(MucClient.java:50)                                                                                                                   
        at org.jitsi.xmpp.mucclient.MucClient$2.handleIQRequest(MucClient.java:466)                                                                                                           
        at org.jivesoftware.smack.AbstractXMPPConnection$4.run(AbstractXMPPConnection.java:1188)                                                                                              
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)                                                                                          
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)                                                                                          
        at java.base/java.lang.Thread.run(Thread.java:834)                                                                                                                                    
2021-03-18 18:44:47.038 INFO: [50] [confId=4536243afc8257a7 epId=d6d6f07e gid=37778 stats_id=Korbin-HQn conf_name=aaa-999@conference.meet.mydomain.com] TlsServerImpl.notifyAlertReceived#238: close_notify received, connection closing
2021-03-18 18:44:47.907 INFO: [73] [confId=4536243afc8257a7 epId=062acaf5 gid=37778 stats_id=Korbin-HQn conf_name=aaa-999@conference.meet.mydomain.com] TlsServerImpl.notifyAlertReceived#238: close_notify received, connection closing
2021-03-18 18:44:47.963 INFO: [76] [confId=4536243afc8257a7 epId=b07c67fd gid=37778 stats_id=Korbin-HQn conf_name=aaa-999@conference.meet.mydomain.com] TlsServerImpl.notifyAlertReceived#238: close_notify received, connection closing
2021-03-18 18:44:49.266 INFO: [96] [confId=4536243afc8257a7 gid=37778 stats_id=Korbin-HQn conf_name=aaa-999@conference.meet.mydomain.com ufrag=b43e01f1301jfv epId=d6d6f07e local_ufrag=b43e01f1301jfv] ConnectivityCheckClient.processTimeout#860: timeout for pair: 172.17.17.22:10000/udp/srflx -> 192.168.1.100:56196/udp/prflx (stream-d6d6f07e.RTP), failing.                          
2021-03-18 18:44:50.591 INFO: [96] [confId=4536243afc8257a7 gid=37778 stats_id=Korbin-HQn conf_name=aaa-999@conference.meet.mydomain.com ufrag=egufo1f1301sab epId=b07c67fd local_ufrag=egufo1f1301sab] ConnectivityCheckClient.processTimeout#860: timeout for pair: 172.17.17.22:10000/udp/srflx -> 192.168.1.100:39055/udp/host (stream-b07c67fd.RTP), failing.
2021-03-18 18:44:50.966 INFO: [24] HealthChecker.run#170: Performed a successful health check in PT0.000031S. Sticky failure: false
2021-03-18 18:44:52.267 INFO: [96] [confId=4536243afc8257a7 gid=37778 stats_id=Korbin-HQn conf_name=aaa-999@conference.meet.mydomain.com ufrag=b43e01f1301jfv epId=d6d6f07e local_ufrag=b43e01f1301jfv] ConnectivityCheckClient.processTimeout#860: timeout for pair: 172.17.17.22:10000/udp/srflx -> 192.168.1.100:56196/udp/prflx (stream-d6d6f07e.RTP), failing.
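
Read together, the two logs above suggest a chain of events: a bridge leaves the MUC, Jicofo creates an Octo participant on the remaining bridge, that bridge answers with "Couldn't get OctoRelayService", and Jicofo then reports that no bridge is available. The sketch below pulls only those lines out of one or more log files. The function name failover_events is hypothetical, and the patterns are copied from the messages quoted in this thread, so they may not match other versions.

```shell
# Hypothetical helper: print only the log lines that belong to the
# failure chain seen in this thread, from any number of log files.
failover_events() {
    grep -hE 'A bridge left the MUC|Creating an Octo participant|OctoRelayService|no bridge available' "$@"
}

# Usage (file names are assumptions):
#   failover_events jicofo.log jvb.log
```

Lines are printed file by file, so sort the output by timestamp if the files overlap in time.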

We have the same issue in our deployment.
At first we thought the reason might be that octo is enabled by default in jicofo [1], but not in the videobridges [2]. However, explicitly deactivating it in both components changed nothing in our case.

[1] jicofo/reference.conf at master · jitsi/jicofo · GitHub
[2] jitsi-videobridge/reference.conf at master · jitsi/jitsi-videobridge · GitHub

Can confirm - disabled octo in both places and nothing changed. Still getting Using jvbbrewery@internal.auth.OUR-DOMAIN/jvb-02 to allocate channels for: OctoParticipant[relays=[]]@607377124

I just tested this on older Jitsi packages:

jitsi-meet-web=1.0.4466-1
jitsi-meet-web-config=1.0.4466-1
jitsi-videobridge2=2.1-376-g9f12bfe2-1
jicofo=1.0-644-1
jitsi-meet-prosody=1.0.4466-1
jitsi-meet-turnserver=1.0.4466-1

The issue was not present there. However, downgrading is not an option for us.

Additionally, the failover to a working JVB on these versions is nearly instant, as opposed to a couple of seconds on the latest Jitsi packages.

I’ve confirmed on my setup that the conference crashing started between jicofo versions 1.0-644-1 and 1.0-690-1.

A quick workaround could be to downgrade only jicofo to 1.0-644-1.
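
On a Debian-based install such as the one in this thread, downgrading only jicofo and keeping it from being upgraded again could look roughly like the sketch below. The function name pin_jicofo and the DRY_RUN switch are hypothetical conveniences. The version string is the known-good one mentioned above, and it is an assumption that this version is still available from your configured repository.

```shell
# Hypothetical helper: install one specific jicofo version and hold it.
# With DRY_RUN set to a non-empty value, the commands are only printed.
pin_jicofo() {
    version="$1"
    run="${DRY_RUN:+echo}"
    # Install exactly this version, allowing apt to go backwards.
    $run apt-get install -y --allow-downgrades "jicofo=$version" || return 1
    # Keep later upgrades from replacing the pinned version.
    $run apt-mark hold jicofo
}

# Usage (as root):
#   DRY_RUN=1 pin_jicofo 1.0-644-1   # preview the commands
#   pin_jicofo 1.0-644-1             # apply them
```

Once a fixed jicofo is released, the hold can be lifted with apt-mark unhold jicofo.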

I posted the issue on jicofo GitHub as well.

Just for the record, here’s the issue: All conferences on a multi-JVB setup crash when a single JVB goes down · Issue #707 · jitsi/jicofo · GitHub

Can you reproduce this with the latest jicofo (jicofo_1.0-726-1_all.deb at this time)?

Is there a deb package available for this version?

Yep, it is in the unstable repo.

https://download.jitsi.org/unstable/

No go. I had 2 JVBs with one conference on each and restarted JVB1. The conference from JVB1 was rescheduled on JVB2, but the conference that was already running on JVB2 crashed.

Still getting this in logs: Jicofo 2021-03-23 08:19:10.324 INFO: [169] [room=test4@conference.vanilla.OUR-DOMAIN] AbstractChannelAllocator.allocateChannels#248: Using jvbbrewery@internal.auth.vanilla.OUR-DOMAIN/jvb-02 to allocate channels for: OctoParticipant[relays=[]]@1429035581

I had to upgrade jitsi-meet-prosody (to 1.0.4831-1), otherwise a conference would not even start. (“Something went wrong”)

However, there is an improvement in failover time: it is nearly instant again, as it was on the older packages from before I reported the issue.

Is this with the latest jicofo?

Yep, the one you suggested:

# apt-cache policy jicofo
jicofo:
  Installed: 1.0-726-1
  Candidate: 1.0-726-1
  Version table:
 *** 1.0-726-1 500
        500 https://download.jitsi.org unstable/ Packages
        100 /var/lib/dpkg/status

@Kordian_Kowalski could you please share your jvb.conf? You can redact usernames & passwords.

jvb.conf:

videobridge {
    http-servers {
        public {
            port = 9090
        }
    }
    websockets {
        enabled = true
        domain = "vanilla.OUR-DOMAIN:443"
        tls = true
        server-id = jvb-01
    }
}

sip-communicator.properties:

org.ice4j.ice.harvest.DISABLE_AWS_HARVESTER=true
org.ice4j.ice.harvest.STUN_MAPPING_HARVESTER_ADDRESSES=meet-jit-si-turnrelay.jitsi.net:443

org.jitsi.jicofo.ALWAYS_TRUST_MODE_ENABLED=true

org.jitsi.videobridge.ENABLE_REST_SHUTDOWN=true

org.jitsi.videobridge.ENABLE_STATISTICS=true
org.jitsi.videobridge.STATISTICS_TRANSPORT=muc,colibri,rest
org.jitsi.videobridge.STATISTICS_INTERVAL=5000

org.jitsi.videobridge.xmpp.user.shard.HOSTNAME=vanilla.OUR-DOMAIN
org.jitsi.videobridge.xmpp.user.shard.DOMAIN=auth.vanilla.OUR-DOMAIN
org.jitsi.videobridge.xmpp.user.shard.USERNAME=<username>
org.jitsi.videobridge.xmpp.user.shard.PASSWORD=<password>
org.jitsi.videobridge.xmpp.user.shard.MUC_JIDS=JvbBrewery@internal.auth.vanilla.OUR-DOMAIN
org.jitsi.videobridge.xmpp.user.shard.MUC_NICKNAME=jvb-01
org.jitsi.videobridge.xmpp.user.shard.DISABLE_CERTIFICATE_VERIFICATION=true

On jvb-02, only server-id and the MUC nickname are different.

nginx is set up to proxy websockets to those JVBs - that is working fine.

Thanks @Kordian_Kowalski. I think what’s happening here is that Jicofo octo is enabled but jvb octo is disabled, and these two need to be in sync (our bad, the defaults are not in sync)… Could you please share your jicofo.conf as well to confirm?

Sure, nothing interesting there I think:

jicofo.conf:

# Jicofo HOCON configuration. See /usr/share/jicofo/jicofo.jar/reference.conf for
#available options, syntax, and default values.
jicofo {
}

sip-communicator.properties:

org.jitsi.jicofo.BRIDGE_MUC=JvbBrewery@internal.auth.vanilla.OUR-DOMAIN
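
Since the empty jicofo { } block above leaves Octo at whatever default Jicofo's reference.conf specifies, it can help to check what each component's config file sets explicitly. Below is a naive shell sketch under stated assumptions: the function name octo_enabled is hypothetical, it only understands the nested octo { enabled = ... } form (not dotted keys such as octo.enabled), and the paths in the usage comment are the usual Debian package locations, which may differ on your system.

```shell
# Hypothetical helper: print the value of "enabled" inside the first
# octo { ... } block of a HOCON file, or "unset" if there is none
# (in which case the default from reference.conf applies).
octo_enabled() {
    awk '
        /octo[ \t]*[{]/ { in_octo = 1 }
        in_octo && /enabled[ \t]*[=:]/ {
            sub(/.*enabled[ \t]*[=:][ \t]*/, "")  # drop everything up to the value
            sub(/[^a-z].*$/, "")                   # drop anything after the value
            print; found = 1; exit
        }
        in_octo && /[}]/ { in_octo = 0 }
        END { if (!found) print "unset" }
    ' "$1"
}

# Usage (paths are assumptions):
#   octo_enabled /etc/jitsi/jicofo/jicofo.conf       # on the Jicofo node
#   octo_enabled /etc/jitsi/videobridge/jvb.conf     # on each JVB node
```

If the two commands print different values, or one prints "unset" while the other is explicit, the two components may disagree about Octo, which is the mismatch suspected above.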