Jicofo failing to reconnect to prosody

Hello,

Currently I am working on a jitsi-meet deployment on Kubernetes, which is based on docker-jitsi-meet. I got everything working alright. But there is one issue:

Description

When restarting Prosody and jicofo keeps running, jicofo fails to connect the focus component. The other stream (I think to internal-muc) stays alive and starts working as soon as prosody is back online. Jicofo keeps getting presence updates from the jvb.

It seems that after a while (like 30min or so) jicofo finally manages to to connect the component on a new xmpp stream (see logs below)

When restarting jicofo together with prosody, everything works fine right away.

Questions

  1. I just don’t understand why jicofo first successfully discovers the component and once it wants to talk to it, it gets told that it is unavailable. I would be glad if someone could shed some light on this
  2. Is there a way to reduce the time for jicofo finally connecting the focus component on a new stream? I mean jvb also manages to do so although it is using the same ComponentBase implementation

Basic setup

  • The jitsi-meet components (jvb, prosody, jicofo, web) all run in their own pod
  • I don’t think that network setup is important here
  • used versions:
    • jicofo: package version 474-1 (nightly build from like two weeks ago)
    • prosody: 0.11.2

If you are really into it, you can use the docker images I am using from kt-docker-hub

Logs

at least the interesting sections imo

jicofo

# jicofo discovering focus.xmpp and then receiving the service unavailable error
Jicofo 2019-07-18 15:51:12.282 FINER: [38] org.jitsi.impl.protocol.xmpp.XmppProtocolProvider.processStanza().822 --> <iq to='xmpp' id='kU1pH-715' type='get'><query xmlns='http://jabber.org/protocol/disco#items'></query></iq>
Jicofo 2019-07-18 15:51:12.283 FINER: [38] org.jitsi.impl.protocol.xmpp.XmppProtocolProvider.processStanza().822 <-- <iq to='focus@auth-xmpp/focus1834169346338266' from='xmpp' id='kU1pH-715' type='result'><query xmlns='http://jabber.org/protocol/disco#items'><item jid='focus.xmpp'/></query></iq>
Jicofo 2019-07-18 15:51:12.283 FINE: [72] org.jitsi.impl.protocol.xmpp.XmppProtocolProvider.discoverItems().574 HAVE Discovered items for: xmpp
Jicofo 2019-07-18 15:51:12.283 FINE: [72] org.jitsi.impl.protocol.xmpp.XmppProtocolProvider.discoverItems().581 <item jid='focus.xmpp'/>
Jicofo 2019-07-18 15:51:12.283 FINER: [38] org.jitsi.impl.protocol.xmpp.XmppProtocolProvider.processStanza().822 --> <iq to='focus.xmpp' id='kU1pH-717' type='get'><query xmlns='http://jabber.org/protocol/disco#info'></query></iq>
Jicofo 2019-07-18 15:51:12.284 FINER: [38] org.jitsi.impl.protocol.xmpp.XmppProtocolProvider.processStanza().822 <-- <iq to='focus@auth-xmpp/focus1834169346338266' from='focus.xmpp' id='kU1pH-717' type='error'><error type='wait'><service-unavailable xmlns='urn:ietf:params:xml:ns:xmpp-stanzas'/><text xmlns='urn:ietf:params:xml:ns:xmpp-stanzas'>Component unavailable</text></error></iq>
Jicofo 2019-07-18 15:51:12.284 FINE: [72] org.jitsi.impl.protocol.xmpp.XmppProtocolProvider.getEntityFeatures().607 Error getting feature list: XMPP error reply received from focus.xmpp: XMPPError: service-unavailable - wait

and here jicofo communicating on a new stream later on:

#just some ping
Jicofo 2019-07-18 16:52:09.386 FINE: [88] org.jitsi.xmpp.component.ComponentBase.run().575 Sending ping IQ: <iq to='xmpp' from='focus.xmpp' id='kU1pH-3396' type='get'><ping xmlns='urn:xmpp:ping'/></iq>
Jicofo 2019-07-18 16:52:09.388 FINE: [120] org.jitsi.jicofo.xmpp.FocusComponent.processIQ() (serving component 'Jitsi Meet Focus') Processing IQ (packetId kU1pH-3396): <iq from="xmpp" type="result" to="focus.xmpp" id="kU1pH-3396"/>

prosody

Note the bouncing errors and after quite some time the focus component connects:

# I started where jvb authenticated
c2s556ad4e6b330       info      Authenticated as jvb@auth-xmpp
focus.xmpp:component  warn      Component not connected, bouncing error for: <iq from='focus@auth-xmpp/focus1834169346338266' type='get' to='focus.xmpp' id='kU1pH-99'>
focus.xmpp:component  warn      Component not connected, bouncing error for: <iq from='focus@auth-xmpp/focus1834169346338266' type='get' to='focus.xmpp' id='kU1pH-123'>
#... bouncing error multiple times (~30x)
jcp556ad5108ad0       info      Incoming Jabber component connection
focus.xmpp:component  info      External component successfully authenticated

all i could find in the prosody debug log:

Jul 18 15:51:12 c2s556ad4e0d2e0 debug   Received[c2s]: <iq type='get' to='xmpp' id='kU1pH-715'>
Jul 18 15:51:12 c2s556ad4e0d2e0 debug   Sending[c2s]: <iq from='xmpp' type='result' to='focus@auth-xmpp/focus1834169346338266' id='kU1pH-715'>
Jul 18 15:51:12 c2s556ad4e0d2e0 debug   Received[c2s]: <iq type='get' to='focus.xmpp' id='kU1pH-717'>
Jul 18 15:51:12 focus.xmpp:component    warn    Component not connected, bouncing error for: <iq from='focus@auth-xmpp/focus1834169346338266' type='get' to='focus.xmpp' id='kU1pH-717'>
Jul 18 15:51:12 c2s556ad4e0d2e0 debug   Sending[c2s]: <iq from='focus.xmpp' type='error' to='focus@auth-xmpp/focus1834169346338266' id='kU1pH-717'>

thanks in advance
Kevin

Hi Kevin,

Question 1 is simple: prosody always lists all of the components listed in its configuration, whether they are connected or not. The XMPP client library (Smack) gets the list and sends a disco#info request to each one. When the focus.xmpp component is not connected, prosody returns an error.

I don’t have an answer for question 2. It is supposed to reconnect quickly (though I don’t know what the exact timeouts are), and in my test environment it does so in a few seconds.

Boris

Hello,

Thank you very much for your reply. I will look further into this. Maybe I am able to find out what’s wrong with the component connection.

Kevin

Maybe this is helpful for you but I didn’t had time to test it yet:

Thank you, I was aware of your issue. I am running this on Kubernetes. I played with it a bit but sometimes it took up to 20 mins for Jicofo to reconnect, always several minutes. This is way beyound any delay that can be explained by ping intervals.

My gut feeling tells me that jicofo does not resolve the xmpp server dns name once it is started but I was not able to verify that.

I was unable to identify the root cause and ended up building a small liveness-probe in Go which runs alongside the Jicofo application and probes the xmpp server whether the focus component is connected. If this is not the case, it returns unsuccesful and tells k8s that the jicofo container is not healthy anymore and gets restarted which reconnects it properly. Unfortunately you will need k8s or similar to get such a behaviour implemented.

This is basically a dirty hack which is why I dont have the liveness-probe itself publicly available.

I’m also using Kubernetes and it really seems like you have exactly the same problem. Thank you for your insights. I think we have to look a bit deeper to find the root cause.