Problem with videobridge (Health check failed)

Hi,

I have 8 videobridges on Ubuntu machines. Yesterday I noticed that one of them doesn’t work: it gets no participants and no conferences. In the Jicofo logs I see this error:

Jicofo 2022-01-14 09:35:17.360 WARNING: [142] JvbDoctor$HealthCheckTask.doHealthCheck#284: Health check failed for: jvbbrewery@internal.auth.jitsi-domain/jitsi-jvb08b: <error xmlns='jabber:client' type='cancel'><service-unavailable xmlns='urn:ietf:params:xml:ns:xmpp-stanzas'/></error>

Regards,
Norbert

Well, I don’t know, in spite of looking at the code; what it seems to imply is that the bridge doesn’t know how to answer a health status request. Not that the status is bad: it just doesn’t know about it. Which is strange. Could you examine the setup of this bridge? Not the config files, but the general installation: Java packages, the system, is there anything specific about this bridge?
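For reference, the exchange looks roughly like this (an illustrative sketch, not taken from your logs; the JIDs and the id are placeholders, and 'healthcheck' is, as far as I recall, the extension jicofo uses for this request):

<iq to='jvbbrewery@internal.auth.jitsi-domain/jitsi-jvb08b' type='get' id='abc123'>
  <healthcheck xmlns='http://jitsi.org/protocol/healthcheck'/>
</iq>

<iq from='jvbbrewery@internal.auth.jitsi-domain/jitsi-jvb08b' type='error' id='abc123'>
  <error type='cancel'>
    <service-unavailable xmlns='urn:ietf:params:xml:ns:xmpp-stanzas'/>
  </error>
</iq>

In XMPP, service-unavailable is the standard way of saying “I have no handler for this request”, which is why it looks like the bridge does not even know what a health check is.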

Check the JVB logs.

I restarted the JVB and attached the logs.
JVB_Logs.txt (8.8 KB)

Yep, this is a bridge log with no errors in it. After restarting it, does jicofo still not see it?

Yes, I restarted the prosody/jicofo server and I have the same problem. Even if I stop the jitsi-videobridge2 service or shut down this videobridge machine, I still get the same error in the Jicofo logs every 10 seconds.

Probably jicofo is not looking at the correct room for JVBs, since the JVB connected successfully:

JVB 2022-01-14 14:34:47.513 INFO: [21] [hostname=domain id=shard] MucClient$MucWrapper.join#748: Joined MUC: jvbbrewery@internal.auth.domain

Upload the jicofo log after restarting it.

I attached the Jicofo logs.
Jicofo_Logs.txt (10.4 KB)

What are the command-line args passed to the JVB? From /etc/jitsi/videobridge/config.

Maybe also show your jvb.conf and sip-communicator.properties files from that folder.
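For reference, on a stock Ubuntu/Debian package install that file usually looks something like this (the values here are placeholders; yours will differ):

# Jitsi Videobridge settings
JVB_HOSTNAME=
JVB_HOST=
JVB_PORT=5347
JVB_SECRET=<secret>
JAVA_SYS_PROPS="-Dconfig.file=/etc/jitsi/videobridge/jvb.conf -Dnet.java.sip.communicator.SC_HOME_DIR_LOCATION=/etc/jitsi -Dnet.java.sip.communicator.SC_HOME_DIR_NAME=videobridge -Dnet.java.sip.communicator.SC_LOG_DIR_LOCATION=/var/log/jitsi"

Anything unusual appended to JAVA_SYS_PROPS on this one bridge would be interesting.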

  1. My configuration of prosody/jicofo and the videobridge is correct.
  2. Yesterday, I changed the problematic videobridge’s name in the Jitsi configuration on the prosody, videobridge and nginx machines from “jitsi-jvb08b” to “jitsi-jvb10b”. After that, my videobridge started working.
  3. It’s strange, because I still get the same error in Jicofo.log every 10 seconds: the prosody/jicofo machine keeps trying to health check videobridge “jitsi-jvb08b”, even though I no longer have that bridge configured. I attached Jicofo.log.

Jicofo_Log.txt (2.1 KB)

@damencho Do you know what is wrong ?

Try restarting prosody. Which prosody version is that?

I restarted the prosody server many times. My prosody and jicofo versions:

jitsi-meet-prosody/stable,now 1.0.5675-1 all [installed]
prosody-modules/focal,now 0.0~hg20200128.09e7e880e056+dfsg-1 all [installed]
prosody/focal,now 0.11.4-1 amd64 [installed]
jicofo/stable,now 1.0-832-1 all [installed]

It seems that prosody stores information about the unconfigured videobridge and keeps trying to run its health check.

It looks like you are running previous stable. It’s best to ask for help when running at least current stable. Just saying.

This said, prosody is not doing a health check of the JVB. Jicofo sends health checks to the bridges it knows about, and they go through prosody (XMPP protocol).
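(As an aside, the 10-second cadence you observed is jicofo’s periodic bridge health check. In recent jicofo versions it should be tunable in /etc/jitsi/jicofo/jicofo.conf; assuming the keys are still laid out as in jicofo’s reference config, something like:)

jicofo {
  bridge {
    health-checks {
      enabled = true
      interval = 10 seconds
    }
  }
}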

To check on these exchanges, one way is to enable full logging of XMPP packets at the Jicofo level.

*** warning *** this is to be done with care, especially in production *** you risk impacting performance if your disk is slow, and filling the disk up if it is low on space.

To do this, you have to edit logging.properties in the /etc/jitsi/jicofo directory (and restart jicofo)

The relevant part is this:

handlers= java.util.logging.ConsoleHandler

# Handlers with XMPP debug enabled:
#handlers= java.util.logging.ConsoleHandler, org.jitsi.impl.protocol.xmpp.log.XmppPacketsFileHandler

COMMENT the line

handlers= java.util.logging.ConsoleHandler

by prefixing it with #

UNCOMMENT the line

#handlers= java.util.logging.ConsoleHandler, org.jitsi.impl.protocol.xmpp.log.XmppPacketsFileHandler

by removing the # at the beginning

The specific config is done a few lines after that:

# To enable XMPP packets logging add XmppPacketsFileHandler to the handlers property
org.jitsi.impl.protocol.xmpp.log.PacketDebugger.level=ALL
org.jitsi.impl.protocol.xmpp.log.XmppPacketsFileHandler.pattern=/var/log/jitsi/jicofo-xmpp.log
org.jitsi.impl.protocol.xmpp.log.XmppPacketsFileHandler.append=true
org.jitsi.impl.protocol.xmpp.log.XmppPacketsFileHandler.limit=200000000
org.jitsi.impl.protocol.xmpp.log.XmppPacketsFileHandler.count=3

This limits the disk space used to 3 files of 200 MB each. Adjust if this is too much for your system.

After that, you should see the exchanges between jicofo and the JVB in the /var/log/jitsi/jicofo-xmpp.log file.
The relevant lines are of type ‘iq’ and are sent to jvbbrewery@internal.auth.yourdomain/MUC_NICKNAME of the bridge. Warning: you will also see the self health checks of Jicofo itself, and the ‘presence’ messages. The latter are also interesting, because they are what signals to jicofo that a bridge exists. (Here too you will not see the JVB names, only the MUC_NICKNAME; you will have to do the matching yourself.)
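To give an idea of what to look for, a bridge’s presence carries a stats extension, roughly like this (an illustrative sketch; the JIDs are placeholders and I’ve only kept a few stat names, which come from the bridge’s statistics):

<presence from='jvbbrewery@internal.auth.yourdomain/MUC_NICKNAME' to='focus@auth.yourdomain/focus'>
  <stats xmlns='http://jitsi.org/protocol/colibri'>
    <stat name='current_timestamp' value='2022-01-14 14:35:00.000'/>
    <stat name='conferences' value='0'/>
    <stat name='stress_level' value='0.0'/>
  </stats>
</presence>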

Hope this helps.

I enabled this packet logging and collected the logs. I attached them.
Jicofo-XMPP-Logs.txt (5.4 KB)

Ah yes, I did not notice that you are not using UUIDs for the MUC nicks. Not using UUIDs makes duplicate values more likely, IMO.
As you did not show your configuration files, I can only guess that you are using bridge-id == muc_nickname for all bridges. I think that if you rename a bridge, it’s prudent to restart the whole server to ensure that everything is synchronized.
Oh wait, oh wait. Did you see, in your log, in the presence of your bridge:

<stat name='current_timestamp' value='2022-01-11 16:26:50.318'/>

Err, it’s a bit behind the times…
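(For what it’s worth, a stock install generates a UUID for the MUC nickname at install time. If you manage these by hand, you can mint a unique value with a standard tool; the output below is just an example:)

uuidgen
# example output: 9f0c2a3e-5b1d-4c8e-a9d2-7f3b6e4c1a55
# then set it in sip-communicator.properties:
# org.jitsi.videobridge.xmpp.user.shard.MUC_NICKNAME=9f0c2a3e-5b1d-4c8e-a9d2-7f3b6e4c1a55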

On each videobridge, in the sip-communicator.properties file, I have configured:

org.jitsi.videobridge.xmpp.user.shard.MUC_NICKNAME=<VC_NAME>

I changed the problematic videobridge’s name and everything works properly, but I still have errors in Jicofo.log with the old videobridge name.

Can you set up NTP on your problematic bridge?
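(For instance, with systemd-timesyncd, which is the default on Ubuntu:)

timedatectl status              # check whether "System clock synchronized" says yes
sudo timedatectl set-ntp true   # enable NTP synchronization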

The date and time are correct on my Jitsi servers:

[screenshot of terminal output attached]

According to the docs, current_timestamp is not the date on the server when it received the request; it’s the date the statistics were generated. What this means is that your statistics have not been generated since 2022-01-11, that is, for 9 days. That’s an anomaly, and probably the reason your bridge replies that data is not available. Did you actually restart the whole bridge after changing the name, or did you just restart the service? Maybe there is a rogue process remaining in memory.
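A quick way to check for a leftover process (standard Linux tools; jitsi-videobridge2 is the Debian service name):

ps aux | grep -i '[v]ideobridge'   # the [v] keeps grep from matching itself
sudo systemctl status jitsi-videobridge2
# if a stray java process survives 'systemctl stop jitsi-videobridge2', kill it before restarting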