Strange JVB behaviour

Hello, my setup is like this:
JVB version: 2.1-304-g8488f77d-1
Lua version: 5.2
LuaRocks version: 3.5.0
Prosody version: 0.11.7

Prosody uses BOSH, JVB uses WebSockets.

Today, out of 3000+ rooms (on 150+ servers), 3-4 rooms (each on a different server) produced the following in the logs:

JVB LOG:

2021-01-04 10:36:28.293 SEVERE: [6932] XmppCommon.handleIQRequest#177: Exception handling IQ request
java.lang.NullPointerException
at org.jitsi.videobridge.shim.ChannelShim.<init>(ChannelShim.java:163)
at org.jitsi.videobridge.shim.ContentShim.createRtpChannel(ContentShim.java:135)
at org.jitsi.videobridge.shim.ContentShim.getOrCreateChannelShim(ContentShim.java:276)
at org.jitsi.videobridge.shim.VideobridgeShim.processChannels(VideobridgeShim.java:141)
at org.jitsi.videobridge.shim.VideobridgeShim.handleColibriConferenceIQ(VideobridgeShim.java:313)
at org.jitsi.videobridge.Videobridge.handleColibriConferenceIQ(Videobridge.java:427)
at org.jitsi.videobridge.xmpp.XmppCommon.handleIQRequest(XmppCommon.java:158)
at org.jitsi.videobridge.xmpp.XmppCommon.handleIQInternal(XmppCommon.java:116)
at org.jitsi.videobridge.xmpp.XmppCommon.handleIQ(XmppCommon.java:87)
at org.jitsi.videobridge.xmpp.ClientConnectionImpl.handleIq(ClientConnectionImpl.java:97)
at org.jitsi.xmpp.mucclient.IQListener.handleIq(IQListener.java:50)
at org.jitsi.xmpp.mucclient.MucClient.handleIq(MucClient.java:566)
at org.jitsi.xmpp.mucclient.MucClient.access$700(MucClient.java:50)
at org.jitsi.xmpp.mucclient.MucClient$2.handleIQRequest(MucClient.java:530)
at org.jivesoftware.smack.AbstractXMPPConnection$4.run(AbstractXMPPConnection.java:1188)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
at java.base/java.lang.Thread.run(Thread.java:832)

After that, I get the following for every participant in the room:

2021-01-04 10:36:29.656 INFO: [6180] [confId=8e89e4811ed32281 epId=82e05de4 gid=33363 stats_id=Amelia-zQF conf_name=XXXXXXXX] TlsServerImpl.notifyAlertReceived#239: close_notify received, connection closing

The Jicofo log says (1 ms after the JVB error):

Jicofo 2021-01-04 10:36:28.294 SEVERE: [718] org.jitsi.jicofo.AbstractChannelAllocator.log() jvbbrewery@HOST/1c7441de-eaf8-4c91-a79e-1d9f7233ef4f - failed to allocate channels, will consider the bridge faulty: XMPP error:
org.jitsi.protocol.xmpp.colibri.exception.ColibriException: XMPP error:
at org.jitsi.impl.protocol.xmpp.colibri.ColibriConferenceImpl.maybeThrowOperationFailed(ColibriConferenceImpl.java:378)
at org.jitsi.impl.protocol.xmpp.colibri.ColibriConferenceImpl.createColibriChannels(ColibriConferenceImpl.java:282)
at org.jitsi.protocol.xmpp.colibri.ColibriConference.createColibriChannels(ColibriConference.java:112)
at org.jitsi.jicofo.ParticipantChannelAllocator.doAllocateChannels(ParticipantChannelAllocator.java:111)
at org.jitsi.jicofo.AbstractChannelAllocator.allocateChannels(AbstractChannelAllocator.java:271)
at org.jitsi.jicofo.AbstractChannelAllocator.doRun(AbstractChannelAllocator.java:190)
at org.jitsi.jicofo.AbstractChannelAllocator.run(AbstractChannelAllocator.java:150)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
at java.base/java.lang.Thread.run(Thread.java:832)
Jicofo 2021-01-04 10:36:28.294 SEVERE: [718] org.jitsi.jicofo.JitsiMeetConferenceImpl.log() One of our bridges failed: jvbbrewery@internal.HOST/1c7441de-eaf8-4c91-a79e-1d9f7233ef4f

And after that, a lot of:

Jicofo 2021-01-04 10:36:28.296 WARNING: [718] org.jitsi.jicofo.bridge.BridgeSelectionStrategy.log() Failed to select initial bridge for participantRegion=null
Jicofo 2021-01-04 10:36:28.296 SEVERE: [718] org.jitsi.jicofo.JitsiMeetConferenceImpl.log() Can not invite participant – no bridge available.

Sometimes things “fix” themselves; sometimes this recurs every 5-10 minutes.
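
To put a number on "every 5-10 minutes", the exception timestamps can be pulled out of jvb.log and the gaps between them computed. This is only a minimal sketch: it assumes each relevant line carries a timestamp in the same format as the excerpts above, and the function names are made up for illustration.

```python
import re
from datetime import datetime

# Assumption: every relevant log line contains a timestamp such as
# "2021-01-04 10:36:28.293", possibly after a "Jicofo " prefix.
TS_RE = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})")


def failure_times(lines, marker="Exception handling IQ request"):
    """Return the timestamps of every log line that contains `marker`."""
    times = []
    for line in lines:
        if marker not in line:
            continue
        match = TS_RE.search(line)
        if match:
            times.append(
                datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
            )
    return times


def gaps_in_seconds(times):
    """Return the gaps, in seconds, between consecutive timestamps."""
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]
```

Passing an open file object for jvb.log to `failure_times` and the result to `gaps_in_seconds` gives the spacing between failures. Running the same thing on the Jicofo log with `marker="failed to allocate channels"` would show whether the two sides line up.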

How many signaling nodes do you have for these numbers? Are you monitoring the prosody process CPU usage, and what does it report when the problem occurs?

Hey, @damencho, thanks for reaching out!

We’re using a single-machine setup with everything on it (nginx/prosody/JVB/JM/Jicofo, etc.). Each machine has its own subdomain and we route traffic to it.

We’ve got Prosody on each machine, and there is no strange behaviour with CPU, network, RAM or anything else.

It happened on a machine with 2 rooms and 44 participants in total, with 3 participants sending audio and 2 sending video. The CPU was at ~10%.
It also happened on a more heavily loaded machine (I can find the log if you want).

I look forward to hearing from you.
Thanks

Prosody is single-threaded, so you need to monitor just that process’s CPU usage.
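
The reason this matters: a single-threaded process can be pinned at 100% of one core while the machine as a whole still shows a low overall percentage. As a rough sketch, assuming Linux and a known Prosody PID, per-process usage can be sampled from /proc like this (the function names are illustrative, not from any Jitsi or Prosody tooling):

```python
import os
import time


def cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from a /proc/<pid>/stat line."""
    # The process name is in parentheses and may contain spaces, so split
    # after the last ")" before indexing the remaining fields.
    fields = stat_line.rsplit(")", 1)[1].split()
    # utime and stime are fields 14 and 15 of the full line, which are
    # indexes 11 and 12 once pid and name have been split off.
    return int(fields[11]) + int(fields[12])


def cpu_percent(ticks_before, ticks_after, interval_s, ticks_per_s):
    """CPU usage over an interval, as a percentage of ONE core."""
    return 100.0 * (ticks_after - ticks_before) / (ticks_per_s * interval_s)


def sample(pid, interval_s=1.0):
    """Measure the CPU usage of `pid` over `interval_s` seconds (Linux only)."""
    ticks_per_s = os.sysconf("SC_CLK_TCK")
    path = f"/proc/{pid}/stat"
    with open(path) as fh:
        before = cpu_ticks(fh.read())
    time.sleep(interval_s)
    with open(path) as fh:
        after = cpu_ticks(fh.read())
    return cpu_percent(before, after, interval_s, ticks_per_s)
```

A value from `sample(pid)` that stays close to 100 would mean the process is saturating its core, whatever the machine-wide average says.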

But if you say you saw this on a machine that was not loaded … then it is not that …
So it is happening on a machine where everything is installed, with just one JVB on the same machine?

Ah, there is an NPE in the bridge … I missed that, sorry. Hmm, I see your bridge is old, as the latest one is 2.1-415-gbc53883e; maybe that was fixed at some point. We will update stable soon, so you had better update and, if you still see the issue, report it again.
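
For anyone checking many servers, the two version strings in this thread can be compared mechanically. The sketch below assumes the format is `<major>.<minor>-<build>-g<commit>`, optionally followed by a packaging revision such as `-1`; that reading of the string is an assumption based only on the two examples here, and the helper names are made up.

```python
import re

# Assumed layout: "2.1-304-g8488f77d-1" -> major 2, minor 1, build 304,
# commit 8488f77d, optional packaging revision 1.
VERSION_RE = re.compile(r"^(\d+)\.(\d+)-(\d+)-g([0-9a-f]+)(?:-\d+)?$")


def parse_jvb_version(version):
    """Return (major, minor, build) from a JVB version string."""
    match = VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(match.group(i)) for i in (1, 2, 3))


def is_older(installed, reference):
    """True if `installed` has a lower (major, minor, build) than `reference`."""
    return parse_jvb_version(installed) < parse_jvb_version(reference)
```

With the versions quoted above, `is_older("2.1-304-g8488f77d-1", "2.1-415-gbc53883e")` is true, since build 304 precedes build 415.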

Hey, thanks for the update.

Yes, everything is on 1 machine, with 1 JVB there.

We’re intentionally using an old videobridge, because with the latest one some functionality does not work.
We’re using lastN = 2.
With this version we can click on the thumbnail of a user whose video stream is disabled and check what he’s doing.
With a “fresh” install of the latest version this is not possible (nor with any of the versions released after ours).

If you think this will be fixed soon, that would be great!

Thanks

I’m not sure what problem you are referring to. Can you give more details?

Hi @damencho, I am also using the same version of videobridge. It is installed on an AWS server. It was running perfectly, but today we started the AWS server, as it had been down, and started a meeting with 5-6 people. Suddenly the videobridge crashed, and after a minute it restarted. I have checked jvb.log and I am getting the same NPE there.

Could you please help me understand the reason for this NPE? The server load was low, less than 10%.
Is there any other way to resolve this issue besides upgrading the JVB version?

Thanks in advance for your cooperation.