Hi everyone,
Thank you for all your work on Jitsi. It's a great project and has helped many people, especially during COVID.
Recently we have been seeing issues with Prosody, and we have run out of ideas. From time to time, Prosody suddenly pins one CPU core at 100% and stops responding to XMPP requests. All users then get disconnected, because Jicofo can no longer talk to the bridges either.
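Prosody handles all connections on a single event loop, so a core pinned at 100% usually means that loop is busy or stuck, and every client (Jicofo included) stops getting answers. Below is a minimal sketch of an external liveness check for that state. It assumes the standard XMPP client port 5222 is reachable, opens a client stream, and reports whether the server answers with its own stream header within a timeout. The class and method names are hypothetical. The check only detects the hang. It does not explain what causes it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Jitsi or Prosody.
public class XmppProbe {
    /**
     * Opens a client stream and returns true if the server sends back its own
     * stream header within timeoutMs. A server with a stuck event loop may still
     * complete the TCP handshake (the kernel does that), but it never answers.
     */
    public static boolean probe(String host, int port, String domain, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            OutputStream out = socket.getOutputStream();
            String header = "<?xml version='1.0'?><stream:stream to='" + domain + "' version='1.0'"
                    + " xmlns='jabber:client' xmlns:stream='http://etherx.jabber.org/streams'>";
            out.write(header.getBytes(StandardCharsets.UTF_8));
            out.flush();

            InputStream in = socket.getInputStream();
            StringBuilder reply = new StringBuilder();
            byte[] buf = new byte[1024];
            long deadline = System.currentTimeMillis() + timeoutMs;
            while (true) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    return false; // overall read budget used up
                }
                socket.setSoTimeout((int) remaining); // bound the whole read phase by timeoutMs
                int n = in.read(buf);
                if (n < 0) {
                    return false; // server closed the connection without sending a header
                }
                reply.append(new String(buf, 0, n, StandardCharsets.UTF_8));
                if (reply.indexOf("<stream:stream") >= 0) {
                    return true; // server answered: its event loop is alive
                }
            }
        } catch (IOException e) {
            // Connection refused, connect timeout, or read timeout (SocketTimeoutException).
            return false;
        }
    }
}
```

For example, `XmppProbe.probe("127.0.0.1", 5222, "meet.example.org", 5000)` (host and domain are placeholders) could run from cron or a monitoring agent, so the exact minute at which Prosody stops answering can be lined up with the CPU graphs and the Jicofo timeouts.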
There are no real logs showing what leads to this situation, only the symptoms, but I'll paste them here:
Jicofo
Jicofo 2020-12-17 10:56:13.566 INFO: [3520] org.jitsi.jicofo.FocusManager.log() Exception while trying to start the conference
net.java.sip.communicator.service.protocol.OperationFailedException: Failed to join the room
at org.jitsi.impl.protocol.xmpp.ChatRoomImpl.joinAs(ChatRoomImpl.java:298)
at org.jitsi.impl.protocol.xmpp.ChatRoomImpl.join(ChatRoomImpl.java:209)
at org.jitsi.jicofo.JitsiMeetConferenceImpl.joinTheRoom(JitsiMeetConferenceImpl.java:581)
at org.jitsi.jicofo.JitsiMeetConferenceImpl.start(JitsiMeetConferenceImpl.java:404)
at org.jitsi.jicofo.FocusManager.conferenceRequest(FocusManager.java:465)
at org.jitsi.jicofo.FocusManager.conferenceRequest(FocusManager.java:419)
at org.jitsi.jicofo.FocusManager.conferenceRequest(FocusManager.java:394)
at org.jitsi.jicofo.xmpp.FocusComponent.handleConferenceIq(FocusComponent.java:337)
at org.jitsi.jicofo.xmpp.FocusComponent.handleIQSetImpl(FocusComponent.java:228)
at org.jitsi.xmpp.component.ComponentBase.handleIQSet(ComponentBase.java:362)
at org.xmpp.component.AbstractComponent.processIQRequest(AbstractComponent.java:515)
at org.xmpp.component.AbstractComponent.processIQ(AbstractComponent.java:289)
at org.xmpp.component.AbstractComponent.processQueuedPacket(AbstractComponent.java:239)
at org.xmpp.component.AbstractComponent.access$100(AbstractComponent.java:81)
at org.xmpp.component.AbstractComponent$PacketProcessor.run(AbstractComponent.java:1051)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.jivesoftware.smack.SmackException$NoResponseException: No response received within reply timeout. Timeout was 15000ms (~15s). Waited for response using: AndFilter: (StanzaTypeFilter: Presence, OrFilter: (AndFilter: (FromMatchesFilter (ignoreResourcepart): reli1017@conference.meet.ffmuc.net, MUCUserStatusCodeFilter: status=110), AndFilter: (FromMatchesFilter (full): reli1017@conference.meet.ffmuc.net/focus, StanzaIdFilter: id=SQYdG-1318979, PresenceTypeFilter: type=error))).
at org.jivesoftware.smack.SmackException$NoResponseException.newWith(SmackException.java:111)
at org.jivesoftware.smack.SmackException$NoResponseException.newWith(SmackException.java:98)
at org.jivesoftware.smack.StanzaCollector.nextResultOrThrow(StanzaCollector.java:260)
at org.jivesoftware.smackx.muc.MultiUserChat.enter(MultiUserChat.java:355)
at org.jivesoftware.smackx.muc.MultiUserChat.createOrJoin(MultiUserChat.java:498)
at org.jivesoftware.smackx.muc.MultiUserChat.createOrJoin(MultiUserChat.java:444)
at org.jitsi.impl.protocol.xmpp.ChatRoomImpl.joinAs(ChatRoomImpl.java:240)
... 17 more
Jicofo 2020-12-17 10:56:13.732 SEVERE: [3593] org.jitsi.jicofo.AbstractChannelAllocator.log() jvbbrewery@internal.auth.meet.ffmuc.net/jvb9.meet.ffmuc.net - failed to allocate channels, will consider the bridge faulty: Timed out waiting for a response.
org.jitsi.protocol.xmpp.colibri.exception.TimeoutException: Timed out waiting for a response.
at org.jitsi.impl.protocol.xmpp.colibri.ColibriConferenceImpl.maybeThrowOperationFailed(ColibriConferenceImpl.java:342)
at org.jitsi.impl.protocol.xmpp.colibri.ColibriConferenceImpl.createColibriChannels(ColibriConferenceImpl.java:282)
at org.jitsi.protocol.xmpp.colibri.ColibriConference.createColibriChannels(ColibriConference.java:112)
at org.jitsi.jicofo.ParticipantChannelAllocator.doAllocateChannels(ParticipantChannelAllocator.java:111)
at org.jitsi.jicofo.AbstractChannelAllocator.allocateChannels(AbstractChannelAllocator.java:271)
at org.jitsi.jicofo.AbstractChannelAllocator.doRun(AbstractChannelAllocator.java:190)
at org.jitsi.jicofo.AbstractChannelAllocator.run(AbstractChannelAllocator.java:150)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Jicofo 2020-12-17 10:47:56.376 WARNING: [49] org.jitsi.jicofo.bridge.BridgeSelectionStrategy.log() Failed to select bridge for participantRegion=ffmuc-de1
Jicofo 2020-12-17 10:47:56.376 SEVERE: [49] org.jitsi.jicofo.JitsiMeetConferenceImpl.log() Can not invite participant -- no bridge available.
Jicofo 2020-12-17 10:47:56.376 WARNING: [49] org.jitsi.jicofo.bridge.BridgeSelectionStrategy.log() Failed to select bridge for participantRegion=ffmuc-de1
Jicofo 2020-12-17 10:47:56.376 SEVERE: [49] org.jitsi.jicofo.JitsiMeetConferenceImpl.log() Can not invite participant -- no bridge available.
Jicofo 2020-12-17 10:47:56.377 WARNING: [49] org.jitsi.jicofo.bridge.BridgeSelectionStrategy.log() Failed to select bridge for participantRegion=ffmuc-de1
Jicofo 2020-12-17 10:47:56.377 SEVERE: [49] org.jitsi.jicofo.JitsiMeetConferenceImpl.log() Can not invite participant -- no bridge available.
Jicofo 2020-12-17 10:47:56.377 WARNING: [49] org.jitsi.jicofo.bridge.BridgeSelectionStrategy.log() Failed to select bridge for participantRegion=ffmuc-de1
Jicofo 2020-12-17 10:47:56.377 SEVERE: [49] org.jitsi.jicofo.JitsiMeetConferenceImpl.log() Can not invite participant -- no bridge available.
Jicofo 2020-12-17 10:47:56.377 WARNING: [49] org.jitsi.jicofo.bridge.BridgeSelectionStrategy.log() Failed to select bridge for participantRegion=ffmuc-de1
System stats:
Outage yesterday:
Outage today:
We already run with the epoll network backend and the open-files limit raised to the maximum. As you can see, the setup runs pretty stably until “whatever” happens.
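It may be worth confirming that the raised limit actually applies to the running Prosody process. A limit set in `/etc/security/limits.conf` or in a shell does not necessarily reach a service started by systemd. The sketch below assumes Linux, where `/proc/<pid>/limits` lists the soft and hard "Max open files" values and `/proc/<pid>/fd` holds one entry per open descriptor. It also assumes the PID comes from something like `systemctl show -p MainPID prosody`, if Prosody runs as a systemd unit named `prosody`. `FdLimits` and its methods are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper for checking a process's open-files limit on Linux.
public class FdLimits {
    private static final String KEY = "Max open files";

    /** Extracts {soft, hard} for "Max open files" from the text of /proc/<pid>/limits, or null. */
    public static String[] parseMaxOpenFiles(String limitsText) {
        for (String line : limitsText.split("\n")) {
            if (line.startsWith(KEY)) {
                // Remaining columns: soft limit, hard limit, unit ("files").
                String[] cols = line.substring(KEY.length()).trim().split("\\s+");
                if (cols.length >= 2) {
                    return new String[] {cols[0], cols[1]};
                }
            }
        }
        return null;
    }

    /** Number of descriptors the process currently holds (entries in /proc/<pid>/fd). */
    public static long countOpenFds(long pid) throws IOException {
        try (Stream<Path> fds = Files.list(Paths.get("/proc", Long.toString(pid), "fd"))) {
            return fds.count();
        }
    }

    /** One-line summary; must run as the same user as the process, or as root. */
    public static String report(long pid) throws IOException {
        Path limitsFile = Paths.get("/proc", Long.toString(pid), "limits");
        String text = new String(Files.readAllBytes(limitsFile), StandardCharsets.UTF_8);
        String[] limits = parseMaxOpenFiles(text);
        String soft = limits == null ? "?" : limits[0];
        String hard = limits == null ? "?" : limits[1];
        return "pid " + pid + ": open fds=" + countOpenFds(pid)
                + ", soft limit=" + soft + ", hard limit=" + hard;
    }
}
```

Sampling `report(pid)` periodically, and especially right before an outage, would show whether Prosody is anywhere near its descriptor limit when it starts spinning, or whether that can be ruled out.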
These are our Prosody config files:
Any idea what could be going wrong?
Best regards, and thank you!