Hi Everyone,
I am performing some tests with Jitsi in my own environment trying to figure out the maximum number of JVBs a single shard can handle.
Our test environment consist of the following :
1- A single Shard Instance with Prosdy and Jicofo + Nginx | AWS c5.4xlarge 16 Cores 32G RAM
dpkg -s prosody | grep Version
Version: 0.11.2-1
Config-Version: 0.11.2-1
2- 45 JVBs connected from different AWS regions to the same SHARD
dpkg -s jicofo | grep Version
Version: 1.0-563-1
The issue is that when I went beyond 45 JVBs, prosody CPU went to 100% everything slowed down and I started to notice some errors about allocated JVBs channels in the logs :
Jicofo:
Jicofo 2020-05-14 08:27:37.439 SEVERE: [1543] org.jitsi.jicofo.AbstractChannelAllocator.log() jvbbrewery@internal.auth.MYDomain.com/us-east-1-jvb-01 - failed to allocate channels, will consider the bridge faulty: Timed out waiting for a response. org.jitsi.protocol.xmpp.colibri.exception.TimeoutException: Timed out waiting for a response. at org.jitsi.impl.protocol.xmpp.colibri.ColibriConferenceImpl.maybeThrowOperationFailed(ColibriConferenceImpl.java:400) at org.jitsi.impl.protocol.xmpp.colibri.ColibriConferenceImpl.createColibriChannels(ColibriConferenceImpl.java:315) at org.jitsi.protocol.xmpp.colibri.ColibriConference.createColibriChannels(ColibriConference.java:115) at org.jitsi.jicofo.ParticipantChannelAllocator.doAllocateChannels(ParticipantChannelAllocator.java:146) at org.jitsi.jicofo.AbstractChannelAllocator.allocateChannels(AbstractChannelAllocator.java:250) at org.jitsi.jicofo.AbstractChannelAllocator.doRun(AbstractChannelAllocator.java:169) at org.jitsi.jicofo.AbstractChannelAllocator.run(AbstractChannelAllocator.java:143) at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.base/java.lang.Thread.run(Thread.java:834)
JVB :
> 2020-05-14 08:26:39.428 INFO: [21] org.ice4j.ice.harvest.SinglePortUdpHarvester.<init>: Initialized SinglePortUdpHarvester with address *********:10000/udp
SCTP JNI load: Linux OS detected SCTP lib loaded =====>: org_jitsi_modified_sctp4j_SctpJni.c calling init =====>: org_jitsi_modified_sctp4j_SctpJni.c about to set SCTP_DEBUG_ALL 2020-05-14 08:26:40.319 INFO: [21] AbstractHealthCheckService.run#171: Performed a successful health check in PT1.006824S. Sticky failure: false 2020-05-14 08:26:43.417 WARNING: [22] [hostname=MYDomain.com id=california1] MucClient.lambda$getConnectAndLoginCallable$7#643: [MucClient id=california1 hostname=MYDomain.com] error connecting org.jivesoftware.smack.SmackException: javax.net.ssl.SSLException: Unsupported or unrecognized SSL message at org.jivesoftware.smack.tcp.XMPPTCPConnection$PacketReader.parsePackets(XMPPTCPConnection.java:1076) at org.jivesoftware.smack.tcp.XMPPTCPConnection$PacketReader.access$300(XMPPTCPConnection.java:1000) at org.jivesoftware.smack.tcp.XMPPTCPConnection$PacketReader$1.run(XMPPTCPConnection.java:1016) at java.base/java.lang.Thread.run(Thread.java:834) Caused by: javax.net.ssl.SSLException: Unsupported or unrecognized SSL message at java.base/sun.security.ssl.SSLSocketInputRecord.handleUnknownRecord(SSLSocketInputRecord.java:439) at java.base/sun.security.ssl.SSLSocketInputRecord.decode(SSLSocketInputRecord.java:184) at java.base/sun.security.ssl.SSLTransport.decode(SSLTransport.java:108) at java.base/sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1144) at java.base/sun.security.ssl.SSLSocketImpl.readHandshakeRecord(SSLSocketImpl.java:1055) at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:395) at org.jivesoftware.smack.tcp.XMPPTCPConnection.proceedTLSReceived(XMPPTCPConnection.java:810) at org.jivesoftware.smack.tcp.XMPPTCPConnection.access$1200(XMPPTCPConnection.java:151) at org.jivesoftware.smack.tcp.XMPPTCPConnection$PacketReader.parsePackets(XMPPTCPConnection.java:1071) ... 3 more 2020-05-14 08:26:43.418 WARNING: [28] org.jivesoftware.smack.AbstractXMPPConnection.callConnectionClosedOnErrorListener: Connection XMPPTCPConnection[not-authenticated] (0) closed with error java.net.SocketException: Connection or inbound has closed at java.base/sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:798) at java.base/sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284) at java.base/sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326) at java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) at java.base/java.io.InputStreamReader.read(InputStreamReader.java:185) at java.base/java.io.BufferedReader.read1(BufferedReader.java:210) at java.base/java.io.BufferedReader.read(BufferedReader.java:287) at org.xmlpull.mxp1.MXParser.fillBuf(MXParser.java:2992) at org.xmlpull.mxp1.MXParser.more(MXParser.java:3046) at org.xmlpull.mxp1.MXParser.parseProlog(MXParser.java:1410) at org.xmlpull.mxp1.MXParser.nextImpl(MXParser.java:1395) at org.xmlpull.mxp1.MXParser.next(MXParser.java:1093) at org.jivesoftware.smack.tcp.XMPPTCPConnection$PacketReader.parsePackets(XMPPTCPConnection.java:1248) at org.jivesoftware.smack.tcp.XMPPTCPConnection$PacketReader.access$300(XMPPTCPConnection.java:1000) at org.jivesoftware.smack.tcp.XMPPTCPConnection$PacketReader$1.run(XMPPTCPConnection.java:1016) at java.base/java.lang.Thread.run(Thread.java:834) 2020-05-14 08:26:43.419 WARNING: [28] [hostname=MYDomain.com id=california1] MucClient$1.connectionClosedOnError#295: Closed on error: java.net.SocketException: Connection or inbound has closed at java.base/sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:798) at java.base/sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284) at java.base/sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326) at java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) at java.base/java.io.InputStreamReader.read(InputStreamReader.java:185) at java.base/java.io.BufferedReader.read1(BufferedReader.java:210) at java.base/java.io.BufferedReader.read(BufferedReader.java:287) at org.xmlpull.mxp1.MXParser.fillBuf(MXParser.java:2992) at org.xmlpull.mxp1.MXParser.more(MXParser.java:3046) at org.xmlpull.mxp1.MXParser.parseProlog(MXParser.java:1410) at org.xmlpull.mxp1.MXParser.nextImpl(MXParser.java:1395) at org.xmlpull.mxp1.MXParser.next(MXParser.java:1093) at org.jivesoftware.smack.tcp.XMPPTCPConnection$PacketReader.parsePackets(XMPPTCPConnection.java:1248) at org.jivesoftware.smack.tcp.XMPPTCPConnection$PacketReader.access$300(XMPPTCPConnection.java:1000) at org.jivesoftware.smack.tcp.XMPPTCPConnection$PacketReader$1.run(XMPPTCPConnection.java:1016) at java.base/java.lang.Thread.run(Thread.java:834) 2020-05-14 08:26:49.312 INFO: [21] Videobridge.createConference#320: create_conf, id=d96faa13d226ce99 gid=null logging=false 2020-05-14 08:26:49.353 INFO: [21] AbstractHealthCheckService.run#171: Performed a successful health check in PT0.041714S. Sticky failure: false 2020-05-14 08:26:53.440 WARNING: [22] [hostname=MYDomain.com id=california1] MucClient.lambda$getConnectAndLoginCallable$7#643: [MucClient id=california1 hostname=MYDomain.com] error connecting org.jivesoftware.smack.SmackException$NoResponseException: No response received within reply timeout. Timeout was 5000ms (~5s). While waiting for establishing TLS at org.jivesoftware.smack.SmackException$NoResponseException.newWith(SmackException.java:93) at org.jivesoftware.smack.SynchronizationPoint.checkForResponse(SynchronizationPoint.java:272) at org.jivesoftware.smack.SynchronizationPoint.checkIfSuccessOrWait(SynchronizationPoint.java:157) at org.jivesoftware.smack.SynchronizationPoint.checkIfSuccessOrWaitOrThrow(SynchronizationPoint.java:128) at org.jivesoftware.smack.tcp.XMPPTCPConnection.connectInternal(XMPPTCPConnection.java:908) at org.jivesoftware.smack.AbstractXMPPConnection.connect(AbstractXMPPConnection.java:383) at org.jitsi.xmpp.mucclient.MucClient.lambda$getConnectAndLoginCallable$7(MucClient.java:638) at org.jitsi.retry.RetryStrategy$TaskRunner.run(RetryStrategy.java:193) at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.base/java.lang.Thread.run(Thread.java:834) 2020-05-14 08:26:53.462 INFO: [45] org.jivesoftware.smack.tcp.XMPPTCPConnection$PacketReader.parsePackets: XMPPTCPConnection[not-authenticated] (0) received closing </stream> element. Server wants to terminate the connection, calling disconnect() 2020-05-14 08:26:58.461 WARNING: [40] [hostname=MYDomain.com id=california1] MucClient$1.reconnectionFailed#326: Reconnection failed: org.jivesoftware.smack.SmackException$NoResponseException: No response received within reply timeout. Timeout was 5000ms (~5s). While waiting for establishing TLS at org.jivesoftware.smack.SmackException$NoResponseException.newWith(SmackException.java:93) at org.jivesoftware.smack.SynchronizationPoint.checkForResponse(SynchronizationPoint.java:272) at org.jivesoftware.smack.SynchronizationPoint.checkIfSuccessOrWait(SynchronizationPoint.java:157) at org.jivesoftware.smack.SynchronizationPoint.checkIfSuccessOrWaitOrThrow(SynchronizationPoint.java:128) at org.jivesoftware.smack.tcp.XMPPTCPConnection.connectInternal(XMPPTCPConnection.java:908) at org.jivesoftware.smack.AbstractXMPPConnection.connect(AbstractXMPPConnection.java:383) at org.jivesoftware.smack.ReconnectionManager$2.run(ReconnectionManager.java:289) at java.base/java.lang.Thread.run(Thread.java:834) 2020-05-14 08:26:59.312 INFO: [21] Videobridge.createConference#320: create_conf, id=2b47314e74c63249 gid=null logging=false 2020-05-14 08:26:59.323 INFO: [21] AbstractHealthCheckService.run#171: Performed a successful health check in PT0.011603S. Sticky failure: false
Prosody :
|May 14 07:28:53 c2s5613fc6e7510|info|Client disconnected: handshake too long|
|---|---|---|
|May 14 07:29:35 c2s5613fb726680|info|Client disconnected: handshake too long|
|May 14 07:30:18 c2s5613fa129890|info|Client disconnected: handshake too long|
|May 14 07:32:52 c2s5613f828f090|info|Client disconnected: handshake too long|
|May 14 07:33:08 c2s5613fa2bc9e0|info|Client disconnected: handshake too long|
|May 14 07:33:40 c2s5613fad1b030|info|Client disconnected: handshake too long|
|May 14 07:34:13 c2s5613f9e7fb50|info|Client disconnected: handshake too long|
|May 14 07:34:33 c2s5613fa192230|info|Client disconnected: handshake too long|
|May 14 07:36:12 c2s5613fcbca6f0|info|Client disconnected: handshake too long|
|May 14 07:36:54 c2s5613fcb0d570|info|Client disconnected: handshake too long|
|May 14 07:37:36 c2s5613fd854300|info|Client disconnected: handshake too long|
|May 14 07:38:08 c2s5613f8749ac0|info|Client disconnected: handshake too long|
|May 14 07:38:49 c2s5613fbe3f420|info|Client disconnected: handshake too long|
|May 14 07:40:29 c2s5613f8923450|info|Client disconnected: handshake too long|
|May 14 07:41:01 c2s5613fab03790|info|Client disconnected: handshake too long|
|May 14 07:41:29 c2s5613f75219c0|info|Client disconnected: handshake too long|
Observations :
I was under the impression that prosody is multi-threaded, However I noticed that it is using a single core I am not sure if there’s a specific config paramater to change that.
Once I removed the newely added JVBs and went back to the 45 JVBs I started with prosody CPU went down to below 50% and the environment started to get back to normal.
Question:
1- Is there any limitations to the number of JVBs a single SHARD can handle ?
2- Is there any fine-tunning configuration parametrs I should take into considration to handle such load ?
3- Any solution or suggestion regarding the above warnnings and errors ?
Thanks in advance.
Best Regards,
Mohamed Abada