POST: http-bind [HTTP/1.1 504 Gateway Time-out]

Why might my BOSH server be timing out? I am on Ubuntu 20.04 and followed the easy installation with the PPA (https://jitsi.github.io/handbook/docs/devops-guide/devops-guide-quickstart).

Here are a few logs:

Nginx:

2020/11/11 07:52:39 [error] 2988928#2988928: 75631 upstream timed out (110: Connection timed out) while reading response header from upstream, client: .., server: , request: "POST /http-bind?room=test HTTP/1.1", upstream: "http://127.0.0.1:5280/http-bind?room=test", host: "", referrer: "https://*******/test"
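One way to tell whether the timeout lives in nginx or in Prosody is to hit the BOSH port directly on the server, bypassing nginx. A minimal sketch, assuming Prosody's default BOSH port 5280; the `probe` helper is just for illustration, not a standard tool:

```shell
# probe: classify what curl sees on a URL by its exit code (sketch)
probe() {
    curl -s -o /dev/null --max-time 5 "$1"
    rc=$?
    case $rc in
        0)  echo "$1 -> responded" ;;
        7)  echo "$1 -> connection refused (nothing listening)" ;;
        28) echo "$1 -> timed out (dropped or hung, matches the nginx 504)" ;;
        *)  echo "$1 -> curl exit $rc" ;;
    esac
}

# Run this on the server itself, so the firewall is out of the picture:
probe "http://127.0.0.1:5280/http-bind"
```

A timeout (curl exit 28) on the loopback address would put the problem squarely in Prosody rather than in the network or the firewall.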

jicofo.log:
Jicofo 2020-11-11 07:39:16.165 WARNING: [6671] org.jitsi.impl.protocol.xmpp.OpSetSimpleCapsImpl.log() Failed to discover features for focus...: XMPP error reply received from focus...: XMPPError: service-unavailable - wait
Jicofo 2020-11-11 07:39:16.165 INFO: [6671] org.jitsi.jicofo.ComponentsDiscovery.log() Service rediscovery disabled
Jicofo 2020-11-11 07:39:16.165 INFO: [6671] org.jitsi.jicofo.FocusManager.log() XMPP provider reg state: RegistrationState=Registered
Jicofo 2020-11-11 07:39:17.100 SEVERE: [35] org.jivesoftware.whack.ExternalComponentManager.error()
java.net.SocketException: Broken pipe (Write failed)
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291)
at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:295)
at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:141)
at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
at java.io.BufferedWriter.flush(BufferedWriter.java:254)
at org.dom4j.io.XMLWriter.flush(XMLWriter.java:272)
at org.jivesoftware.whack.ExternalComponent.send(ExternalComponent.java:371)
at org.jivesoftware.whack.ExternalComponentManager.sendPacket(ExternalComponentManager.java:269)
at org.xmpp.component.AbstractComponent.send(AbstractComponent.java:925)
at org.jitsi.xmpp.component.ComponentBase.access$400(ComponentBase.java:36)
at org.jitsi.xmpp.component.ComponentBase$PingTask.run(ComponentBase.java:577)
at java.util.TimerThread.mainLoop(Timer.java:555)
at java.util.TimerThread.run(Timer.java:505)
Jicofo 2020-11-11 07:39:22.111 SEVERE: [6676] org.jitsi.xmpp.component.ComponentBase.log() Ping timeout for ID: Fdu4q-212889
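The Broken pipe in the ping task, together with Prosody's "Component not connected, bouncing error" line further down, points at the Jicofo-to-Prosody component socket dropping. A couple of checks, assuming the stock Ubuntu package layout (the component port 5347 and the /etc/jitsi/jicofo/config path are package defaults; verify against your install):

```shell
# Is Prosody listening on the external-component port Jicofo connects to?
# (5347 is the default; compare with component_ports in Prosody's config.)
ss -ltn | grep ':5347' || echo "no listener on 5347"

# Do Jicofo's host/secret settings match what Prosody expects? (stock path)
grep -E 'JICOFO_(HOSTNAME|HOST|SECRET)' /etc/jitsi/jicofo/config 2>/dev/null || true
```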

jvb.log:

2020-11-11 08:07:27.092 INFO: [35] HealthChecker.run#170: Performed a successful health check in PT0S. Sticky failure: false
2020-11-11 08:07:27.117 INFO: [34] VideobridgeExpireThread.expire#140: Running expire()
2020-11-11 08:07:37.093 INFO: [35] HealthChecker.run#170: Performed a successful health check in PT0S. Sticky failure: false
2020-11-11 08:07:47.092 INFO: [35] HealthChecker.run#170: Performed a successful health check in PT0S. Sticky failure: false
2020-11-11 08:07:57.092 INFO: [35] HealthChecker.run#170: Performed a successful health check in PT0S. Sticky failure: false

There is no error on prosody:

Nov 11 07:39:12 conference.***.***.***:muc_domain_mapper       warn    Session filters applied
Nov 11 07:39:12 c2s55633a482030 info    Client connected
Nov 11 07:39:12 c2s55633a482030 info    Stream encrypted (TLSv1.2 with ECDHE-RSA-AES128-GCM-SHA256)
Nov 11 07:39:12 c2s55633a482030 info    Authenticated as jvb@auth.***.***.***
Nov 11 07:39:16 conference.***.***.***:muc_domain_mapper       warn    Session filters applied
Nov 11 07:39:16 c2s55633a514810 info    Client connected
Nov 11 07:39:16 c2s55633a514810 info    Stream encrypted (TLSv1.2 with ECDHE-RSA-AES128-GCM-SHA256)
Nov 11 07:39:16 c2s55633a514810 info    Authenticated as focus@auth.***.***.***
Nov 11 07:39:16 focus.***.***.***:component    warn    Component not connected, bouncing error for: <iq to='focus.***.***.***' id='Fdu4q-212883' from='focus@auth.***.***.***/focus99265246891' type='get'>
Nov 11 07:39:17 jcp55633a569030 info    Incoming Jabber component connection
Nov 11 07:39:17 focus.***.***.***:component    info    External component successfully authenticated
Nov 11 07:39:17 conference.***.***.***:muc_domain_mapper       warn    Session filters applied
Nov 11 07:39:17 mod_bosh        info    New BOSH session, assigned it sid '8ed68441-8f4b-4f1e-b907-56c552dc7f04'
Nov 11 07:53:08 bosh8ed68441-8f4b-4f1e-b907-56c552dc7f04        info    BOSH client disconnected: session close
Nov 11 08:00:52 conference.***.***.***:muc_domain_mapper       warn    Session filters applied
Nov 11 08:00:52 mod_bosh        info    New BOSH session, assigned it sid '56c6a06a-d638-4ba1-ae78-c34dacef4c88'

Well, tell me what else I should be looking for.

I see this on prosody logs now with debug enabled:

    Nov 17 07:52:34 socket  debug   server.lua: client 127.0.0.1:36120 read error: closed                                        
    Nov 17 07:52:34 mod_bosh        debug   Request destroyed: table: 0x55dc814f01d0                                             
    Nov 17 07:52:34 bosh5bb8a962-2115-4854-9d44-0a03658aec22        debug   BOSH session marked as inactive (for 60s)            
    Nov 17 07:52:34 socket  debug   server.lua: closed client handler and removed socket from list  

Do you have any firewalls running? I cannot think of anything else that may be wrong. In our experience, a 504 timeout normally means there is an error in Prosody, visible in one of its logs, and it simply does not respond…

Yes I have. But the ports are open:

# ufw status numbered
[ 3] 80/tcp                     ALLOW IN    Anywhere                                                                         
[ 4] 443/tcp                    ALLOW IN    Anywhere 
(...)
[39] 8888/tcp                   ALLOW IN    Anywhere                                                                         
[40] 60000:61000/udp            ALLOW IN    Anywhere                                                                         
[41] 3478/tcp                   ALLOW IN    Anywhere                                                                         
[42] 5349/tcp                   ALLOW IN    Anywhere                                                                         
[43] 10000                      ALLOW IN    Anywhere                                                                         
[44] 3478                       ALLOW IN    Anywhere                                                                         
[45] 5349                       ALLOW IN    Anywhere                                                                         
[46] 10000/udp                  ALLOW IN    Anywhere                   
[47] 5349/udp                   ALLOW IN    Anywhere   

Yeah, it is a bit messy…
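Open ufw rules only mean the firewall will let traffic through; they say nothing about whether a service is actually bound on those ports. A quick cross-check on the server (a sketch; the port roles are assumptions from a stock Jitsi install):

```shell
# TCP: nginx (443) and Prosody BOSH (5280); UDP: jvb media (10000)
ss -ltn | awk 'NR==1 || /:443 |:5280 /'
ss -lun | awk 'NR==1 || /:10000 /'
```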

Prosody is running fine:

$ sudo systemctl status prosody
● prosody.service - Prosody XMPP Server
     Loaded: loaded (/lib/systemd/system/prosody.service; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2020-11-17 12:45:02 UTC; 15h ago
       Docs: https://prosody.im/doc
   Main PID: 999040 (lua5.2)
      Tasks: 1 (limit: 19175)
     Memory: 44.6M
     CGroup: /system.slice/prosody.service
             └─999040 lua5.2 /usr/bin/prosody

Nov 17 12:45:02 fetube.flatearth.co.il systemd[1]: Started Prosody XMPP Server.

These are some nginx logs:


2020/11/17 22:56:18 [error] 1051063#1051063: *88413 upstream timed out (110: Connection timed out) while reading response header from upstream, client: ***.my.ip.***, server: my.replaced.domain, request: "POST /http-bind?room=test HTTP/1.1", upstream: "http://127.0.0.1:5280/http-bind?room=test", host: "my.replaced.domain", referrer: "https://my.replaced.domain/test"
2020/11/17 22:57:19 [error] 1051063#1051063: *88413 upstream timed out (110: Connection timed out) while reading response header from upstream, client: ***.my.ip.***, server: my.replaced.domain, request: "POST /http-bind?room=test HTTP/1.1", upstream: "http://127.0.0.1:5280/http-bind?room=test", host: "my.replaced.domain", referrer: "https://my.replaced.domain/test"
2020/11/17 23:19:35 [alert] 1051063#1051063: *89451 could not allocate new session in SSL session shared cache "le_nginx_SSL" while SSL handshaking, client: 127.0.0.1, server: 0.0.0.0:8443
2020/11/17 23:19:51 [alert] 1051063#1051063: *89455 could not allocate new session in SSL session shared cache "le_nginx_SSL" while SSL handshaking, client: 127.0.0.1, server: 0.0.0.0:8443
2020/11/17 23:19:51 [alert] 1051063#1051063: *89455 could not allocate new session in SSL session shared cache "le_nginx_SSL" while SSL handshaking, client: 127.0.0.1, server: 0.0.0.0:8443
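The `le_nginx_SSL` alerts are a separate issue from the 504: the TLS session cache that certbot configures (1m by default) has filled up. It is easy to quiet by raising the size in the file certbot includes; the path below is the stock certbot location, so check yours:

```nginx
# /etc/letsencrypt/options-ssl-nginx.conf  (certbot ships shared:le_nginx_SSL:1m)
ssl_session_cache shared:le_nginx_SSL:10m;
```

Then reload nginx (`sudo systemctl reload nginx`).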

The only thing Prosody shows in its log shortly before I get a 504 in the browser is:

Nov 18 04:39:58 conference.my.replaced.domain:muc_domain_mapper       debug   No need to rewrite jvbbrewery@internal.auth.meet.flatearth.co.il/23945629-b128-435a-8ed8-e6020354e28b (not from the MUC host) C5zcx-126461, <nil>
Nov 18 04:40:00 socket  debug   server.lua: client 127.0.0.1:42620 read error: closed
Nov 18 04:40:00 mod_bosh        debug   Request destroyed: table: 0x5624f2d954d0
Nov 18 04:40:00 bosh824c75b6-457e-49b3-9471-10b58910353e        debug   BOSH session marked as inactive (for 60s)
Nov 18 04:40:08 c2s5624f3033400 debug   Received[c2s]: <presence id='XVp9F-26032' to='jvbbrewery@internal.auth.meet.flatearth.co.il/23945629-b128-435a-8ed8-e6020354e28b'>
Nov 18 04:40:08 auth.my.replaced.domain:pep   debug   get_pep_service("jvb")
Nov 18 04:40:08 conference.my.replaced.domain:muc_domain_mapper       debug   No need to rewrite out 'to' jvbbrewery@intern…
Nov 18 04:40:15 conference.my.replaced.domain:muc_domain_mapper       debug   Filtering stanza type iq  to focus@auth.meet.flatearth.co.il/ZIC1vVoH from jvbbrewery@internal.auth.my.replaced.domain/23945629-b128-435a-8ed8-e6020354e28b

What is this jvbbrewery@internal.auth that it says there is no need to rewrite? It keeps trying that all the time.

Notice that not all requests time out:

But they all take very long, and that is after 10 minutes… I think far more XHRs should have been transmitted by then.
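Long-held /http-bind requests are partly by design: BOSH long-polls, so Prosody deliberately holds each request open (commonly up to 60s) until it has something to deliver. The 504 shows up when nginx's proxy_read_timeout (60s by default) fires first. A sketch of the usual mitigation in the quickstart's site config (the location block follows the stock template; the 90s value is my assumption):

```nginx
location = /http-bind {
    proxy_pass       http://127.0.0.1:5280/http-bind;
    proxy_set_header X-Forwarded-For $remote_addr;
    proxy_set_header Host $http_host;
    proxy_read_timeout 90s;   # must exceed BOSH's hold time; nginx default is 60s
}
```

Note this only helps when Prosody is merely slow; if Prosody never answers at all, the 504 just arrives later.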

Also, isn't this bad, from Jicofo (avideobridge2)?

Jicofo 2020-11-17 12:45:12.889 SEVERE: [537] org.jivesoftware.whack.ExternalComponentManager.error()                         
java.net.SocketException: Broken pipe (Write failed)                                                                         
        at java.net.SocketOutputStream.socketWrite0(Native Method)                                                           
        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)                                              
        at java.net.SocketOutputStream.write(SocketOutputStream.java:155)                                                    
        at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)                                                       
        at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291)                                                  
        at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:295)
        at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
        at java.io.BufferedWriter.flush(BufferedWriter.java:254)
        at org.dom4j.io.XMLWriter.flush(XMLWriter.java:272)
        at org.jivesoftware.whack.ExternalComponent.send(ExternalComponent.java:371)                                  
        at org.jivesoftware.whack.ExternalComponentManager.sendPacket(ExternalComponentManager.java:269)
        at org.xmpp.component.AbstractComponent.send(AbstractComponent.java:925)
        at org.jitsi.xmpp.component.ComponentBase.access$400(ComponentBase.java:36)
        at org.jitsi.xmpp.component.ComponentBase$PingTask.run(ComponentBase.java:577)
        at java.util.TimerThread.mainLoop(Timer.java:555)
        at java.util.TimerThread.run(Timer.java:505)
Jicofo 2020-11-17 12:45:15.032 INFO: [30] org.jitsi.jicofo.xmpp.BaseBrewery.log() Added brewery instance: jvbbrewery@internal.auth.meet.flatearth.co.il/23945629-b128-435a-8ed8-e6020354e28b
Jicofo 2020-11-17 12:45:15.033 INFO: [30] org.jitsi.jicofo.bridge.BridgeSelector.log() Added new videobridge: Bridge[jid=jvbb

And this on jvb.log

2020-11-18 04:51:38.360 INFO: [35] HealthChecker.run#170: Performed a successful health check in PT0S. Sticky failure: false 
2020-11-18 04:51:43.200 INFO: [33] HealthChecker.run#170: Performed a successful health check in PT0S. Sticky failure: false 
2020-11-18 04:51:43.233 INFO: [32] VideobridgeExpireThread.expire#140: Running expire()

It is a DigitalOcean droplet. Are you sure there is nothing other than closed ports that this could be? I don't know much about nmap, but:

I honestly think there is no doubt the ports are open… I am not sure how to effectively test them, but:

# matheus @ arch-matheus in ~ [2:01:37] 
$ nc -z -v -u my.replaced.domain 10000

# matheus @ arch-matheus in ~ [2:02:00] C:1
$ nc -z -v -u my.replaced.domain 10000

# matheus @ arch-matheus in ~ [2:02:08] C:1
$ nc -z -v -u my.replaced.domain 3478 

# matheus @ arch-matheus in ~ [2:02:18] C:1
$ nc -z -v -u my.replaced.domain 5349

# matheus @ arch-matheus in ~ [2:02:26] C:1
$ nc -z -v -u my.replaced.domain 5555
Connection to my.replaced.domain (159.65.85.96) 5555 port [udp/personal-agent] succeeded!

I think there wouldn't be occasional delayed replies from Prosody if they were closed. The machine has 16 GB of RAM and is surely not bottlenecking on CPU.
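One caveat about `nc -z -u`: UDP has no handshake, so nc can only report "succeeded" when no ICMP port-unreachable comes back, and a failure (the `C:1` in the prompt) when one does. So the failures on 10000/3478/5349 may actually mean the packets reached the host but nothing was listening, while the "success" on 5555 may just be the firewall silently dropping the probe. The authoritative check runs on the server itself (a sketch; the three ports are the usual Jitsi UDP ports):

```shell
# Which of the Jitsi UDP ports actually have a listener on this host?
for p in 10000 3478 5349; do
    if ss -lun | grep -q ":$p "; then
        echo "udp/$p: listener present"
    else
        echo "udp/$p: nothing bound"
    fi
done
```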