[jitsi-users] jvb leaking file descriptors


#1

Hi,

I am running jitsi-videobridge (726-1) on Debian Jessie with openjdk-8
(8u72-b15-1~bpo8+1)

I am seeing a file descriptor leak: the process eventually hits its
ulimit (4096) and becomes wedged (it needs to be restarted).

it seems to leak approx 16 FDs every 10 seconds or so.

I established this by running:
lsof -p $(pgrep -u jvb)

from which I can see an ever-growing list of these:

java 1499 jvb 2337u sock 0,7 0t0 1514853 can't identify protocol
java 1499 jvb 2338u sock 0,7 0t0 1529373 can't identify protocol
java 1499 jvb 2339u sock 0,7 0t0 1514860 can't identify protocol
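
For anyone who wants to track the growth over time instead of eyeballing the list, here is a minimal sketch. It assumes lsof prints each entry on a single line and that exactly one process runs as user jvb. The function names are made up for this example.

```shell
#!/bin/sh
# Read lsof output on stdin and print how many entries are sockets
# whose protocol lsof could not identify (like the ones listed above).
count_unidentified() {
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    grep -c "can't identify protocol" || true
}

# Print a timestamped count every INTERVAL seconds, SAMPLES times.
# $1 = PID, $2 = interval in seconds (default 10), $3 = samples (default 6)
watch_leak() {
    pid=$1
    interval=${2:-10}
    samples=${3:-6}
    i=0
    while [ "$i" -lt "$samples" ]; do
        count=$(lsof -p "$pid" 2>/dev/null | count_unidentified)
        printf '%s %s\n' "$(date +%T)" "$count"
        i=$((i + 1))
        sleep "$interval"
    done
}

# Example (not run here): watch_leak "$(pgrep -u jvb)" 10 6
```

A count that climbs by a similar amount on every sample and never drops would point at a steady leak rather than short-lived sockets.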

are there any known issues in this area?

in the jvb logs I see the attached repeated pattern of activity that seems
to correlate with the leakage.

Is this a normal pattern? I'm pretty sure that there are actually no
active conferences.

why would I keep seeing this logging line:
JVB 2016-05-18 14:21:26.815 INFO: [226258]
org.jitsi.videobridge.IceUdpTransportManager.info() Failed to connect
IceUdpTransportManager:
net.java.sip.communicator.service.protocol.OperationFailedException:
TransportManager closed

this seems like a potential source of FD allocation.

If you can suggest extra logging I can enable, extra information I can
provide, or things I can dig into, I'm happy to do so.

Thanks,
RD

jvb_leak.txt (12.5 KB)


#2

Hi Raoul,

    are there any known issues in this area?

    in the jvb logs I see the attached repeated pattern of activity that
    seems to correlate with the leakage.

    Is this a normal pattern? I'm pretty sure that there are actually no
    active conferences.

Thank you for reporting this! It is not normal, and not known AFAIK. I checked some of our currently running machines, and I don't see any significant number of FDs for the bridge process.
However, in the logs from our performance test environment I see consistently many (over 9000) open FDs at the end of the test.

I suspect that health-checks are what triggers the problem in your case (without any conferences). On a test machine, I observe exactly 16 new FDs right after I run a health-check against a bridge[0], and jicofo sends health-check requests once every 10 seconds by default, so it seems consistent with your observations. But in my case they are eventually closed after 5-6 minutes.

You can disable health checks from jicofo by adding this to /etc/jitsi/jicofo/sip-communicator.properties:
org.jitsi.jicofo.HEALTH_CHECK_INTERVAL=-1

Regards,
Boris

···

On 18/05/16 08:28, Raoul Duke wrote:


#3

Hi Boris,

    Thank you for reporting this! It is not normal, and not known AFAIK. I
    checked some of our currently running machines, and I don't see any
    significant number of FDs for the bridge process.
    However, in the logs from our performance test environment I see
    consistently many (over 9000) open FDs at the end of the test.

Thanks for getting back to me. I went ahead and created a ticket to track
this:
https://github.com/jitsi/jitsi-videobridge/issues/245

    I suspect that health-checks are what triggers the problem in your case
    (without any conferences). On a test machine, I observe exactly 16 new FDs
    right after I run a health-check against a bridge[0], and jicofo sends
    health-check requests once every 10 seconds by default, so it seems
    consistent with your observations. But in my case they are eventually
    closed after 5-6 minutes.

Yes. Strangely enough, I have another machine running Debian Jessie with
an identical kernel, JRE version, etc. which does not seem to be
experiencing the same leak. I'm really not sure what could be different,
other than perhaps low-level kernel tuning parameters, but I have checked
all the ones I could think of and nothing jumps out.

    You can disable health checks from jicofo by adding this to
    /etc/jitsi/jicofo/sip-communicator.properties:
    org.jitsi.jicofo.HEALTH_CHECK_INTERVAL=-1

That is good to know, but I'd rather not disable the health checks. For now
my workaround is a once-a-day restart.

Let me know if there is anything I can do to help progress this. I am
happy to debug etc.

Thanks.

···

On Wed, May 18, 2016 at 5:11 PM, Boris Grozev <boris@jitsi.org> wrote:

On 18/05/16 08:28, Raoul Duke wrote:


#4

To avoid restarting, you can increase the health check interval and the ulimit.
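
As a rough illustration of how much time those two knobs buy, the sketch below estimates how long it takes to hit the limit. It assumes the worst case, where every health check leaks a fixed number of FDs and none are ever closed. The function name and its arguments are made up for this example, and the second set of numbers is only an example.

```shell
#!/bin/sh
# Estimate the seconds until a process runs out of file descriptors.
# $1 = FD limit (ulimit -n), $2 = FDs currently open,
# $3 = FDs leaked per health check, $4 = health check interval in seconds
seconds_to_exhaustion() {
    echo $(( ($1 - $2) * $4 / $3 ))
}

# Numbers reported in this thread: limit 4096, 16 FDs per 10-second check.
seconds_to_exhaustion 4096 0 16 10    # prints 2560 (about 43 minutes)
# Same leak with a 60-second interval and a limit of 65536 (example values).
seconds_to_exhaustion 65536 0 16 60   # prints 245760 (about 68 hours)
```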

If you want to confirm that it is health checks that trigger it, you can disable them, enable the REST API and run health checks manually (http://localhost:8080/about/health) while monitoring the FDs.
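
A minimal sketch of that manual test, assuming Linux (/proc), curl installed, and the REST API reachable at the URL above. The function names are only for illustration.

```shell
#!/bin/sh
# Count the open file descriptors of a process via /proc (Linux only).
# Run as root or as the process owner, otherwise /proc/<pid>/fd is not
# readable and the count comes out as 0.
count_fds() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# Trigger one health check and print how many FDs the process gained.
# $1 = PID of the bridge, $2 = health check URL (defaults to the one above)
health_check_delta() {
    pid=$1
    url=${2:-http://localhost:8080/about/health}
    before=$(count_fds "$pid")
    curl -s -o /dev/null "$url" || echo "health check request failed" >&2
    sleep 1   # give the bridge a moment to finish the check
    after=$(count_fds "$pid")
    echo $((after - before))
}

# Example (not run here): health_check_delta "$(pgrep -u jvb)"
```

Running it a few times a minute apart, and again several minutes later, would show whether the extra FDs are eventually released or keep piling up.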

Boris

···

On 18/05/16 11:35, Raoul Duke wrote:
