[jitsi-users] jvb.log


#1

Boris

Thank you for your help. I followed your instructions to insert

org.jitsi.jicofo.HEALTH_CHECK_INTERVAL=-1

into /etc/jitsi/jicofo/sip-communicator.properties, but I continue to have log entries added to jvb.log every 10 seconds, even with no activity on my Jitsi Meet installation. The log entries look like this:

JVB 2016-05-18 11:58:45.877 INFO: [32] org.jitsi.videobridge.xmpp.ComponentImpl.info() RECV: <iq id="V4NJM-16" type="result" to="jitsi-videobridge.meet.daviddumonde.com" from="meet.daviddumonde.com"/>
JVB 2016-05-18 11:58:55.875 INFO: [36] org.jitsi.videobridge.xmpp.ComponentImpl.info() RECV: <iq id="V4NJM-17" type="result" to="jitsi-videobridge.meet.daviddumonde.com" from="meet.daviddumonde.com"/>
JVB 2016-05-18 11:59:05.875 INFO: [38] org.jitsi.videobridge.xmpp.ComponentImpl.info() RECV: <iq id="V4NJM-18" type="result" to="jitsi-videobridge.meet.daviddumonde.com" from="meet.daviddumonde.com"/>

Any suggestions?

David

···

Date: Thu, 12 May 2016 17:40:00 -0500
From: Boris Grozev <boris@jitsi.org>
To: Jitsi Users <users@jitsi.org>
Subject: Re: [jitsi-users] jvb.log

On 12/05/16 16:46, David Dumonde wrote:

I am testing a self-hosted Jitsi Meet, which I’ve only used a few
times, opening in my browser with no other users, so it has had very
minimal traffic.

I noticed today that the jvb.log file is 1.75 GB! Is it normal for
this log file to become so bloated? I'm playing around with a 20 GB
DigitalOcean droplet. I'd hate to think this one log file is eating
up 10% of my file space.

This is most likely due to health checks from jicofo to videobridge. If
you don't need those (which you don't, unless you specifically handle
failures), you can disable them by setting the following property for
jicofo (in /etc/jitsi/jicofo/sip-communicator.properties):
org.jitsi.jicofo.HEALTH_CHECK_INTERVAL=-1
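
For reference, on a stock Debian package install, applying this looks
roughly like the following (a sketch; the jicofo service name, and a
restart being sufficient, are assumptions):

    # append the property and restart jicofo so it takes effect
    echo 'org.jitsi.jicofo.HEALTH_CHECK_INTERVAL=-1' | \
        sudo tee -a /etc/jitsi/jicofo/sip-communicator.properties
    sudo service jicofo restart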

Regards,
Boris


#2

These are likely replies to pings sent from the bridge (unfortunately we don't log the requests).

Boris

···

On 18/05/16 12:09, David Dumonde wrote:



#3

Hi Boris,


I might have figured out why I'm having the issue on this particular
machine. In the logfile snippet I sent you, I noticed that there were two
distinct IPs referenced in the "org.ice4j.ice.Agent.createComponent"
logging lines. It turns out that particular machine has a second virtual
IP (eth0:1), which explains the second IP (i.e. the eth0 and eth0:1
public IPs both go to the same physical machine).

I contrasted this with the machine not experiencing the leak, which has
only a single NIC (eth0), and I wondered if the leak could be due to some
circular problem relating to the two IPs reaching the same endpoint.

Anyway, since I am not actively using the eth0:1 IP on the problem
machine, I disabled it, and I no longer believe I am seeing the same
leakage.

I will continue to monitor it and report back in a day or so if it
continues to be stable. I'm not sure if this is a bug or not. It seems
like it could happen to anyone on a machine with multiple NICs, and in
that regard it does not seem like user error. I'm not sure of the nuts
and bolts of why the secondary IP is problematic. Perhaps it would be
good if the IPs used by the JVB could be overridden. I'm not sure.
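
A sketch of what such an override might look like, if ice4j's harvester
filtering properties apply here (the property names, config path, and the
placeholder address are assumptions to verify against your ice4j version):

    # restrict the addresses the bridge's ICE harvester will use
    # (203.0.113.10 is a placeholder for the eth0 address)
    echo 'org.ice4j.ice.harvest.ALLOWED_ADDRESSES=203.0.113.10' | \
        sudo tee -a /etc/jitsi/videobridge/sip-communicator.properties
    sudo service jitsi-videobridge restart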

Thanks,
RD

···

On Wed, May 18, 2016 at 6:35 PM, Boris Grozev <boris@jitsi.org> wrote:



#4

Hi,

A correction on my end: I don't actually see the issue anywhere. The high
values I reported previously were based on the output of `lsof` (with no
arguments), which lists each FD once for each thread of the JVM. Using
`lsof -u jvb` (or `-p <pid>`) I see no more than 350 FDs.
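
For anyone checking their own bridge, the difference looks roughly like
this (a sketch; the jvb user name and a single java process are
assumptions):

    lsof | wc -l                      # inflated: each FD counted once per thread
    lsof -u jvb | wc -l               # FDs for the jvb user's processes
    lsof -p "$(pidof java)" | wc -l   # FDs for the process itself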


    I might have figured out why I'm having the issue on this particular
    machine. In the logfile snippet I sent you, I noticed that there were
    two distinct IPs referenced in the "org.ice4j.ice.Agent.createComponent"
    logging lines. It turns out that particular machine has a second
    virtual IP (eth0:1), which explains the second IP (i.e. the eth0 and
    eth0:1 public IPs both go to the same physical machine).

    I contrasted this with the machine not experiencing the leak, which
    has only a single NIC (eth0), and I wondered if the leak could be due
    to some circular problem relating to the two IPs reaching the same
    endpoint.

FWIW some of the machines I checked have more than one interface, and one has a virtual interface (eth0:1) as well.

    Anyway, since I am not actively using the eth0:1 IP on the problem
    machine, I disabled it, and I no longer believe I am seeing the same
    leakage.

I'm glad that's fixed now.

Regards,
Boris

···

On 18/05/16 13:08, Raoul Duke wrote:



#5

Hi again,

After more investigation we did find a leak. The fds opened as a result of health checks (the ones listed by lsof as "can't identify protocol") get closed eventually, so they aren't a problem. But we were leaking sockets from the ice/tcp implementation.

This should be fixed now (after this is merged: https://github.com/jitsi/jitsi-videobridge/pull/246).

This issue has been there for a long time. We're not sure why it started to occur more often just now. It could be some change in browsers (e.g. more Firefox users running with ice/tcp).

If you want to verify that this is the issue you were running into, grep for "Not adding a socket to an ICE agent with" in your old videobridge logs. Each such line indicates a leaked socket.
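
For example (a sketch; the log path assumes the Debian package defaults):

    # one count per log file, including rotated logs
    grep -c "Not adding a socket to an ICE agent with" /var/log/jitsi/jvb.log*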

Regards,
Boris

···

On 18/05/16 13:08, Raoul Duke wrote:



#6

Hi, has anyone thought about interfacing a PTZ (pan-tilt-zoom) camera with Jitsi so that a remote user can control the PTZ? What would be the right architectural approach if it hasn't been done already? Does Jitsi support some kind of plug-in architecture?

My thoughts:
- Create a simple widget to control a PTZ camera that could be integrated into the Jitsi UI.
- Whichever feed is the main one on the screen, show the PTZ widget if it is a PTZ camera and the user has permission.
- Use a WebRTC DataChannel to transmit PTZ commands to the correct camera.

Thoughts?


#7

Hi Boris,

Thanks for looking into this.

From my side it looks like the leak I am seeing is still present. I am
not seeing the logging line you mentioned ("Not adding a socket to an ICE
agent with") in my current or historic logs, so perhaps I am hitting a
different issue than the one you have addressed.

I now believe that my discovery about the second (virtual) IP was only
contributing to the leak happening faster; I still see the leak without
the second IP, albeit at a slower rate.

I just tried videobridge 737 (package), which appears to include your
change (although I'm not 100% sure).

Here is my approach to replicating the issue I am seeing:
* Use the latest jitsi-videobridge and jicofo (amd64) DEBs from
https://download.jitsi.org/jitsi/debian/?C=M;O=D
* I'm using ejabberd as my XMPP server, in case that could be relevant
(although it seems the leak must be ICE-connection related rather than
XMPP, right?)
* In the jicofo config I increased the frequency of health checks to
every 4 seconds (org.jitsi.jicofo.HEALTH_CHECK_INTERVAL=4000), just to be
more aggressive
* If the machine has multiple IPs, this also seems to contribute to the
leak happening faster
* I am running the following command in a terminal window to watch the
number of FDs:

watch "lsof -u jvb | wc -l"

I then just watch and wait. There is no traffic hitting the videobridge,
so I can only conclude that the problem I am seeing is health-check
related, as you originally suggested.

What I see is the number of FDs creeping up over time. As you said
originally, it will dip back down again every few minutes as FDs are
released, but I see the low-watermark creeping up over time, i.e. the
low-watermark starts around 300 or so, but after enough time it creeps up
beyond 500 and just grows from there.

To be 100% sure, when the low-watermark reached 708 I stopped jicofo to
kill off further health checks, and the FD count still stayed static at
708 even after 10+ minutes. This seems like evidence of a leak to me.
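
The check was along these lines (a sketch; the service name and jvb user
are assumptions):

    sudo service jicofo stop   # stop the source of the health checks
    sleep 600                  # wait 10+ minutes
    lsof -u jvb | wc -l        # count still at the low-watermark => leaked FDs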

I have replicated this now on both Debian Jessie and Ubuntu 14.04 (both
64-bit).

I am going to experiment with File Leak Detector (
http://file-leak-detector.kohsuke.org/) if I get time.
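
(My understanding is that it attaches as a Java agent, roughly as below;
the jar path, option syntax, and HTTP endpoint are assumptions to check
against its docs.)

    # run the target JVM with the leak-detector agent attached, serving
    # stack traces for open descriptors over HTTP
    java -javaagent:/opt/file-leak-detector.jar=http=19999 -jar app.jar &
    curl http://localhost:19999/ > fd-traces.txt   # dump the current traces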

Thanks,
RD

···

On Fri, May 20, 2016 at 7:38 PM, Boris Grozev <boris@jitsi.org> wrote:



#8


I observe the same thing, except that over time, and with more health-check requests, the number drops back down. Specifically, I ran a few thousand health checks (via REST), and although the count reached 1500 at some point, it never reached the ulimit and went back down to around 300.

I don't actually understand what these fds are, but my best guess is that the jvb or one of the underlying libraries caches them for reuse. So, I don't think that this causes a problem.

We did observe a few instances in which the bridge hit the ulimit ("too many open files" errors came up from various places), but in these cases, in addition to the "can't identify protocol" fds we saw many leaked sockets in CLOSE_WAIT.

Do you still observe the ulimit being reached? If so, could you share your bridge log file and the output of lsof?
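
A quick way to distinguish the two signatures (a sketch; the jvb user
name is an assumption):

    lsof -u jvb -a -i | grep -c CLOSE_WAIT           # leaked TCP sockets
    lsof -u jvb | grep -c "can't identify protocol"  # orphaned socket FDs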

Regards,
Boris

···

On 20/05/16 19:33, Raoul Duke wrote:



#9


    I observe the same thing, except that over time, and with more
    health-check requests, the number drops back down. Specifically, I ran
    a few thousand health checks (via REST), and although the count
    reached 1500 at some point, it never reached the ulimit and went back
    down to around 300.

Over what timeframe did you run those health checks? I just tried to
reproduce your approach of calling the health check via the REST API on a
test server (Ubuntu 14.04).

I enabled the REST API and wrote a small bash script to hit
http://localhost:8080/about/health in a loop every 3 seconds with curl
(http://pastie.org/10848164) for 10,000 iterations (I'm not even running
jicofo on this particular machine).
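
The pastie content isn't preserved here, but from the description a
minimal equivalent would be (a sketch; per a later message it needs to
run as root):

    #!/bin/bash
    # hit the bridge's REST health-check endpoint every 3 seconds,
    # 10,000 times
    for i in $(seq 1 10000); do
        curl -s http://localhost:8080/about/health > /dev/null
        sleep 3
    done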

As you say, I initially see the FD count ramping up to > 1500 and then
periodically falling back down to the 300s.

But as that cycle repeats, I can see the low-watermark increasing into
the high 300s, then hitting the 400s, and slowly climbing in that manner.
The leak is reasonably slow, but with the above experiment running for
about an hour, I can see the low-watermark is now > 500.

This slow increase of the low-watermark is what I am defining as the
leak. After X hours that low-watermark will be > 1000; after 4X hours it
will be > 4000, and it will eventually hit the ulimit.

It would be great if you could try my bash script for uniformity of
approach and leave it running for a few hours to see what your
low-watermark is after several hours. If you see something different from
me, then I'd have to assume that something is different at the OS level,
in which case it would be great to establish what it is. Can you please
share details of what OS and JRE you are running? A bonus would be your
"sysctl -a" settings. I'm wondering if some kernel setting is responsible
for our differing experiences.

I'm going to leave my version of the experiment running overnight and
will email you the resulting jvb.log at the end.

    I don't actually understand what these fds are, but my best guess is
    that the jvb or one of the underlying libraries caches them for reuse.
    So, I don't think that this causes a problem.

    We did observe a few instances in which the bridge hit the ulimit
    ("too many open files" errors came up from various places), but in
    these cases, in addition to the "can't identify protocol" fds we saw
    many leaked sockets in CLOSE_WAIT.

    Do you still observe the ulimit being reached? If so, could you share
    your bridge log file and the output of lsof?

I have two machines, both running Debian Jessie with the latest OpenJDK 8
(amd64). I changed one to health-check every 20 seconds and left the
other at the default (10 seconds). When I checked on them after one day,
the 10-second machine had > 2400 FDs; the other had roughly half that.

I can't afford the loss of service of letting it reach the ulimit (4096),
but from past experience I know that will happen. The fact that one
machine has reached > 2400 in one day of default (10-second) health
checks is sufficient evidence to me that it is inevitably going to happen.

When I checked the FDs for the pid, I see the lion's share of them are
"can't identify protocol". I don't see any CLOSE_WAIT. The "can't
identify protocol" FDs have no footprint in /proc/$pid, which implies the
kernel has washed its hands of them or they were never properly used.
There is some discussion of FDs in this state here:
http://serverfault.com/a/155560. I actually like their idea of using
byteman, but I'm not exactly sure how. I did play around with File Leak
Detector but no dice - it only deals with files, not sockets.

Thanks,
RD

···

On Sat, May 21, 2016 at 2:03 AM, Boris Grozev <boris@jitsi.org> wrote:



#10


    Over what timeframe did you run those health checks? I just tried to
    reproduce your approach of calling the health check via the REST API
    on a test server (Ubuntu 14.04).

    I enabled the REST API and wrote a small bash script to hit
    http://localhost:8080/about/health in a loop every 3 seconds with curl
    (http://pastie.org/10848164) for 10,000 iterations (I'm not even
    running jicofo on this particular machine).

I was just running curl in a loop, without any pausing. Will also let your script run during the night.
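
Something along these lines (a sketch of "curl in a loop"):

    # back-to-back health checks, with no pause between requests
    while true; do curl -s http://localhost:8080/about/health > /dev/null; done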

    It would be great if you could try my bash script for uniformity of
    approach and leave it running for a few hours to see what your
    low-watermark is after several hours. If you see something different
    from me, then I'd have to assume that something is different at the
    OS level, in which case it would be great to establish what it is.
    Can you please share details of what OS and JRE you are running? A
    bonus would be your "sysctl -a" settings. I'm wondering if some
    kernel setting is responsible for our differing experiences.

Ubuntu 14.04, linux 3.13.0-61-generic, java 1.7.0_79.
sysctl -a output: http://pastie.org/10848177

    I'm going to leave my version of the experiment running overnight and
    will email you the resulting jvb.log at the end.

Thanks. Please also attach the output of lsof.

Boris

···

On 22/05/16 19:53, Raoul Duke wrote:



#11

PS - the correct URL for the repeating health-check script is:

http://pastie.org/10848164

(it needs to be run as root).

···

On Mon, May 23, 2016 at 1:53 AM, Raoul Duke <rduke496@gmail.com> wrote:



#12

I attached my findings to the GitHub issue:
https://github.com/jitsi/jitsi-videobridge/issues/245

I will run a longer test tonight.

Thanks,

···

On Mon, May 23, 2016 at 2:11 AM, Boris Grozev <boris@jitsi.org> wrote:
