[jitsi-dev] Jitsi videobridge last-n behavior


#1

I'm noticing some issues when using last-n on the videobridge. I have
last-n set to 2 and am seeing the following (all clients join muted):

1) First client joins, consistently receives empty last-n notification
message
2) Second client joins, first client appears to consistently receive a
last-n notification containing the id of the second client
3) Third client joins, first client *mostly* appears to get a last-n
message, but I have seen instances where no new message is sent. This
causes a problem because the last-n message is the trigger to attach the
video stream.
4) A fourth client joins, no new last-n message sent (none expected, client
joins muted and client is already receiving 2 streams: client 2 and client
3)
5) Either client 2 or client 3 leaves. The bridge correctly forwards a new
video stream to client 1 (since one of its 2 last-n streams is now gone),
but no last-n notification is received, so client 1 doesn't know which
stream it has begun to receive data on (in this example obviously there's
only one possibility, but that would not be true for a larger call).

My next step is to dive into the bridge and look at why the automatic
last-n messages cover the cases where streams are auto-forwarded to fill
last-n initially, but not the case where there's a gap due to someone
leaving. In the meantime, has anyone else seen this behavior?

-brian


#2

Hello, Brian!

Thank you very much for the feedback!

> I'm noticing some issues when using last-n on the videobridge. I have
> last-n set to 2 and am seeing the following (all clients join muted):
>
> 1) First client joins, consistently receives empty last-n notification
> message
> 2) Second client joins, first client appears to consistently receive a
> last-n notification containing the id of the second client
> 3) Third client joins, first client *mostly* appears to get a last-n
> message, but I have seen instances where no new message is sent. This
> causes a problem because the last-n message is the trigger to attach the
> video stream.

Our intent is to always send a message to the first client. If no such
message is sent by Videobridge, the behavior is a bug.

> 4) A fourth client joins, no new last-n message sent (none expected, client
> joins muted and client is already receiving 2 streams: client 2 and client
> 3)
> 5) Either client 2 or client 3 leaves. The bridge correctly forwards a new
> video stream to client 1 (since one of its 2 last-n streams is now gone),
> but no last-n notification is received, so client 1 doesn't know which
> stream it has begun to receive data on (in this example obviously there's
> only one possibility, but that would not be true for a larger call).

Our intent is to always send notifications about changes to the list
of last n should an element leave the conference. If no such message
is sent by Videobridge, the behavior is a bug.
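
For illustration, the intended behavior described above can be sketched as a toy model in Python. This is not bridge code: the class and method names are made up for the sketch, and it assumes that with all clients muted the join order stands in for whatever ordering the bridge really uses. It replays the five steps from the report and records what client 1 should be told at each step.

```python
from typing import List, Optional


class LastNReceiver:
    """Toy model of one receiving endpoint's last-n list (hypothetical, not bridge code)."""

    def __init__(self, own_id: str, last_n: int) -> None:
        self.own_id = own_id
        self.last_n = last_n
        self.current: Optional[List[str]] = None  # None means nothing announced yet

    def on_endpoints_changed(self, endpoints: List[str]) -> Optional[List[str]]:
        """Return the new last-n list if a notification is due, else None."""
        # Assumption: join order replaces the real ordering because everyone is muted.
        others = [e for e in endpoints if e != self.own_id]
        new_list = others[: self.last_n]
        if new_list == self.current:
            return None  # list unchanged, nothing to announce
        self.current = new_list
        return new_list


receiver = LastNReceiver("client1", last_n=2)
conference: List[str] = []
notifications = []
steps = [
    ("join", "client1"),
    ("join", "client2"),
    ("join", "client3"),
    ("join", "client4"),
    ("leave", "client2"),
]
for action, endpoint in steps:
    if action == "join":
        conference.append(endpoint)
    else:
        conference.remove(endpoint)
    notifications.append(receiver.on_endpoints_changed(conference))

for step, sent in zip(steps, notifications):
    print(step, "->", sent)
```

In this model the fourth join produces no notification, while the departure of client 2 does, which is the message reported as missing in step 5.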

> My next step is to dive into the bridge and look at why the automatic
> last-n messages cover the cases where streams are auto-forwarded to fill
> last-n initially, but not the case where there's a gap due to someone
> leaving. In the meantime, has anyone else seen this behavior?

Should you decide to go on and fix the issue, we'll gladly work with
you on integrating your contributions.

Best regards,
Lyubomir Marinov

···

2015-08-03 21:15 GMT-05:00 Brian Baldino <brian@highfive.com>:


#3

Thanks, Lyubomir. I've found that the bridge does fire updates to the
last-n notifications on the ENDPOINTS_CHANGED event (which should fire when
endpoints join or leave), so it certainly looks like the bridge has the
right handling. As I've been looking closer at the bridge, I'm starting to
wonder if this is due to the SCTP connection between the bridge and the
endpoint getting closed prematurely for some reason. I think this is
something you guys said you had heard of and perhaps seen before?

From what I can see in my testing, it looks like the data channel is
getting expired. I have a theory that I think I've been able to verify
in the logs. What I see happening is this:

1) Channels for content 'data' in the bridge are 'touched' based on the
consent freshness checks, which appear to happen every 15 seconds.
2) Channels are typically set to a 10 second expiry (I believe that's the
default setting and what jitsi-meet uses; we're using it as well).
3) The VideobridgeExpireThread is set to check every 60 seconds.
4) With the times above, we can run into an issue where the
VideobridgeExpireThread runs during a period where the data channel
freshness check has not occurred within the last 10 seconds (since the
checks only occur every 15 seconds). This does not happen all the time; I
had to try quite a few times to get "lucky" with the timing to exhibit
this. I've attached some videobridge logs where this happens in the first
check and the channel gets expired. The data channel gets touched here:

2015-07-05 00:05:54.852 INFO: [30] org.jitsi.videobridge.Channel.info() BB:
touching data channel 9828d674424d074e for endpoint
d513fb7e-95d4-4902-83ad-d5dbe9f87fbd

And the VideobridgeExpireThread runs here:
2015-07-05 00:06:06.267 INFO: [13]
org.jitsi.videobridge.VideobridgeExpireThread.info() BB: Looking to expire
channel 9828d674424d074e content type data in conference 3e9d1b2b5d1abd4,
last activity time: 1438733154852, expiration time: 10000, current time:
1438733166267

Which is just outside the 10 second window, so the channel gets expired.
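
The arithmetic behind "just outside the 10 second window" can be checked directly from the three numbers in the expire thread's log line. A minimal Python sketch follows; the variable names are made up here, and the strict "greater than" comparison is an assumption about how the bridge decides, not something taken from its source.

```python
# Values copied from the VideobridgeExpireThread log line above.
last_activity_ms = 1438733154852  # "last activity time"
current_time_ms = 1438733166267   # "current time"
expire_ms = 10000                 # "expiration time"

idle_ms = current_time_ms - last_activity_ms
overshoot_ms = idle_ms - expire_ms
# Assumed comparison; the bridge's exact check may differ.
expired = idle_ms > expire_ms

print(f"idle for {idle_ms} ms, {overshoot_ms} ms past the limit, expired={expired}")
# prints: idle for 11415 ms, 1415 ms past the limit, expired=True
```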

Since the freshness update interval divides evenly into a minute, it's
either sync'd with the VideobridgeExpireThread (the expire thread will
always see it as being touched within the last 10 seconds) or it isn't
(the expire thread will expire it right away), which I think contributes
to the rarity of this. However, I wonder if it's possible that either the
expire thread or the freshness check can be delayed or break its consistent
interval, meaning this could possibly occur later in a call as well.
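
The "either always fine or expired right away" reasoning can be tried out with a small idealized simulation. This is a sketch under stated assumptions, not a model of the real threads: both timers are taken to be perfectly periodic with no drift, the function name and parameters are invented for the example, and the offset is the time of the first touch measured on the expire thread's clock.

```python
from typing import Optional


def first_expiry_check(offset_s: int, touch_every_s: int = 15,
                       check_every_s: int = 60, expire_s: int = 10,
                       call_length_s: int = 600) -> Optional[int]:
    """Return the time of the first expire check that finds the channel stale, or None."""
    check_at = check_every_s
    while check_at <= call_length_s:
        if check_at >= offset_s:
            # Most recent touch at or before this check.
            touches_so_far = (check_at - offset_s) // touch_every_s
            last_touch = offset_s + touches_so_far * touch_every_s
            if check_at - last_touch > expire_s:
                return check_at
        check_at += check_every_s
    return None


results = {offset: first_expiry_check(offset) for offset in range(15)}
stale_offsets = [offset for offset, when in results.items() if when is not None]
print("offsets that expire:", stale_offsets)
print("expiry times seen:", sorted({when for when in results.values() if when is not None}))
```

Because 15 divides 60, the gap seen by the expire thread is the same at every check in this model, so an offset either trips the very first check or never trips at all. Any delay in either timer breaks that symmetry, which is the later-in-the-call case raised above.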

Does this theory seem plausible? I'll try and do more testing to see if I
can get more data to back this up from my side as well.

-brian

expire_datachannel_right_away (29.8 KB)

···

On Tue, Aug 4, 2015 at 11:58 AM, Lyubomir Marinov <lyubomir.marinov@jitsi.org> wrote:

_______________________________________________
dev mailing list
dev@jitsi.org
Unsubscribe instructions and other list options:
http://lists.jitsi.org/mailman/listinfo/dev


#4

Hi Brian,

> 2) Channels are typically set to a 10 second expiry (believe that's the
> default setting and what jitsi-meet uses, we're using it as well)

Nope, we leave the expire attributes untouched, so the default of 60
seconds is used. Jicofo expires channels by setting 'expire' to 0 when a
participant leaves the conference.

> Does this theory seem plausible? I'll try and do more testing to see if I
> can get more data to back this up from my side as well.

It makes sense, but how about trying 60 seconds?

Regards,
Pawel

···

On Wed, Aug 5, 2015 at 2:19 AM, Brian Baldino <brian@highfive.com> wrote:


#5

Thanks Pawel, inline.

> Hi Brian,
>
>> 2) Channels are typically set to a 10 second expiry (believe that's the
>> default setting and what jitsi-meet uses, we're using it as well)
>
> Nope, we leave the expire attributes untouched, so the default of 60
> seconds is used. Jicofo expires channels by setting 'expire' to 0 when a
> participant leaves the conference.

Right you are. I had checked my local jitsi-meet deployment and thought I
saw the expiry at 10 there too, but I just double-checked and it is indeed
60. Sounds like this is something specific to our deployment then. That
being said, we'd certainly prefer a timeout faster than 60 seconds. Is the
STUN freshness check interval configurable in any way? 15 seconds may not
be that bad, but something around 10 would feel better for us, I think.

>> Does this theory seem plausible? I'll try and do more testing to see if I
>> can get more data to back this up from my side as well.
>
> It makes sense, but how about trying 60 seconds?

Planning on bumping this up to 20 and checking to see if that solves the
data channel issues we've been seeing.
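
As a rough way to compare the expiry values mentioned in this thread, the margin each one leaves over the 15 second touch interval can be computed. This is a back-of-the-envelope sketch: the function name is invented, and it assumes the only thing that matters is the longest gap between two touches.

```python
def tolerated_delay_s(expire_s: float, touch_every_s: float) -> float:
    """Seconds a touch may arrive late before the channel can look stale.

    A negative value means the channel can look stale even when every
    touch arrives on time.
    """
    return expire_s - touch_every_s


for expire_s in (10, 20, 60):
    margin = tolerated_delay_s(expire_s, touch_every_s=15)
    print(f"expire={expire_s}s -> tolerated touch delay: {margin}s")
```

With a 10 second expiry the margin is negative, which matches the expiry seen in the logs; 20 seconds leaves 5 seconds of slack for a late freshness check, and 60 seconds leaves 45.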

···

On Wed, Aug 5, 2015 at 5:49 AM, Paweł Domas <pawel.domas@jitsi.org> wrote:


#6

Cool, thanks Lyubo!

···

On Wed, Aug 5, 2015 at 11:35 AM, Lyubomir Marinov <lyubomir.marinov@jitsi.org> wrote:


#7

Hey, Brian!

Unfortunately, the consent freshness check interval does not appear to
be configurable at this time
(https://github.com/jitsi/ice4j/blob/master/src/main/java/org/ice4j/ice/Agent.java#L2286).

However, it should be relatively simple and straightforward to add
such configurability and we'll gladly work to integrate contributions
on the subject.

Best regards,
Lyubo Marinov
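
To give a feel for what such configurability might look like, here is a language-neutral sketch written in Python rather than ice4j's Java. Everything in it is an assumption for illustration: the property name is made up, the 15000 ms default is simply the interval observed earlier in this thread, and the lookup falls back to that default for missing, malformed, or non-positive values.

```python
from typing import Mapping

# The 15 s interval observed in this thread, used as the fallback.
DEFAULT_CONSENT_FRESHNESS_INTERVAL_MS = 15000

# Hypothetical property name, invented for this sketch.
PROPERTY_NAME = "consent.freshness.interval.ms"


def consent_freshness_interval_ms(properties: Mapping[str, str]) -> int:
    """Read the interval from a property map, falling back to the default."""
    raw = properties.get(PROPERTY_NAME)
    if raw is None:
        return DEFAULT_CONSENT_FRESHNESS_INTERVAL_MS
    try:
        value = int(raw)
    except (TypeError, ValueError):
        return DEFAULT_CONSENT_FRESHNESS_INTERVAL_MS
    # Reject zero or negative intervals instead of scheduling a busy loop.
    return value if value > 0 else DEFAULT_CONSENT_FRESHNESS_INTERVAL_MS


print(consent_freshness_interval_ms({}))
print(consent_freshness_interval_ms({PROPERTY_NAME: "10000"}))
```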

···

2015-08-05 10:57 GMT-05:00 Brian Baldino <brian@highfive.com>:

> is the STUN freshness check interval configurable in any way?