[jitsi-dev] [jitsi-videobridge] conference creation fails under load due to port contention (#96)


#1

Hello,

IceUdpTransportManager attempts to bind ports from a "tracked" starting value, in code that starts here:

https://github.com/jitsi/jitsi-videobridge/blob/master/src/main/java/org/jitsi/videobridge/IceUdpTransportManager.java#L724

The problem is that if multiple threads enter this code in parallel, they can all start the search from the same value and then compete for the same ports. Under sustained conference creation load, this ultimately leads to a bind failure, because the inner retry loop gives up after 50 attempts (the limit can be raised to 100, but that upper bound is hardcoded in NetworkAddressManagerServiceImpl.createIceStream). If the retry loop is unsuccessful, conference creation fails.

I can reproduce the failure with a conference creation rate of 9/sec sustained over 10 seconds on an EC2 m4.large instance.

Are there any plans to improve this port assignment code? We are experimenting with a synchronized block around the portTracker code mentioned above. That eliminates the errors, but results in a port assignment time that is an order of magnitude slower under load. We may consider using multiple port trackers or some kind of port free list, but we wanted to see if anyone else had run into this.
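
One lock-free alternative to the synchronized block could look like the sketch below. It is only an illustration, not the bridge's actual port tracker: the class name `AtomicPortCursor` and the method `claimNext` are hypothetical, and the sketch assumes that handing each thread a distinct starting port is enough to keep the retry loops from colliding.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical lock-free port cursor (not the bridge's PortTracker).
 * Each call hands out the current port and advances the cursor atomically,
 * so concurrent callers do not begin their bind attempts at the same port.
 */
final class AtomicPortCursor {
    private final int minPort;
    private final int maxPort;
    private final AtomicInteger next;

    AtomicPortCursor(int minPort, int maxPort) {
        if (minPort > maxPort) {
            throw new IllegalArgumentException("minPort must not exceed maxPort");
        }
        this.minPort = minPort;
        this.maxPort = maxPort;
        this.next = new AtomicInteger(minPort);
    }

    /** Returns the port to try next and advances the cursor, wrapping to minPort after maxPort. */
    int claimNext() {
        // getAndUpdate retries its compare-and-set internally until it succeeds.
        return next.getAndUpdate(p -> p >= maxPort ? minPort : p + 1);
    }
}
```

A caller would invoke `claimNext()` before each bind attempt and claim again if the bind fails. This does not guarantee that the claimed port is free; it only spreads concurrent searches across the range instead of serializing them.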

---
Reply to this email directly or view it on GitHub:
https://github.com/jitsi/jitsi-videobridge/issues/96


#2

Not sure about a proper long-term solution, but a workaround is to use the [single-port mode](https://github.com/jitsi/jitsi-videobridge/blob/master/doc/single-port.md), in which case the portBase value is effectively unused.


#3

Thanks @bgrozev, we'll check out this mode.


#4

@bgrozev the last time I looked, the single port harvester just added this additional port. Is there a way to disable everything except that port?


#5

@fippo that's pretty much the default behavior actually. The harvester itself only adds a port, but there is [logic in the bridge](https://github.com/jitsi/jitsi-videobridge/blob/master/src/main/java/org/jitsi/videobridge/IceUdpTransportManager.java#L448) which disables the dynamically allocated candidates if single-port is in use. This way we only use the single port for browsers, but use dynamic ports for endpoints without rtcpmux support (i.e. jigasi).


#6

@bgrozev hah, that looks much better than what I remembered. Thanks!


#7

https://github.com/jitsi/jitsi-videobridge/commit/d6610fd6f11ef89f6cf738c8d7ea628d8bd794cb provides a fix for a related issue (which may have actually caused what you observed). The TCP and/or "single-port" ports were used when updating the value, which resulted in it always staying at "minPort". The race condition is still there, but I believe it is very unlikely to cause any problems in practice.


#8

Closed #96.

--
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/jitsi/jitsi-videobridge/issues/96#event-1021636259


#9

Closing because no further problems have been reported for over a year. Please reopen if necessary.


#10

We moved to single port mode, which has worked mostly fine for us and doesn't have this port-search issue.

We did have to increase the OS socket receive buffer sizes for the single port harvester. With the default limits, we were saturating these buffers, which led to audio dropouts. The settings we now use are as follows:
sysctl -w net.core.rmem_default=20971520
sysctl -w net.core.rmem_max=33554432
sysctl -w net.core.wmem_default=65536
sysctl -w net.core.wmem_max=33554432


#11

On Thu, Mar 30, 2017 at 11:01 AM John Quigley <notifications@github.com> wrote:

> We moved to single port mode, which has worked mostly fine for us and
> doesn't have this port-search issue.
>
> We did have to increase the socket OS receive buffer sizes for the single
> port harvester. With the default limits, we were saturating these buffers
> and this led to audio drop outs. The settings we now use are as follows:
> sysctl -w net.core.rmem_default=20971520
> sysctl -w net.core.rmem_max=33554432
> sysctl -w net.core.wmem_default=65536
> sysctl -w net.core.wmem_max=33554432

Note that you may want to keep the system-wide default lower. We have a
Java property you can use to control the receive buffer size only for the
single-port mode:

org.ice4j.ice.harvest.AbstractUdpListener.SO_RCVBUF

Boris
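
As a rough illustration of what a per-socket receive buffer setting does, here is a minimal Java sketch. It is not ice4j's code: the class `SinglePortReceiveBuffer` and its methods are hypothetical, and reading the value as a JVM system property is an assumption, since the thread does not say how the bridge loads it. Only the property name comes from the message above.

```java
import java.net.DatagramSocket;
import java.net.SocketException;

/** Hypothetical helper; illustrates requesting SO_RCVBUF on one UDP socket only. */
final class SinglePortReceiveBuffer {
    /** Property name quoted in this thread. */
    static final String PROPERTY = "org.ice4j.ice.harvest.AbstractUdpListener.SO_RCVBUF";

    /** Assumption: the value is visible as a JVM system property; otherwise use the fallback. */
    static int requestedSize(int fallbackBytes) {
        return Integer.getInteger(PROPERTY, fallbackBytes);
    }

    /**
     * Requests the buffer size on the given socket and returns what the OS reports back.
     * The reported size may differ from the request, because the OS treats it as a hint
     * and applies its own limits.
     */
    static int apply(DatagramSocket socket, int requestedBytes) throws SocketException {
        socket.setReceiveBufferSize(requestedBytes);
        return socket.getReceiveBufferSize();
    }
}
```

Comparing the value returned by `apply` with the requested size is a quick way to check whether the OS limit (net.core.rmem_max on Linux) still needs raising, even when the system-wide default is kept low.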


#12

Thanks, that's a good tip. Perhaps it could be added to the official documentation, here: https://github.com/jitsi/jitsi-videobridge/blob/master/doc/single-port.md
I'm not sure what the recommended value would be. We chose 20MB based on empirical tests with bots and how much memory our systems had available.

We've also experimented with a custom modification where we have multiple single port harvesters (i.e. one per core), on different ports. The idea is that each SPH can use a smaller buffer size and gets its own IO input thread. However, we've never observed any benefit from that, except for perhaps a small improvement in ICE candidate selection (less TCP and more direct and TURN-UDP in a videobridge under load).
