[jitsi-dev] [jitsi/jitsi-videobridge] packet loss rises to 5%-25% with latest jitsi build (#236)


#1

We have been working with a recent Jitsi Videobridge build where our Chrome clients report roughly 2% packet loss when the system is under load. With a build from October 2015, the clients report only about 0.5% loss (at most) under the same load. We are looking for ideas as to where the packet loss might be coming from. The test infrastructure is identical; only the Jitsi version differs.
Additionally, under even heavier load (more than 80 users), the loss level rises to 5%, 25%, and beyond, while the older build still somehow keeps up.
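
For reference, the percentages the clients report boil down to the standard RTP receiver bookkeeping (RFC 3550, A.3): packets actually received versus the sequence-number range that was expected. Below is a minimal sketch of that calculation; the class and parameter names are illustrative, not taken from Chrome's or the bridge's code:

```java
/**
 * Rough sketch of how a reported loss percentage is derived: an RTP receiver
 * counts the packets it actually received against the range of sequence
 * numbers it expected to see (RFC 3550, appendix A.3). Names here are
 * illustrative only.
 */
public final class LossEstimate
{
    /**
     * @param extendedHighestSeq highest extended RTP sequence number seen
     * @param baseSeq first sequence number seen
     * @param packetsReceived total packets actually received
     * @return cumulative loss as a fraction in [0, 1]
     */
    public static double cumulativeLossFraction(
        long extendedHighestSeq, long baseSeq, long packetsReceived)
    {
        long expected = extendedHighestSeq - baseSeq + 1;
        long lost = Math.max(0, expected - packetsReceived);
        return expected == 0 ? 0.0 : (double) lost / expected;
    }

    public static void main(String[] args)
    {
        // e.g. 10,000 packets expected, 9,800 received -> 0.02, i.e. the
        // ~2% cumulative loss the newer build is showing under load.
        System.out.println(cumulativeLossFraction(10_099, 100, 9_800));
    }
}
```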

Any insights?



#2

Unfortunately I can't add any helpful information to this, but I did also run some load tests on the bridge about a month ago. I used the following setup:

- bridge in last-n mode, with last-n set to "2"
- clients doing 720p (no simulcast, basicrtcptermination enabled)
- bridge hosted on AWS in Oregon, on an m3.xlarge EC2 instance
- clients joined via testRTC; agents were somewhere on the west coast (not sure where testRTC's data center is exactly), no network impairment added
- multiple meetings, each with 2 clients (so all streams were forwarded through the box)
- tests were done using VP8 (we've found some issues with H.264)

tl;dr: details on the test cases and observations are below, but it looks like at higher loads clients fail to get up to full bitrate. Raw CPU doesn't seem to be the problem, but I haven't had a chance to dig deeper into why clients aren't ramping up correctly; part of that is because testRTC doesn't expose all the graphs yet, so I can't look at the client-side data for hints. NOTE: because I haven't had a chance to dig deeper, I have no idea whether this is a bridge problem or not, but given that it seems fine at lower loads, there's at least some signal that something may be degrading as load goes up.

Some tests and notes:

### 16 callers across 8 conferences.
Box showed roughly 36 Mbps going across; CPU was at about 130% (same machine as before). Bridge stats from /colibri/stats are below. The CPU number was pulled from top (the cpu_usage value from the stats doesn't seem to line up).

Bridge stats (from the REST API):
{ "used_memory": 5316, "threads": 712, "videochannels": 16, "bit_rate_download": "37353.23876", "graceful_shutdown": false, "videostreams": 32, "rtp_loss": "0.00988", "total_memory": 15770, "bit_rate_upload": "37431.32068", "current_timestamp": "2016-04-19 15:45:16.108", "cpu_usage": "0.15805", "audiochannels": 16, "conferences": 8, "participants": 16 }

### 26 callers across 13 meetings.
Box got up to showing the expected ~54 Mbps, but wavered a bit. CPU was around 180-190%.

{ "used_memory": 5350, "threads": 1145, "videochannels": 26, "bit_rate_download": "51390.8012", "graceful_shutdown": false, "videostreams": 52, "rtp_loss": "0.00022", "total_memory": 15770, "bit_rate_upload": "51495.54446", "current_timestamp": "2016-04-19 15:56:03.483", "cpu_usage": "0.39314", "audiochannels": 26, "conferences": 13, "participants": 26 }

### 36 callers across 18 meetings.
Seems like 1 caller failed to join (a testRTC issue, I think), so the test was really more like 35 callers. Video was never able to reach what it theoretically should have, despite CPU usage on the box only going to about 210%. Bitrate seemed to hover around 50-60 Mbps instead of the theoretical ~72 (36 callers × ~2 Mbps each; technically 68-70 because of the caller issue).

(didn't grab the bridge stats here, I guess)



#3

It would be nice if we could trace this down to a specific commit, or at least a specific JVB version.

We changed our queuing system back in late January or early February, if I remember correctly. This could be related. Do you think you could repeat your tests with builds from before (jvb@627) and after (jvb@636) that change? I'm summoning @bgrozev to confirm those numbers.

Unfortunately the Debian repo no longer has those builds, so you will have to compile them yourself. Keep in mind that before build 631 the bridge didn't have fixed dependency versions, so you might have to play around a bit to get a working build of 627; 636 should be OK.



#4

Hi guys,
I haven't done load testing, but I have some pending PRs to improve some queues in ice4j.
See https://github.com/jitsi/ice4j/pull/62, https://github.com/jitsi/ice4j/pull/63 and https://github.com/jitsi/ice4j/pull/64.
