[jitsi-dev] [jitsi/libjitsi] Performance regression in RTPConnectorOutputStream (#178)


#1

Hello

We have been tracking down an issue with the Jitsi Videobridge that causes increased CPU utilization as well as increased packet loss, especially in audio streaming. We are trying to fix a problem related to audio drop-outs/PLC in our production environment, but we haven't yet determined if this regression is implicated. We may stage a production release sometime in the next two weeks with USE_SEND_THREAD disabled to test it.

We have isolated the issue to this commit from Feb 3:
https://github.com/jitsi/libjitsi/commit/73f20dc7f35b19914d8c7181d50826a6e404be08

The issue appears to stem from the use of multiple threads to simulate nonblocking network IO. With USE_SEND_THREAD enabled (the default), we see higher CPU usage and packet loss than with the setting disabled.

#### Test configuration 1:
* Jitsi Hammer with 100 fake users in a single room, using the Big Buck Bunny video and the default silence stream for audio; the test runs for 300 seconds.
* 4-core, 8GB VM-based videobridge server (Red Hat 7)
* The issue reproduces with videobridge HEAD builds as well as builds from Feb 3 onward.

USE_SEND_THREAD enabled
* ~70% average CPU usage, spikes to 80%+
* Both hammer and the videobridge report some dropped packets in the log (message: "WARNING: Dropped 1 packets hashCode=1490171397")

USE_SEND_THREAD disabled
* ~50% average CPU usage, spikes to 70%+
* No dropped packets reported

#### Test configuration 2:
* Jitsi Hammer: 10 instances with 10 users each in 10 rooms (100 users total), using badger video/audio
* 1 Chrome instance in each room to measure packet loss via chrome://webrtc-internals
* 4-core, 8GB VM-based videobridge server (CentOS 7)

USE_SEND_THREAD enabled
* Chrome reports: Audio median 12.7% packet loss, Video median 3.9% packet loss
* CPU saturated (95%+)

USE_SEND_THREAD disabled
* Chrome reports: Audio median 6.7% packet loss, Video median 2.6% packet loss
* Reduced CPU usage

##### sip configuration

net.java.sip.communicator.SC_HOME_DIR_LOCATION=/home/ec2-user
net.java.sip.communicator.SC_HOME_DIR_NAME=.sip-communicator
net.java.sip.communicator.packetlogging.PACKET_LOGGING_ENABLED=false
org.jitsi.videobridge.ENABLE_STATISTICS=true
org.jitsi.videobridge.STATISTICS_TRANSPORT=colibri

# this is set to true or false as required for test
org.jitsi.impl.neomedia.RTPConnectorOutputStream.USE_SEND_THREAD = true 

---
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/jitsi/libjitsi/issues/178


#2

Hello,

> The issue appears to stem from the use of multiple threads to simulate nonblocking network IO. With USE_SEND_THREAD enabled (the default), we see higher CPU/packet loss than with the setting disabled.

The higher CPU utilization is somewhat expected, due to the introduction of new threads. The reason for this change in the first place is that if one receiver is connected via TCP, we would previously block the RTPTranslator thread, thus affecting the whole conference.
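To make the trade-off concrete, here is a minimal sketch of the general pattern described above: the caller hands packets to a bounded queue and a dedicated thread performs the potentially blocking write. This is not the libjitsi implementation. The names `QueuedSender`, `Transport`, and `droppedCount` are hypothetical, and the drop-when-full policy is an assumption chosen to mirror the "Dropped N packets" warnings quoted in this thread.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical illustration only; NOT libjitsi's RTPConnectorOutputStream.
 * The caller never waits on the network: it offers packets to a bounded
 * queue, and a dedicated thread performs the (possibly blocking) send.
 */
class QueuedSender implements AutoCloseable {
    /** Stand-in for a transport whose send may block, e.g. a TCP socket. */
    interface Transport {
        void send(byte[] packet) throws Exception;
    }

    private final BlockingQueue<byte[]> queue;
    private final AtomicLong dropped = new AtomicLong();
    private final Thread sendThread;
    private volatile boolean closed = false;

    QueuedSender(Transport transport, int capacity) {
        queue = new ArrayBlockingQueue<>(capacity);
        sendThread = new Thread(() -> {
            try {
                // Keep going until close() was called and the queue is drained.
                while (!closed || !queue.isEmpty()) {
                    byte[] packet = queue.poll(10, TimeUnit.MILLISECONDS);
                    if (packet != null) {
                        transport.send(packet); // only this thread waits here
                    }
                }
            } catch (Exception e) {
                // In this sketch a send failure or interrupt ends the thread.
            }
        }, "send-thread");
        sendThread.setDaemon(true);
        sendThread.start();
    }

    /** Never blocks: returns false and counts a drop when the queue is full. */
    boolean write(byte[] packet) {
        if (queue.offer(packet)) {
            return true;
        }
        dropped.incrementAndGet();
        return false;
    }

    long droppedCount() {
        return dropped.get();
    }

    /** Stops accepting work, waits for queued packets to be sent. */
    @Override
    public void close() {
        closed = true;
        try {
            sendThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        // A slow transport (5 ms per packet) fed faster than it can drain.
        try (QueuedSender sender = new QueuedSender(p -> Thread.sleep(5), 4)) {
            for (int i = 0; i < 20; i++) {
                sender.write(new byte[] {(byte) i});
            }
            System.out.println("dropped so far: " + sender.droppedCount());
        }
    }
}
```

The sketch shows both sides of the discussion: a slow receiver can no longer stall the caller, but every packet now crosses a thread boundary, and a full queue turns into drops rather than back-pressure.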

We also measured slightly higher CPU usage with USE_SEND_THREAD. We feared that this might increase the jitter, but if anything it was decreased.

> We are trying to fix a problem related to audio drop-outs/PLC in our production environment, but we haven't yet determined if this regression is implicated.

I would be surprised if this has an effect on audio, since the bitrates are so much lower than for video.

> USE_SEND_THREAD enabled
>
> * ~70% average CPU usage, spikes to 80%+
> * Both hammer and the videobridge report some dropped packets in the log (message: "WARNING: Dropped 1 packets hashCode=1490171397")

This is expected to happen at some point as the load increases. It may happen earlier with USE_SEND_THREAD.

> USE_SEND_THREAD enabled
>
> * Chrome reports: Audio median 12.7% packet loss, Video median 3.9% packet loss
> * CPU saturated (95%+)

Is that 12.7% packet loss due to the network, or packets discarded on the bridge?

The difference between audio and video is strange. Could it be due to video packets being restored with FEC?

Is it Chrome's CPU that is saturated, or the bridge's?

Since the only use for USE_SEND_THREAD is for receivers connected via TCP, it may be worth enabling it dynamically only in this case. A contribution would be welcome. If you don't care about TCP, you can safely use USE_SEND_THREAD=false.
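As a rough illustration of what "enabling it dynamically" could look like, the sketch below picks the send path per receiver based on its transport, with an explicit configuration value taking precedence. Everything here is hypothetical: `SendThreadPolicy`, its `Transport` enum, and the `override` parameter are invented for the example, and whether the transport type is actually available at this layer in libjitsi is an assumption.

```java
/**
 * Hypothetical sketch; these names do not exist in libjitsi.
 * Decides per receiver whether a dedicated send thread is worthwhile.
 */
final class SendThreadPolicy {
    /** Assumed transport kinds for a receiver. */
    enum Transport { UDP, TCP }

    private SendThreadPolicy() {
    }

    /**
     * @param transport the receiver's transport (assumed to be known here)
     * @param override  an explicitly configured USE_SEND_THREAD value,
     *                  or null when the property is not set
     * @return true if writes should go through a send thread
     */
    static boolean useSendThread(Transport transport, Boolean override) {
        if (override != null) {
            return override; // explicit configuration always wins
        }
        // Only TCP writes can block long enough to stall the caller.
        return transport == Transport.TCP;
    }
}
```

With this policy, a UDP-only deployment would behave as if USE_SEND_THREAD=false without any configuration, while TCP receivers would still be isolated from the shared thread.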

Regards,
Boris


---
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/jitsi/libjitsi/issues/178#issuecomment-232159483