Configurable access to Insertable Streams

Hi – I’m a developer on a screen sharing application using Jitsi as the backing video service. My team has a bit of a peculiar use case we’re interested in working into lib-jitsi-meet, but wanted to get some input from the community first.

Context

The short version: our application allows users to share multiple windows simultaneously, but to avoid an exploding number of streams (especially for P2P sessions), we composite all shared windows into a single video frame and send metadata about this composition to the receiver, so it can decompose the frame and render just the required region for each shared window. Because this metadata is critical for properly rendering the shared windows, we append these bytes directly to the video frames and decode them on the receiver side.
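To make the "append to the frame, decode on the receiver" idea concrete, here is a minimal sketch of one possible trailer format. It is not our actual wire format: the `WindowRegion` shape, the JSON encoding, and the 4-byte big-endian length suffix are all assumptions chosen for illustration.

```typescript
// Hypothetical layout metadata: where each shared window sits in the composite frame.
interface WindowRegion {
  windowId: number;
  x: number;
  y: number;
  width: number;
  height: number;
}

// Assumed trailer layout: [encoded frame][metadata JSON][4-byte big-endian metadata length]
const LENGTH_FIELD_BYTES = 4;

// Sender side: append the composition metadata to the encoded frame bytes.
function appendMetadata(payload: Uint8Array, regions: WindowRegion[]): Uint8Array {
  const meta = new TextEncoder().encode(JSON.stringify(regions));
  const out = new Uint8Array(payload.length + meta.length + LENGTH_FIELD_BYTES);
  out.set(payload, 0);
  out.set(meta, payload.length);
  // DataView writes big-endian by default.
  new DataView(out.buffer).setUint32(payload.length + meta.length, meta.length);
  return out;
}

// Receiver side: strip the trailer and recover both the frame and the metadata.
function splitMetadata(frame: Uint8Array): { payload: Uint8Array; regions: WindowRegion[] } {
  if (frame.length < LENGTH_FIELD_BYTES) {
    throw new Error("frame too short to contain a metadata trailer");
  }
  const view = new DataView(frame.buffer, frame.byteOffset, frame.byteLength);
  const metaLength = view.getUint32(frame.length - LENGTH_FIELD_BYTES);
  const metaStart = frame.length - LENGTH_FIELD_BYTES - metaLength;
  if (metaStart < 0) {
    throw new Error("metadata length exceeds frame size");
  }
  const payload = frame.subarray(0, metaStart);
  const metaBytes = frame.subarray(metaStart, metaStart + metaLength);
  const regions = JSON.parse(new TextDecoder().decode(metaBytes)) as WindowRegion[];
  return { payload, regions };
}
```

In a browser, functions like these would presumably run inside the transform applied to each encoded frame, wrapping the frame's `data` buffer in a `Uint8Array` on the way in and writing the result back on the way out. Because the metadata travels inside the frame itself, it cannot arrive earlier or later than the picture it describes.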

Our primary product is an Electron application, so we’ve accomplished this so far by using a C++ native library that wraps libwebrtc and directly manipulates the video frame buffers. However, we are working on developing a web browser client, and are hoping to make use of Insertable Streams to accomplish it.

The Problem

lib-jitsi-meet does support Insertable Streams, as it is used for implementing E2EE functionality. However, as far as we can tell, access to the Insertable Streams APIs is gated behind enabling E2EE, which we cannot currently support because our own inserted data conflicts with the encryption scheme.

This is mainly an issue because Chrome’s API for Encoded Streams requires the PeerConnection to be initialized with encoded streams enabled, which lib-jitsi-meet only does if enableInsertableStreams is true – and that is in turn only set to true if E2EE is enabled.

Proposal

We’re hoping to add an enableInsertableStreams configuration option to JitsiConference that would allow Insertable Streams support to be enabled while still keeping E2EE disabled. Enabling E2EE without specifying this new field would default it to true, and specifying enableInsertableStreams as false while E2EE is enabled would result in an error.
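As a rough sketch of the rules proposed above (not existing lib-jitsi-meet code), the resolution logic could look like the following. The `ConferenceOptions` shape and the `e2eeEnabled` field name are hypothetical stand-ins for however the conference actually learns that E2EE is on.

```typescript
// Hypothetical option shape; only enableInsertableStreams is the proposed new field.
interface ConferenceOptions {
  e2eeEnabled?: boolean; // assumed name for "E2EE is enabled"
  enableInsertableStreams?: boolean;
}

// Decide whether the PeerConnection should be created with Insertable Streams support.
function resolveInsertableStreams(options: ConferenceOptions): boolean {
  const e2ee = options.e2eeEnabled ?? false;

  // Field omitted: follow E2EE, which matches today's behaviour.
  if (options.enableInsertableStreams === undefined) {
    return e2ee;
  }

  // E2EE depends on Insertable Streams, so this combination is rejected.
  if (e2ee && !options.enableInsertableStreams) {
    throw new Error("enableInsertableStreams cannot be false while E2EE is enabled");
  }

  return options.enableInsertableStreams;
}
```

With this, `{ enableInsertableStreams: true }` on its own would turn on Insertable Streams with E2EE still off, which is the case we need, while existing configurations that never set the field would behave exactly as they do now.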

Have you tried sending the data using the websocket that is established with the bridge?

The client is already using that channel to ask the bridge for a higher resolution when you change what is on stage (large screen), and you cannot notice the switching, so it should be sufficient for your use case. Using that channel would also be much easier than adding the metadata into the insertable streams.

Yes – this was originally implemented using a data channel, and then later converted to using web socket signals. With both, we experienced significant latency that impacted user experience.

We have had an Insertable Streams solution implemented for quite some time now on our desktop application, and had good success with it. We have also tested prototypes in-browser by creating a fork of lib-jitsi-meet updated to enable Insertable Streams by default, and were able to see good performance in-browser as well.

In our case, the biggest issue comes when a user resizes their shared window. To minimize bandwidth, we use a box-packing algorithm to compose multiple shared windows together, so resizing one window adjusts the position of every other window in the composition, potentially shuffling them to entirely new locations depending on how they fit together. While a user is resizing, this translates to several hundred frames in sequence with drastically different positions from one another, which leads to very strange visual artifacts for users when the metadata falls out of sync with the video.
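To illustrate why a single resize can move everything else, here is a toy example. It uses a simple shelf-packing heuristic as a stand-in, not our actual box-packing algorithm, and the `Size`/`Placement` types and `packShelves` function are invented for this illustration.

```typescript
interface Size {
  id: string;
  width: number;
  height: number;
}

interface Placement extends Size {
  x: number;
  y: number;
}

// Toy shelf packer: tallest windows first, left to right, starting a new row
// ("shelf") whenever the next window would overflow maxWidth.
function packShelves(windows: Size[], maxWidth: number): Placement[] {
  const sorted = [...windows].sort((a, b) => b.height - a.height);
  const placed: Placement[] = [];
  let x = 0;
  let y = 0;
  let shelfHeight = 0;

  for (const w of sorted) {
    if (x > 0 && x + w.width > maxWidth) {
      // Current shelf is full: move down by its height and start at the left edge.
      y += shelfHeight;
      x = 0;
      shelfHeight = 0;
    }
    placed.push({ ...w, x, y });
    x += w.width;
    shelfHeight = Math.max(shelfHeight, w.height);
  }
  return placed;
}

// Window A grows from 600 to 800 pixels wide; B and C are untouched by the user.
const before = packShelves(
  [
    { id: "A", width: 600, height: 400 },
    { id: "B", width: 300, height: 300 },
    { id: "C", width: 300, height: 200 },
  ],
  1000,
);
const after = packShelves(
  [
    { id: "A", width: 800, height: 400 },
    { id: "B", width: 300, height: 300 },
    { id: "C", width: 300, height: 200 },
  ],
  1000,
);
```

In this toy run, B moves from (600, 0) to (0, 400) and C moves from (0, 400) to (300, 400), even though only A was resized. If a frame packed with the new layout is rendered using the old layout's regions, every window shows the wrong pixels, which is the kind of artifact we see when the metadata arrives out of band.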

I see. About the insertable streams, let’s wait for @saghul to weigh in, not my area …

I’ll also tag @Jason_Thomas who leads the CoScreen project. He can speak more to the history of our tests if that would be helpful :slight_smile:
