Hardware Compositing at the server level

I know the VideoBridge is specifically designed to do only adaptive streaming of all the participants. I was curious whether anyone has looked into adding support for hardware decoding, compositing, and re-encoding on the fly. It seems like this would be very useful, especially for a cascading-bridge approach, where you could consolidate all regional streams into a single encoded stream.

Having hardware compositing on the server would also allow hardware-accelerated video recording, and some other niceties that would let things scale better.

This is really a fact-finding mission for myself, as I want to see where the pain points are in OSS video conferencing and whether hardware solutions could help.

(I'm not a developer, but I'm an active admin user, working toward dev.)

The entire framework is certainly not designed to do transcoding, though the project does date to an era before hardware transcoding engines were a thing. Are you aware of cloud services that expose hardware transcoding engines?

There’s no fundamental reason the project couldn’t do it, but it’s certainly a vast amount of development.

Amazon has their Elastic Transcoder, but I have not looked at it in depth yet. I am gathering data for a hardware platform we could actually produce. I know that we would lose true end-to-end encryption on the streams, but the micro-servers could be deployed on demand and provide a secure hub for over-the-wire encryption. The only place the video would be in the clear would be in device memory.

It is the age-old question of security vs. convenience. Since Jitsi sends audio and video as independent streams, it should be possible to keep the audio encrypted end to end.

I am talking about possibly 1080p60 streams upstream from everyone, and a single 1080p60 stream downstream, with the video scaling and compositing done in the cloud. That could handle a 100-person conference. Of course, this would not be the common case. The footprint would be 1RU drawing 350 watts.
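As a rough sanity check on those numbers (assuming roughly 4 Mbps per 1080p60 H.264 stream, which is my guess, not a measured figure), the aggregate ingest bandwidth and decode pixel rate work out like this:

```python
# Back-of-envelope sizing for a 100-way 1080p60 compositing server.
# The 4 Mbps per-stream bitrate is an assumption, not a measured number.
participants = 100
width, height, fps = 1920, 1080, 60
bitrate_mbps = 4  # assumed per-stream H.264 bitrate at 1080p60

ingest_mbps = participants * bitrate_mbps               # uplink from all clients
decode_px_per_s = participants * width * height * fps   # pixels to decode/composite

print(f"Aggregate ingest: {ingest_mbps} Mbps")                  # 400 Mbps
print(f"Decode pixel rate: {decode_px_per_s / 1e9:.1f} Gpx/s")  # 12.4 Gpx/s
```

Roughly 12 billion pixels per second of decode plus compositing is well beyond software codecs on a 350 W budget, which is why fixed-function decode blocks would be essential.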

As I understand the Elastic Transcoder, it's intended for bulk file processing, not realtime transcoding work. That wouldn't work for videoconferencing loads.

I also don’t believe Jitsi offers (or claims) end-to-end encryption - just hop-by-hop link encryption. The server has to be able to meddle with the streams to provide the variable-quality feeds out of the simulcast upload. However, since you’re intended to control your own server, the risk is far less than using some random service provider’s system. But, yes, you could probably keep the audio encrypted. Then one simply has the question, “Who generates the encryption key?” :wink:
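To illustrate why the server needs access to the streams even without transcoding: a selective forwarding unit picks one of the simulcast layers per receiver based on that receiver's available bandwidth. A toy sketch (the layer names and bitrates here are invented for illustration, not Jitsi's actual values):

```python
# Toy model of simulcast layer selection in an SFU.
# Layer names and bitrates are illustrative only.
SIMULCAST_LAYERS = [
    ("high", 2500),  # kbps, e.g. 720p
    ("mid", 800),    # e.g. 360p
    ("low", 150),    # e.g. 180p
]

def pick_layer(receiver_bandwidth_kbps: int) -> str:
    """Return the highest simulcast layer that fits the receiver's bandwidth."""
    for name, kbps in SIMULCAST_LAYERS:
        if kbps <= receiver_bandwidth_kbps:
            return name
    return SIMULCAST_LAYERS[-1][0]  # fall back to the lowest layer

print(pick_layer(3000))  # fast link -> "high"
print(pick_layer(500))   # constrained link -> "low"
```

The bridge only needs to inspect and rewrite packet headers to do this, not decrypt or decode the video payload itself - but it does rule out true end-to-end encryption in the strict sense.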

I doubt you can handle a 100-person, 1080p60 conference on a 350 W budget without specialized hardware transcoders (FPGAs, most likely), but it’s worth experimenting with, for sure! You’d have to benchmark Quick Sync, GPU transcoding, and the like - I don’t have a good feel for their performance.
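One way to get that feel is to time a hardware encoder against a synthetic 1080p60 source with ffmpeg. A sketch that builds such a benchmark command (the encoder name `h264_nvenc` is one example; swap in `h264_qsv` for Intel Quick Sync - availability depends on your ffmpeg build and hardware):

```python
# Build an ffmpeg command that encodes a synthetic 1080p60 test pattern
# with a hardware encoder and discards the output; -benchmark makes
# ffmpeg report timing stats on stderr.
def benchmark_cmd(encoder: str = "h264_nvenc", seconds: int = 30) -> list[str]:
    return [
        "ffmpeg", "-benchmark",
        "-f", "lavfi", "-i", "testsrc2=size=1920x1080:rate=60",
        "-t", str(seconds),
        "-c:v", encoder,
        "-f", "null", "-",  # encode and throw the result away
    ]

print(" ".join(benchmark_cmd("h264_qsv")))
```

If the reported encode time is well under the clip duration, the engine has headroom for multiple realtime streams; repeat with several parallel instances to find the saturation point.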

The Jitsi team is also considering other options.

It’s another age-old dilemma: power at the center or at the edges.