Architectural question regarding processing video frames

I’m working on a project to put filters/avatars on individual meeting attendees’ faces, and I’m still trying to understand the architecture. Could you point me to the modules I need to look into to tap the video data from an individual attendee for processing? Any community links would also be useful.

A bird’s-eye view:
Camera -> “A” camera driver -> “B” the sending client’s web browser, using WebRTC and JavaScript -> “C” the server, inside the Jitsi Videobridge -> “D” the remote viewer

A: Some people add filters to their Jitsi Meet video today by using a camera driver that can process the image.

B: In the sending client’s web browser, Jitsi’s JavaScript/React web interface can use WebRTC extensions to access the camera’s video frames. Today this is used to encrypt video for Jitsi’s end-to-end encryption.

C: You can tap into the video stream on the Jitsi Videobridge (JVB), as the server has access to all attendees’ video streams.

D: In the receiving client’s web browser, Jitsi’s JavaScript/React web interface can use the same WebRTC extensions to access the incoming video frames. Today this is used to decrypt Jitsi’s end-to-end encrypted video.

So if you want to extend the Jitsi Meet web interface at point “B” to add filters, focus on what you can do with WebRTC in the browser.
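As a rough illustration of the kind of per-frame work that can happen at point “B”, here is a minimal sketch of a grayscale filter over raw RGBA pixels. The function name `grayscaleInPlace` is made up for this example and is not part of Jitsi. It assumes you already have the pixels of one frame, for example from `getImageData()` on a canvas that the local video was drawn onto.

```typescript
// Hypothetical helper, not a Jitsi API: converts an RGBA pixel buffer
// (4 bytes per pixel: red, green, blue, alpha) to grayscale in place.
function grayscaleInPlace(rgba: Uint8ClampedArray): void {
  for (let i = 0; i + 3 < rgba.length; i += 4) {
    // Rec. 601 luma weights for red, green and blue.
    const luma = 0.299 * rgba[i] + 0.587 * rgba[i + 1] + 0.114 * rgba[i + 2];
    // Uint8ClampedArray rounds and clamps the value to 0..255 on assignment.
    rgba[i] = luma;
    rgba[i + 1] = luma;
    rgba[i + 2] = luma;
    // rgba[i + 3] (alpha) is left untouched.
  }
}

// Example: a tiny "frame" of two pixels (one red, one white).
const demoFrame = new Uint8ClampedArray([255, 0, 0, 255, 255, 255, 255, 255]);
grayscaleInPlace(demoFrame);
```

In a browser you would typically draw each video frame to a canvas with `drawImage()`, run a filter like this on the resulting `ImageData`, write it back with `putImageData()`, and send the track from `canvas.captureStream()` instead of the raw camera track. How that plugs into Jitsi Meet’s own effects code is not shown here.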

Great information. Any pointers on where to start with the third option “C”, i.e. tapping into Video Streams on the Jitsi Video Bridge?


These are client-side effects, and we already use them for the blur effect.
The videobridge does not have access to decoded data; the bridge cannot decode or encode video streams …

Thank you. So if I understand correctly, you’re telling me that I cannot do any kind of frame-by-frame processing on the server side on the frames coming in from client browsers/apps?

Yes, there is no such option.

Thank you again for your response. Is this achievable if I extend the JVB source and deploy my own version? If yes, could you please help me understand where to start and which modules I should look at extending for this requirement? I have my own JavaScript WebRTC client; I just need a server peer that can connect to multiple clients and do frame-by-frame image processing on all clients’ incoming video before the composed video is streamed out to the other peers. And this HAS to be on the server side; client-side processing will just not work for us.

Thank you again for your valuable advice. It’s a big help.


I’m far from an expert on this, but it seems to me that what you’re asking for is a total departure from the governing architecture of Jitsi. Jitsi runs on the SFU model, where the server simply forwards streams to the clients (to put it in the simplest terms). It does not do any media processing of its own. This is in contrast to the MCU model, where the server is responsible for mixing, muxing and demuxing before sending the processed streams to the clients. So it seems to me that you’re barking up the wrong tree; you’re asking for a completely different build.
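To make the SFU/MCU difference a bit more concrete, here is a toy model of what the server has to do in each case. It is only a sketch under simplifying assumptions: the names `ServerLoad`, `sfuLoad` and `mcuLoad` are invented for this post, the SFU is assumed to forward every stream to every other participant (real bridges can forward only a subset), and the MCU is assumed to encode one composed stream per receiver.

```typescript
// Toy model only; none of these names exist in Jitsi.
interface ServerLoad {
  decodes: number;         // video streams the server must decode
  encodes: number;         // video streams the server must encode
  outgoingStreams: number; // video streams the server sends out
}

// SFU: forward each participant's already-encoded stream to everyone else.
// The server never sees decoded frames.
function sfuLoad(participants: number): ServerLoad {
  const others = Math.max(participants - 1, 0);
  return { decodes: 0, encodes: 0, outgoingStreams: participants * others };
}

// MCU: decode every incoming stream, compose the frames, then encode
// one mixed stream per receiver (an assumption of this sketch).
function mcuLoad(participants: number): ServerLoad {
  return {
    decodes: participants,
    encodes: participants,
    outgoingStreams: participants,
  };
}

// Example: a four-person meeting.
console.log(sfuLoad(4), mcuLoad(4));
```

The point of the comparison is the `decodes` column: server-side frame-by-frame processing needs decoded frames, which only the MCU-style server has, at the cost of doing the decoding and encoding work itself.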


I understand. Is there any module I can use instead of JVB to build a Java-based WebRTC server peer? That would probably be better than starting from scratch. I see a lot of components in the Jitsi project, like libjitsi. Which one can give me a baseline for building a WebRTC server peer that works as an MCU instead of an SFU?