Jitsi VideoBridge SIP Integration



I have an existing application that is currently using Jitsi VideoBridge. There is now a need to allow SIP video devices to join a conference.

I’ve looked into Jigasi and see that it does not support video. I was hoping to find a clear reason why, and the closest I’ve come to finding anything in the forums is that it would need to transcode VP8 to H264, and the LibJitsi libraries it was built on won’t support this.

The recommended option for SIP seems to be Jibri; however, the implementation sounds like it wouldn’t scale. If I understand correctly, it would have to spin up a VM per conference with a Chrome instance, connect that Chrome instance to the conference, and capture the video with FFmpeg, which can then be streamed out to the SIP client.

So my questions are:

  1. If I were to fork the Jigasi and LibJitsi libraries, which area of the code would be the best place to start making changes so that a video stream can be made available to a SIP client?
  2. How well does Jibri scale? Does anyone have any experience with using it with many simultaneous conferences?
  3. As an alternative to modifying the existing code, I’m thinking of building a custom XMPP component. Is there any documentation on the XMPP signaling flow to Jicofo for creating/joining conferences? Would it be simpler to use the REST COLIBRI API directly instead of creating a component?

Any feedback would be appreciated, thanks!


It is not just transcoding. There is currently an initial VP8 implementation that can be used, but it has no error correction, so you would basically need to implement a large part of WebRTC in Java. Yes, you would mainly be doing that in libjitsi, and you would probably also need to touch fmj and https://github.com/jitsi/jitsi-lgpl-dependencies because of ffmpeg. But this is a very big project. I would not want to make bold predictions, but I would say six months to a year.

You will also have a scaling problem here, because encoding/transcoding for even one participant needs a lot of CPU. For a five-person conference you will receive five video streams, which you need to merge/transcode into one, or you can choose to show only a single video to the SIP side. But even for one participant, I think having to decode and encode twice will take a lot of CPU, and Jigasi will be able to handle very few participants.
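To make the two options above concrete, here is a minimal sketch that only counts codec operations for the conference-to-SIP direction of one SIP leg. It does not measure CPU and is not Jigasi code; the names `CodecLoad` and `gateway_codec_load` are hypothetical, and the return direction (SIP video into the conference) is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodecLoad:
    decodes: int  # video streams the gateway would have to decode
    encodes: int  # video streams the gateway would have to encode


def gateway_codec_load(video_streams: int, merge_all: bool) -> CodecLoad:
    """Count codec operations for one SIP leg (conference -> SIP only).

    merge_all=True  : decode every incoming stream and encode one merged stream.
    merge_all=False : forward a single video, so decode one and encode one.
    """
    if video_streams < 1:
        raise ValueError("expected at least one incoming video stream")
    decodes = video_streams if merge_all else 1
    return CodecLoad(decodes=decodes, encodes=1)


if __name__ == "__main__":
    # The five-stream conference from the paragraph above.
    print(gateway_codec_load(5, merge_all=True))
    print(gateway_codec_load(5, merge_all=False))
```

Even the single-video case still costs a full decode plus a full encode per SIP participant, which is why the gateway would handle so few of them.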

Yes, we scale Jibri on meet.jit.si. There is an auto-scaling group that brings Jibri instances up when there is demand and deletes unused ones.
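As a rough illustration of that kind of scaling rule, here is a minimal sketch. It is not the policy used on meet.jit.si; it assumes each Jibri serves one conference at a time, and the function names and the `min_idle` and `max_total` thresholds are hypothetical.

```python
def desired_jibri_count(busy: int, min_idle: int = 2, max_total: int = 50) -> int:
    """Target group size: every busy instance plus a small idle buffer, capped."""
    if busy < 0 or min_idle < 0 or max_total < 0:
        raise ValueError("counts must be non-negative")
    return min(busy + min_idle, max_total)


def scaling_action(running: int, busy: int, min_idle: int = 2, max_total: int = 50) -> int:
    """Positive result: launch that many instances.
    Negative result: terminate that many idle instances.
    Busy instances are never terminated, even if the group is over the cap.
    """
    if not 0 <= busy <= running:
        raise ValueError("busy must be between 0 and running")
    delta = desired_jibri_count(busy, min_idle, max_total) - running
    if delta < 0:
        idle = running - busy
        delta = -min(-delta, idle)
    return delta


if __name__ == "__main__":
    print(scaling_action(running=5, busy=5))   # all busy, so grow the buffer
    print(scaling_action(running=10, busy=3))  # mostly idle, so shrink
```

Keeping a small idle buffer means a new request does not have to wait for an instance to boot, at the cost of paying for a few unused machines.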