Is a many-to-one implementation possible?

Hello everyone,

First of all, please pardon my being totally ignorant about development or even installation topics. This question is possibly stupid and badly researched. That being said:

From my understanding, the Jitsi ecosystem provides solutions for one-to-one (video telephony), one-to-many (streaming broadcasting), and many-to-many (video conferencing) connections.

Right now I am evaluating possible solutions for a many-to-one use case. The central idea is to have several remote participants connect (with an easy-to-use client, like Jitsi Meet) to one central endpoint without seeing each other in the first place.
This central endpoint should then be able to output the video and audio signals discretely as they come in (each up- or downscaled to a common resolution and frame rate); in the best case to a virtual graphics card as an NDI stream.

We would then want to use those signals (each showing one and only one participant) with a regular old-school broadcast system, i.e. a sophisticated vision and audio mixer setup (external hardware).

That mixed signal (common picture, individual mix-minus audio) should then be fed back to each remote participant.
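For readers who do not know the term: "mix-minus" means every participant gets the full audio mix minus their own voice, so nobody hears themselves coming back. A minimal sketch of the idea in Python, assuming one audio sample per participant; the names and values are made up for illustration, and a real mixer would do this continuously on every sample and with per-channel levels.

```python
def mix_minus(samples):
    """Return, for each participant, the sum of everyone else's sample.

    `samples` maps a (hypothetical) participant name to one audio sample.
    """
    total = sum(samples.values())
    # Subtracting a participant's own sample from the full mix is the
    # "minus" in mix-minus: the return feed never contains their own voice.
    return {name: total - own for name, own in samples.items()}


# Illustrative values only.
samples = {"anna": 0.25, "ben": 0.5, "cara": -0.125}
feeds = mix_minus(samples)
for name, value in feeds.items():
    print(f"return feed for {name}: {value}")
```

With 30 participants this means 30 different return feeds, which is why the signals have to leave the conference application individually in the first place.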

So in a nutshell, we want to do a video conference, but not use the built-in video and audio routing of an integrated conferencing solution. Instead, we want to override that with human-operated mixers.

Is it feasible to modify the existing Jitsi project to add this functionality? Or does it even already exist?

Please contact me if you are interested in developing such functionality.

Best, David

Hi @dbergmann,

Have you tried joining from a desktop browser with more than one Chrome tab open?
I have tested this on my current project at https://meet.247.vc/, and both many-to-one and one-to-many VC sessions are possible.

Sincerely,
arivonto
https://www.linkedin.com/in/arivonto/

I’m also looking for a solution like Skype’s NDI output for each video stream of the videoconference. I just hate that Skype logo all over the place, especially when a participant deactivates their camera: the feed is automatically resized and the Skype logo appears very big in the center. This screws up my setup in OBS and doesn’t look very professional. I wish there was an open source solution to this. The only other solution I’ve found is vMix, but it costs way too much and only works on Windows.

arivonto,

First of all, sorry for getting back to you so late. And second, thank you very much for your proposal.

Unfortunately, that just shifts my fundamental problem to a different layer. Before, I had to convince a (theoretical, however implemented) custom Jitsi instance to feed out the individual video and audio of every participant to the outside world (where the gateway to that outside world could be conventional hardware, a virtual graphics card like NDI, or a computer-internal pass-through to a different application on localhost). Now I have to make Chrome do that routing for me instead.

As far as I know, no browser is capable of sending the content of each tab to a different graphics card, and even more importantly, to either a different audio interface or a different channel of a multichannel (ASIO, Core Audio…) audio interface.

I can surely hijack the audio of a single application and route it wherever, but to my knowledge that is only possible for the whole application instance.
I.e., with your proposal I would have the content of all one-to-one chat tabs mixed together with all my 100 YouTube tabs :wink:

To elaborate further on the scenario we want to achieve: we want to use a video conferencing solution as a basis for the production of a broadcast (as in old-school television) quality programme. Think in terms of a talk show, only the guests cannot attend the studio (for current reasons). For the same reasons (+ budget) it is not feasible to bring a satellite uplink to every one of them.
All our thoughts come from a background working in the broadcast industry.

The internal video switching and especially audio control functions (which boil down to muting and unmuting of participants), while certainly fully functional for ‘utility’ conferences, unfortunately

a) do not meet our quality standards (sorry for sounding arrogant, but an audio automixer/-switcher without at least some levelling capability just sounds dreadful, irrespective of the microphone quality used by the participants)

b) do not provide a user interface for a quick workflow that gives a vision mix operator and an audio engineer the opportunity to react to the content of the conversation and the calls from a director

c) have hardly any good options for configuring custom split view/PIP looks (I might be wrong about that)

d) offer no utility and troubleshooting off-air functionality (picture preview, audio cue, being able to talk to a participant for editorial briefing, troubleshooting issues on her side without the other participants and the intended audience noticing).

All that functionality exists at every conceivable quality and integration level outside of the conference application, ranging from virtual mixers/switchers running on the (same) computer to studio-standard conventional hardware devices. The only prerequisite for using those capabilities is to get the signals discretely out of the conference application, process/mix them, and feed more or less customized signals back to the remote participants.

So far, what we are probably going to do is just pile a bunch of computers on top of each other, have each of them run one instance of (pick one: Skype, Zoom, Jitsi…), connect them one-to-one to our participants, and use their respective audio and video outputs to get the discrete signals.
That approach is certainly doable for smaller panels. However, we are currently planning a production with over 30 active participants, and we were wondering whether a scalable solution that involves far less hardware might be possible.

Thinking a little further, especially about virtualizing the mixer/switcher components in ‘the cloud’ (which is a separate ongoing research project), such an endpoint for chat participants might live on the same machine, making it possible to move the whole control room out of the physical realm. That scenario might become very interesting for the broadcast industry if current movement restrictions get worse.

So this might become

  • something no one is really interested in developing (looks like this at the moment)
  • something that, if technically and financially feasible, might become just a custom solution for our production(s)
  • or something that turns into a commercial product with a certain niche market (that is used to paying quite good money for solutions)
  • which, to keep it interesting for open source, might also have a (feature-reduced?) spinoff that helps someone like dled78 below with his use case

But I still have no idea about that feasibility.

Once again, thank you,

David

I may have a solution, following @arivonto’s idea of having multiple Chrome tabs.
Disclaimer: I do not know Jitsi, I was just investigating a video-conferencing solution that could output multiple streams independently, to mix them with OBS.

For @dbergmann, what you can try:

  • Install OBS and the obs-ndi plugin (which includes the NDI Runtime tools and libraries),
  • Run your Jitsi conference,
  • Make a scene in OBS for each of your attendees. In each scene, you can add a browser source and configure it to only show one stream (note that you can interact with the browser by right-clicking the source and then selecting “Interact”),
  • For each browser source, go to the filters panel (right-click the source, then “Filters”), and under “Effect filter” you have “Dedicated NDI output”. It should extract the content of the source to an NDI output, which you can then pick up and restream wherever you want! I just don’t know if the audio is included in this NDI output. If it’s not, you can create a separate stream by adding an “Audio filter” and using the “Dedicated NDI output (audio only)”.

That should give you the ability to have an NDI source for each stream of your conference. The only real downside of this solution (if it works) is the time required to manually pick up each stream and restream it. This is not an automated all-in-one solution. But OBS recently got the ability to run Python or Lua scripts, so you may be able to call an API on Jitsi and auto-create scenes and sources for each participant, then add filters with the right settings (it’s perfectly doable with the APIs OBS provides).
I did not test this solution, but in theory it works. It may be a bit heavy, so you should run some tests before using it in production. Still, OBS is open source software that is really well optimized and stable enough for production purposes.
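To make the scripting idea a bit more concrete, here is a rough sketch in plain Python of the planning half only: for each participant it describes one scene, one browser source, and one NDI filter. It deliberately does not call OBS. Everything in it is an assumption to verify against your own installation: the type ids `browser_source` and `ndi_filter`, the settings keys, the room URL, and the hard-coded participant list (how to fetch that list from Jitsi is left open here).

```python
from dataclasses import dataclass

# Assumed type ids for the OBS browser source and the obs-ndi
# "Dedicated NDI output" filter; check them against your installed versions.
BROWSER_SOURCE_ID = "browser_source"
NDI_FILTER_ID = "ndi_filter"


@dataclass
class SourcePlan:
    scene_name: str
    source_name: str
    source_id: str
    source_settings: dict
    filter_id: str
    filter_settings: dict


def plan_sources(room_url, participants, width=1280, height=720):
    """Describe one browser source plus NDI filter per participant."""
    plans = []
    for index, name in enumerate(participants, start=1):
        plans.append(SourcePlan(
            scene_name=f"Guest {index:02d} - {name}",
            source_name=f"jitsi-{name}",
            source_id=BROWSER_SOURCE_ID,
            # Browser size should match the size of the received stream.
            source_settings={"url": room_url, "width": width, "height": height},
            filter_id=NDI_FILTER_ID,
            # Assumed settings key for the NDI sender name.
            filter_settings={"ndi_filter_ndiname": f"NDI {name}"},
        ))
    return plans


# Hypothetical room URL and participant names.
plans = plan_sources("https://meet.example.org/talkshow", ["anna", "ben"])
for plan in plans:
    print(plan.scene_name, "->", plan.filter_settings["ndi_filter_ndiname"])
```

Inside an OBS script, each entry would then be turned into a real source and filter through the scripting API; that part is left out on purpose because it only runs inside OBS. Selecting which single participant a given browser source shows still has to be solved separately (manually via “Interact”, as described above).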

But for long-term solutions, it would be more than awesome to have some sort of video output available from Jitsi-Meet-Electron. NDI is the best candidate for that, as Skype does (but they add their branding, and the quality is pretty bad unfortunately).

Hope I helped!
Arno

Hello Arno,

Indeed you did help. I spent a short time this afternoon getting OBS and setting it up according to your recipe, and your suggestion works.
For now (for me…) only in principle, because the wimpy MacBook Air I was using for the test started melting its processor once I used two YouTube browser sources (to have easily identifiable picture and sound), but I did have two separate NDI streams as well as separate audio.

It is actually not necessary to have the browser sources in separate scenes. The NDI direct out also seems to work when both are in the same scene, independently of the OBS programme out.

So, I will definitely have some testing to do. First of all, get to know OBS better and find out if the processor load is caused by it desperately trying to encode HD video all the time on the CPU (and if I can disable that). And then check if the OBS built-in browser is also able to handle the Jitsi content.

And finally see if this scales to a reasonable amount of parallel sources on a more performant computer, or if this is ‘only’ reliably good for two to three connections. In that case, I would probably go for the pile of computers solution…

But, as I said, your idea of using OBS as a multi source media converter works.
Maybe @dled78, as he is using OBS in his setup anyway, can draw some use out of your suggestion as well.

And thanks again @arivonto, seems like your initial idea was really something to build on.

Thanks a lot everyone, I will try a hopefully more performant setup tomorrow.

Best, David

Hello @dbergmann!
I think the browser source in OBS for macOS doesn’t have hardware video decoding yet (see https://github.com/obsproject/obs-browser/issues/149; they’re waiting for CEF patches for this). On Linux it’s exactly the same problem, so you may want to use OBS on Windows to get hardware video decoding. (I don’t know how many streams a given GPU can decode at once; it may be artificially limited on some consumer NVIDIA cards, and is probably pretty low on AMD cards.) And as you said, a MacBook Air is pretty low-powered for video work; I’m actually impressed it could even run two YouTube sources at the same time! On the output side, NDI is pretty lightweight: it’s a really CPU-efficient codec, meant for LAN transmission. A bit like ZIP, but for video. So that will probably not be a bottleneck.
I only separated everything into scenes for management, so you can easily have high-resolution streams (for the best results, you need to set your browser size to the size of the stream you receive). In a single scene your streams would probably overlap in the view, which makes them harder to monitor and manage :wink:
I think you may be able to manage your 30 streams using two powerful computers. I used mine, with a Ryzen 3700X and an AMD 5700 XT, and managed to output 10 streams at 60 FPS. If you target 30 FPS or even 25, you may be able to double that.
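As a back-of-the-envelope check of those numbers, here is a tiny Python sketch. It assumes the load scales linearly with the frame rate, which is only a guess on my side and not something I measured; the function names are made up for this example.

```python
import math


def streams_per_machine(measured_streams, measured_fps, target_fps):
    """Estimate how many streams fit at target_fps.

    Assumes the cost per stream is proportional to its frame rate,
    so halving the frame rate doubles the number of streams.
    """
    return (measured_streams * measured_fps) // target_fps


def machines_needed(participants, per_machine):
    """Round up: a partly used machine is still a whole machine."""
    return math.ceil(participants / per_machine)


# Measured above: 10 streams at 60 FPS on one machine.
for fps in (60, 30, 25):
    capacity = streams_per_machine(10, 60, fps)
    print(fps, "FPS:", capacity, "streams per machine,",
          machines_needed(30, capacity), "machines for 30 participants")
```

Under that assumption, 30 participants fit on two such machines at 30 or 25 FPS, but would need three at 60 FPS. Real numbers will depend on resolution, the GPU decoder, and whatever else is running, so treat this as a starting point for testing.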

Good luck!
Arno