Audio Spatialization in Jitsi Meet


I am a master’s student whose thesis concerns applying audio spatialization techniques to telecommunication platforms. I wanted to work with Jitsi Meet, since it is an open-source platform with a great and responsive community. I will likely be fairly active here in the spring of next year as I work on it, but I wanted to ask some preliminary questions first to get a sense of what is going on at a lower level.

I’ve read that jitsi-videobridge relays the media channels (both audio and video?) to the participants in a conference. My thought on hearing this is that the best approach would be to have the clients manipulate/spatialize these audio streams locally (with some small latency). I was curious which component of Jitsi receives the audio of the participants. I also realize there is a potential issue in that Jitsi optimizes for an active speaker, but I wasn’t sure whether this actually prevents the streams of other participants from being relayed?

If anyone has dealt with audio in this way, I would be grateful to hear any of your dev experiences.

Jackson bmo

Nope, if a participant is not muted, their stream is relayed.

This is the WebRTC implementation in the browser, managed by lib-jitsi-meet.

You can take a look at lib-jitsi-meet; there is some implementation of mixing audio … and detecting missing audio or noise, so there is already stuff doing audio processing in the library itself …
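For context on what client-side mixing looks like, here is a minimal generic sketch of combining several `MediaStream`s into one with the Web Audio API. This is only an illustration of the general technique, not lib-jitsi-meet’s own mixer code, and `mixStreams` is a hypothetical name:

```javascript
// Sketch: mix several MediaStreams into a single MediaStream using Web Audio.
// Generic illustration only — not taken from lib-jitsi-meet's implementation.
function mixStreams(audioContext, streams) {
  // All sources connected to one destination node are summed together.
  const destination = audioContext.createMediaStreamDestination();
  for (const stream of streams) {
    audioContext.createMediaStreamSource(stream).connect(destination);
  }
  return destination.stream; // single mixed MediaStream
}
```

The same source-into-graph pattern is what a spatialization approach would build on, just with per-participant processing nodes in between instead of a plain sum.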

Thanks for your response!

Ahh okay, so it doesn’t have anything to do with jitsi-bridge?

Not at all, the bridge just routes packets. It cannot decode or encode any codec, so it cannot see any content …


So I’ve been browsing through the source code of lib-jitsi-meet, and part of my assumption was that I could find the place where a participant’s audio stream is received by a user, spatialize it (with Google’s Omnitone), and let that media stream carry on with the rest of its journey.

Unfortunately, I am struggling to find exactly where the media streams from the participants are received, client-side, by the user. It could be that I’m oversimplifying the process.
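To make the idea concrete, here is a rough sketch of what the client-side processing step could look like once you have a remote `MediaStream` in hand. It uses a plain Web Audio `PannerNode` instead of Omnitone for brevity (the wiring is the same if you swap in an Omnitone renderer), and both function names are hypothetical, not part of lib-jitsi-meet:

```javascript
// Sketch: spatialize a remote audio MediaStream with the Web Audio API.
// Hypothetical helpers — not lib-jitsi-meet APIs.

// Spread participant i of n on a semicircle in front of the listener.
function positionForParticipant(i, n) {
  const angle = Math.PI * ((i + 1) / (n + 1)); // 0..PI around the listener
  return { x: Math.cos(angle), y: 0, z: -Math.sin(angle) };
}

// Route a remote MediaStream through a panner before it reaches the speakers.
function spatialize(audioContext, mediaStream, pos) {
  const source = audioContext.createMediaStreamSource(mediaStream);
  const panner = audioContext.createPanner();
  panner.panningModel = 'HRTF';
  panner.positionX.value = pos.x;
  panner.positionY.value = pos.y;
  panner.positionZ.value = pos.z;
  source.connect(panner).connect(audioContext.destination);
  return panner; // keep a handle so the source can be moved later
}
```

One caveat worth knowing: in Chromium-based browsers, a remote WebRTC track may produce silence in a Web Audio graph unless the stream is also attached to an (optionally muted) `<audio>` element, so if the panned audio seems silent, that is the first thing to check.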

It is received by the browser; a track is created and there is an event like remote track received or something like that … Search for mixer and noise detection …
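The event in question is `TRACK_ADDED` on the conference object. A minimal sketch of hooking it, assuming `JitsiMeetJS` is initialized and `conference` is a joined `JitsiConference` (here `handleRemoteAudio` is a hypothetical hook where the spatialization would go):

```javascript
// Pure helper: only remote audio tracks are candidates for spatialization.
function isRemoteAudio(track) {
  return !track.isLocal() && track.getType() === 'audio';
}

// Register for remote tracks as they arrive in the conference.
function wireSpatialization(conference, JitsiMeetJS, handleRemoteAudio) {
  conference.on(
    JitsiMeetJS.events.conference.TRACK_ADDED,
    (track) => {
      if (!isRemoteAudio(track)) return;
      // getOriginalStream() exposes the underlying MediaStream,
      // which can then be fed into a Web Audio graph.
      handleRemoteAudio(track.getParticipantId(), track.getOriginalStream());
    }
  );
}
```

The matching `TRACK_REMOVED` event would be the place to tear the audio graph back down when a participant leaves.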