Using Videobridge's dominant speaker selection to route only the top N most active audio participants

Currently, the videobridge routes audio packets from all participants to all participants (?) and routes video packets from the last N participants (chosen by audio activity) along with the dominant speaker, right?

Let’s say I have a simple SFU that just relays audio packets from all participants, and because of bandwidth restrictions I can’t route audio packets from everyone. So I want to route audio packets only from the top N most influential audio participants, so that the audio experience stays smooth. This top N could be calculated from the RMS value or from the presence of active speech sub-bands, keeping a list of those participants that is updated dynamically. Another participant’s audio would take a place in the top N if its current voice score surpasses the lowest score in the top N. Is this possible?
I just don’t want to send 50 participants’ audio to everyone, even if all 50 turn their mics on. I only want the audio packets of the top N (for example, 5-10) influential participants sent to everyone, so that the audio isn’t messy and we get the best out of it. Please share your ideas on whether this is a good solution to my problem, or whether a solution already exists. You can think of it as a top-N dominant speaker.
** As far as I know, Google Meet also doesn’t forward every participant’s audio; it forwards only selected participants’ audio packets, even if everyone turns their mic on.

JVB has the feature already, to route only the loudest participants’ audio:

Does “loudest” here mean just by RMS, or by a more complex algorithm like dominant speaker detection?
If the loudest are calculated by RMS, shouldn’t there be at least a short delay to wait for a batch of packets, then calculate the top N loudest, relay those, and discard the others?

It’s simply based on energy content of audio packets, but with exponential smoothing: jitsi-videobridge/reference.conf at master · jitsi/jitsi-videobridge · GitHub

You can also optionally force the dominant speaker to be included in the set of routed audio endpoints, even if they are not in the loudest set: jitsi-videobridge/reference.conf at master · jitsi/jitsi-videobridge · GitHub

We don’t use it in production here yet, but we did some testing with it, and it seems to work well.
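To illustrate what exponential smoothing of an energy value means, here is a minimal Python sketch. It is not the bridge's code: the `smooth` function and the `alpha` value of 0.5 are assumptions made for illustration only, and the actual smoothing factor is whatever the config linked above specifies.

```python
def smooth(previous, sample, alpha=0.5):
    """One exponential-smoothing step (hypothetical helper, not a JVB API).

    alpha close to 1 follows new samples quickly;
    alpha close to 0 changes slowly.
    """
    return alpha * sample + (1.0 - alpha) * previous


# Raw per-packet energy levels for one participant (made-up numbers).
levels = [10, 8, 2, 9]

smoothed = float(levels[0])  # the first sample initializes the running value
history = [smoothed]
for level in levels[1:]:
    smoothed = smooth(smoothed, level)
    history.append(smoothed)

print(history)  # [10.0, 9.0, 5.5, 7.25]
```

Note how the single quiet packet (2) pulls the value down only to 5.5 instead of all the way to 2, so one outlier packet does not immediately change the ranking.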


So, let’s say the participants are A, B, C, D… and the energy levels are values like 2, 3…
Let’s say I configure the videobridge to route the top 2 loudest, and these are the packets arriving at the bridge: A-10, C-8, B-5, C-7, D-6, D-2, A-9, B-9, C-8 …
So here, after just the first 2 incoming packets, A and C take the top 2 loudest slots, and B and D won’t be routed, right? And that continues until B-9 arrives; then B takes a place in the top N loudest and C-8 drops out?
Can you point out the code where the exponential smoothing is done? I guess by exponential smoothing you mean the value changes gradually over time rather than abruptly, right?
And thanks for the informative reply :heart:

Whenever a new audio level is received by the bridge, it eventually finds its way to DominantSpeakerIdentification::levelChanged(), which handles both determining the dominant speaker and ranking speakers by energy level.

When route-loudest is enabled, the bridge just forwards the top N speakers by energy level (with N determined by the config linked earlier), plus the dominant speaker (if configured to always be forwarded).
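As a rough sketch of that forwarding rule (top N by smoothed energy, plus the dominant speaker if configured), here is a small Python model run on the packet sequence from the question. Everything in it is hypothetical: the `LoudestTracker` class, its method names, and `alpha=0.5` are assumptions for illustration and do not mirror the actual DominantSpeakerIdentification implementation.

```python
class LoudestTracker:
    """Toy model: rank endpoints by exponentially smoothed energy."""

    def __init__(self, num_loudest, alpha=0.5):
        self.num_loudest = num_loudest
        self.alpha = alpha      # assumed smoothing factor
        self.energy = {}        # endpoint id -> smoothed energy

    def level_changed(self, endpoint, level):
        prev = self.energy.get(endpoint)
        if prev is None:
            self.energy[endpoint] = float(level)
        else:
            self.energy[endpoint] = self.alpha * level + (1 - self.alpha) * prev

    def loudest(self):
        # Highest smoothed energy first; ties broken by endpoint id.
        ranked = sorted(self.energy, key=lambda e: (-self.energy[e], e))
        return ranked[: self.num_loudest]

    def should_forward(self, endpoint, dominant=None):
        # Forward the top N, plus the dominant speaker when one is given.
        return endpoint == dominant or endpoint in self.loudest()


tracker = LoudestTracker(num_loudest=2)
packets = [("A", 10), ("C", 8), ("B", 5), ("C", 7), ("D", 6),
           ("D", 2), ("A", 9), ("B", 9), ("C", 8)]
for endpoint, level in packets:
    tracker.level_changed(endpoint, level)

print(tracker.loudest())  # ['A', 'C']
```

Under these assumptions, B's single loud packet (B-9) only raises its smoothed energy to 7.0, which is still below C's, so B does not displace C right away; it would need to stay loud over several packets. Passing `dominant="D"` to `should_forward` models the option of always routing the dominant speaker even when it is outside the loudest set.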

Yeah… I looked at that code a few days ago and it seems really complex. I also read the paper that introduced the idea of computing the dominant speaker over three different time intervals, which does the calculation in the frequency domain. Anyway, I got the basic idea, thanks. I plan to do it a little differently and more simply, as I don’t think I can use this dominant speaker module.