[jitsi-dev] Dominant Speaker Identification


#1

Hi,

I'm trying to understand the dominant speaker identification code
https://github.com/jitsi/libjitsi/blob/master/src/org/jitsi/impl/neomedia/DominantSpeakerIdentification.java

The referenced article uses sub-bands in the frequency domain, which help
separate voice from noise and ignore volume differences between speakers.
How does Jitsi manage to do the same with only volume levels?

Does it normalize the audio levels between loud and quiet speakers?
I have a speaker with a relatively low volume (but still clearly audible) who
is never selected.
Above what audio level should I expect to get reasonable values for
immediate, medium and long?
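
To make the question concrete, here is roughly how I currently picture the
level-based adaptation (just my own sketch; the class name, method names and
the sub-band width below are guesses, not the actual libjitsi code, and I am
assuming a 0-127 level where higher means louder):

public class LevelSketch {

    // My guess at the sub-band width; the real N1_SUBUNIT_LENGTH is defined
    // in DominantSpeakerIdentification.java.
    static final int SUB_BAND_WIDTH = 13;

    // The reported level seems to replace the paper's per-frequency-band
    // activity test: count how many sub-bands the level sits above the
    // speaker's own minimum.
    static int activeSubBands(int level, int speakerMinLevel) {
        if (level <= speakerMinLevel)
            return 0; // treated as silence for this speaker
        return (level - speakerMinLevel) / SUB_BAND_WIDTH;
    }

    public static void main(String[] args) {
        // A quiet but clearly audible speaker clears no sub-band above
        // their own minimum...
        System.out.println(activeSubBands(45, 40)); // 0
        // ...while a loud speaker clears several, which would explain why
        // the quiet speaker never wins the immediate/medium/long scoring.
        System.out.println(activeSubBands(90, 40)); // 3
    }
}

Is that roughly what the code does, or is there a normalization step I'm
missing?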

Each speaker's silence is replaced with a uniform level of absolute silence:
byte minLevel = (byte) (this.minLevel + N1_SUBUNIT_LENGTH);
https://github.com/jitsi/libjitsi/blob/master/src/org/jitsi/impl/neomedia/DominantSpeakerIdentification.java#L1063
Why do you add N1_SUBUNIT_LENGTH to the speaker's minLevel?
I don't understand how this sets a global silence level, given that
N1_SUBUNIT_LENGTH is otherwise used to split the volume range into bands.
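
My current reading of that line, written out (again only a sketch with my own
names, not the real code):

class SilenceSketch {

    // The silence cut-off seems to sit one sub-band above the speaker's own
    // observed minimum, so levels hovering just above that minimum are
    // flattened to the same silence value instead of counting as weak
    // activity.
    static int silenceThreshold(int observedMinLevel, int n1SubunitLength) {
        return observedMinLevel + n1SubunitLength;
    }

    // Anything below the cut-off is reported as that speaker's own silence
    // level.
    static int clampToSilence(int level, int observedMinLevel, int n1SubunitLength) {
        return level < silenceThreshold(observedMinLevel, n1SubunitLength)
                ? observedMinLevel
                : level;
    }

    public static void main(String[] args) {
        // With minLevel 10 and N1_SUBUNIT_LENGTH 13, any level below 23 is
        // reported as 10, i.e. as silence for that speaker.
        System.out.println(clampToSilence(15, 10, 13)); // 10
        System.out.println(clampToSilence(40, 10, 13)); // 40
    }
}

If that is the intent, is the point to give every speaker a noise floor
relative to their own minimum, rather than one global silence level shared by
all speakers?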

SPEAKER_IDLE_TIMEOUT is 1 hour (60 * 60 * 1000 ms).
Isn't that too long?
https://github.com/jitsi/libjitsi/blob/master/src/org/jitsi/impl/neomedia/DominantSpeakerIdentification.java#L181
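
That works out to 3,600,000 ms, so a speaker would only be forgotten after a
full hour without level updates, assuming it feeds a check along these lines
(the class and method names are mine, not libjitsi's):

class IdleSketch {

    static final long SPEAKER_IDLE_TIMEOUT = 60 * 60 * 1000; // 3,600,000 ms

    // True once a speaker has gone longer than the timeout without any
    // reported audio level.
    static boolean isIdle(long lastLevelChangedTime, long now) {
        return now - lastLevelChangedTime > SPEAKER_IDLE_TIMEOUT;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        System.out.println(isIdle(now - 10 * 60 * 1000, now));      // false, 10 min idle
        System.out.println(isIdle(now - 2 * 60 * 60 * 1000, now));  // true, 2 h idle
    }
}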

Do you have audio test files to use when trying to improve the algorithm?

Thanks