As you may know, Jitsi doesn't support echo cancellation for PulseAudio, which I believe Jitsi defaults to on Linux. The PortAudio backend does support it, though it doesn't perform very well.
It turns out PulseAudio has included a WebRTC-based echo canceller since version 2.0, which was released over two years ago. Enabling it programmatically would be a one-liner. Unfortunately, two years later, Ubuntu still ships with it disabled at compile time.
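For reference, on a PulseAudio build that does have the WebRTC canceller compiled in, the one-liner I had in mind is loading module-echo-cancel at runtime, e.g.:

```shell
# Load PulseAudio's echo-cancel module, selecting the WebRTC canceller
# (fails on builds where WebRTC support was disabled at compile time).
pactl load-module module-echo-cancel aec_method=webrtc
```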
Therefore, I was thinking about implementing a WebRTC-based echo canceller right inside Jitsi, as is currently done for the Mac CoreAudio backend.
It seems more or less straightforward to do, except when there is more than one playback stream, which I presume is the case in conference mode. Looking at how the Mac CoreAudio backend handles this, it seems plainly wrong: as I understand the source code, it feeds audio samples from different playback streams into the same echo canceller sequentially, as if they all came from a single playback stream. That doesn't sound right to me, and that's what I want to discuss. What's the proper way of handling this?

1. Mix the playback streams and feed the mix into the echo canceller? That sounds hard to implement. In fact, it would be like reimplementing a good part of PulseAudio, given that you would have to deal with timing and possibly different sampling rates.
2. Keep a separate echo canceller for each combination of capture and playback stream, and let each canceller associated with a given capture stream modify the data coming from the microphone in turn? That sounds like overkill, and I have no idea whether it would work at all.

Any other options?
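To make the mixing option concrete: setting aside the hard parts (clock drift, buffering, resampling), the core of building a single far-end reference signal would just be summing the playback buffers sample by sample with saturation. The class and method names below are hypothetical, purely for illustration:

```java
// Hypothetical sketch: combine several far-end playback buffers into one
// reference buffer to feed the echo canceller. Assumes all streams are
// already time-aligned and share the same sample rate and format, which
// is exactly the part PulseAudio normally takes care of.
public class Mixer {
    static short[] mix(short[][] streams, int length) {
        short[] out = new short[length];
        for (int i = 0; i < length; i++) {
            int sum = 0;
            for (short[] s : streams) {
                sum += s[i];
            }
            // Clamp to the 16-bit PCM range to avoid wrap-around distortion.
            if (sum > Short.MAX_VALUE) sum = Short.MAX_VALUE;
            if (sum < Short.MIN_VALUE) sum = Short.MIN_VALUE;
            out[i] = (short) sum;
        }
        return out;
    }
}
```

Even this trivial version shows why the sequential-feeding approach looks wrong: the canceller's adaptive filter models one acoustic path from one reference signal, so interleaving unrelated streams gives it a reference that never actually played from the speakers.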