Voice Changing

Hello Jitsi developers and community!
Thank you for your amazing work on the Jitsi project.
We are working with a professor at MIT to run some custom real-time DSP on the server, which requires voice changing/masking, so we are trying to integrate a voice-changing feature into Jitsi.
We’ve set up our own audio modulation server, which receives the original audio chunks and sends back the modified chunks via WebSockets.
Below is our patched code in lib-jitsi-meet that sends the original data to the modulation server and receives the modulated data back over WebSockets:

for (let i = 0; i < tracks.length; i++) {
    const track = tracks[i];
    const mStream = track.getOriginalStream();

    if (track.getType() === MediaType.AUDIO) {

        if (window.location.search.indexOf('nomodulation') < 0) {
            // Create new audio context for output
            const audioCtx = new AudioContext({ sampleRate: 44100 });

            const queue = [];
            const processor = audioCtx.createScriptProcessor(4096, 1, 1);
            const silence = new Float32Array(4096);

            // Custom stream source node
            const source = audioCtx.createMediaStreamSource(mStream);
            const dest = audioCtx.createMediaStreamDestination();

            source.connect(processor);
            processor.connect(dest);

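            processor.onaudioprocess = function(audio) {
                const inputData = audio.inputBuffer.getChannelData(0);
                const outputData = audio.outputBuffer.getChannelData(0);
                const queueLength = queue.length;

                // Play the next modulated chunk if one has arrived,
                // otherwise output silence for this 4096-sample block.
                if (queueLength) {
                    outputData.set(queue.shift());
                } else {
                    outputData.set(silence);
                }

                // Send the original samples to the modulation server only
                // while fewer than 10 modulated chunks are waiting; input
                // captured while the queue is longer is dropped.
                if (queueLength < 10) {
                    socket.emit('track', Object.values(inputData || {}));
                }
            };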
            socket.on('modulate-stream', data => {
                const floatArray = new Float32Array(data);

                if (!floatArray.length) {
                    return;
                }
                queue.push(floatArray);
            });

            // Replace original stream with modified stream
            track.stream = dest.stream;
        }
    }
}
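
For comparison, here is a minimal sketch of sending each chunk as a binary payload instead of a plain array of numbers. It assumes that `socket` is a Socket.IO client (which can emit an ArrayBuffer) and that the modulation server would be changed to accept and return binary data. `encodeChunk` and `decodeChunk` are hypothetical helper names, not part of lib-jitsi-meet or of the patch above.

```javascript
// Hypothetical helpers, not part of lib-jitsi-meet or the patch above.

// Copy one chunk into a standalone ArrayBuffer. The copy matters because the
// audio engine may reuse the input buffer after onaudioprocess returns.
function encodeChunk(chunk) {
    return chunk.buffer.slice(chunk.byteOffset, chunk.byteOffset + chunk.byteLength);
}

// View a received binary payload as Float32 samples again.
function decodeChunk(payload) {
    return new Float32Array(payload);
}

// Size comparison for one 4096-sample chunk of random test data.
const chunk = new Float32Array(4096).map(() => Math.random() * 2 - 1);
const binaryBytes = encodeChunk(chunk).byteLength;              // 4 bytes per sample
const jsonBytes = JSON.stringify(Object.values(chunk)).length;  // text form of the same samples

console.log({ binaryBytes, jsonBytes });
```

Under those assumptions, the patch would call `socket.emit('track', encodeChunk(inputData))` and `queue.push(decodeChunk(data))`. Whether this helps with the delay depends on where the time is actually spent, which we have not measured.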

We are now able to host Jitsi calls with the voice-changing feature on our custom Jitsi server, but we have two major issues: latency (5-20 seconds of delay in some cases) and static that makes the voice barely understandable. Any suggestions on how we can reduce the latency and improve the sound quality would be greatly appreciated.
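
To put the numbers in context, here is a rough estimate of how much delay the client-side queue alone contributes, using the values from the patch (4096-sample chunks at 44100 Hz). The function names are made up for illustration, and the estimate ignores network and server processing time. Note also that the `queueLength < 10` check only gates sending, so replies already in flight can still push the queue past 10 chunks.

```javascript
// Values taken from the patch above.
const SAMPLE_RATE = 44100; // AudioContext sample rate
const CHUNK_SIZE = 4096;   // ScriptProcessor buffer size

// Duration of one chunk in milliseconds.
function chunkDurationMs(chunkSize, sampleRate) {
    return (chunkSize / sampleRate) * 1000;
}

// Playback delay caused by chunks waiting in the queue (illustrative helper).
function queueDelayMs(queuedChunks, chunkSize, sampleRate) {
    return queuedChunks * chunkDurationMs(chunkSize, sampleRate);
}

console.log(chunkDurationMs(CHUNK_SIZE, SAMPLE_RATE).toFixed(1));  // 92.9
console.log(queueDelayMs(10, CHUNK_SIZE, SAMPLE_RATE).toFixed(0)); // 929
```

So each chunk is about 93 ms of audio, and 10 queued chunks add a little under one second. A 5-second delay would correspond to roughly 54 chunks waiting or in flight.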
