Because we have some unique requirements (we want to capture individual audio streams as well as RTCP SenderReports, so that we can do additional per-participant processing while also capturing the RTP/NTP time-sync information on each stream), we are trying to leverage the (now deprecated) RecorderRtpImpl. We finally got it working end-to-end; however, we are running into some really significant audio issues.
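For context, the timing data we need from each SenderReport is just the sender SSRC plus the NTP/RTP timestamp pair. Here is a minimal sketch of extracting those fields from a raw RTCP SR packet per RFC 3550, section 6.4.1 (the class name, and the assumption that we already have the packet bytes in hand, are ours):

```java
import java.nio.ByteBuffer;

/** Minimal RTCP SenderReport field extraction per RFC 3550, sec. 6.4.1. */
final class SenderReportTiming
{
    /** Seconds between the NTP epoch (1900) and the Unix epoch (1970). */
    private static final long NTP_TO_UNIX_SECONDS = 2_208_988_800L;

    final long ssrc;          // SSRC of the sending participant
    final long ntpTimeMillis; // wall-clock time, converted to Unix millis
    final long rtpTimestamp;  // RTP timestamp sampled at the same instant

    /** pkt must point at the start of an SR (packet type 200). */
    SenderReportTiming(byte[] pkt)
    {
        ByteBuffer buf = ByteBuffer.wrap(pkt); // network byte order

        if ((buf.get(1) & 0xFF) != 200)
            throw new IllegalArgumentException("not an RTCP SR");

        ssrc = buf.getInt(4) & 0xFFFFFFFFL;
        long ntpMsw = buf.getInt(8) & 0xFFFFFFFFL;  // seconds since 1900
        long ntpLsw = buf.getInt(12) & 0xFFFFFFFFL; // fraction of a second
        rtpTimestamp = buf.getInt(16) & 0xFFFFFFFFL;

        ntpTimeMillis =
            (ntpMsw - NTP_TO_UNIX_SECONDS) * 1000L
                + (ntpLsw * 1000L >>> 32);
    }
}
```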
The audio streams are persisted; however, the saved files are extremely choppy, almost as if a few milliseconds of audio had been inserted between every 20 ms frame. I am wondering whether this has something to do with either:
- the SilenceEffect and/or ActiveSpeakerDetector codecs/effects, or
- the Muxer being used to output the stream.
When we disabled the SilenceEffect, the problem appeared to get a little better, but the quality of the saved audio streams is still quite bad (almost unintelligible). To see whether the problem exists further upstream in the JMF graph, I tried writing out the linear audio samples from within the doProcess method of the SilenceEffect. However, the audio we got that way was far worse, producing a WAV file roughly 10x larger than the file normally generated via the DataSink. This seems really strange, because when we log the Buffers coming into doProcess we don't see any dropped frames, and the RTP timestamps of consecutive buffers advance by the expected 20 ms (960 samples / 1,920 bytes). So what would explain the extremely stretched-out debug output produced by dumping the bytes of each Buffer/frame entering the SilenceEffect?
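For reference, the tap was essentially the following (a simplified sketch, written here as a SilenceEffect subclass even though we actually patched the class directly; the dump path is illustrative and error handling is minimal):

```java
import java.io.FileOutputStream;
import java.io.IOException;
import javax.media.Buffer;
// import of SilenceEffect (libjitsi internal package) omitted here

public class TappedSilenceEffect extends SilenceEffect
{
    private FileOutputStream debugOut;

    @Override
    protected int doProcess(Buffer inBuf, Buffer outBuf)
    {
        try
        {
            if (debugOut == null)
                debugOut = new FileOutputStream("/tmp/silence-tap.pcm");

            // Dump only the valid region of the buffer: the backing
            // array can be larger than the frame it currently holds.
            byte[] data = (byte[]) inBuf.getData();
            debugOut.write(data, inBuf.getOffset(), inBuf.getLength());
        }
        catch (IOException ioe)
        {
            // debugging aid only; ignore
        }
        return super.doProcess(inBuf, outBuf);
    }
}
```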
Another theory we have is that the problem is caused by the Muxer that JMF creates for the custom DataSink in RecorderRtpImpl. Because we wanted to adjust the Buffer timestamps in our DataSink, we needed a PushBufferDataSource, and therefore we couldn't use the WavMuxer. So we set the ContentDescriptor to "raw" in order to get the RawBufferMux. I don't see how this should cause any issues (especially since there appear to be problems upstream), but I wonder whether it is somehow blocking, or whether the CircularBuffer is overwriting frames before they are written out to the file.
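In case it helps, the muxer selection boils down to something like this (a sketch against the standard JMF Processor API; the helper name is ours and the state-transition waiting is elided):

```java
import javax.media.Manager;
import javax.media.Processor;
import javax.media.protocol.ContentDescriptor;
import javax.media.protocol.DataSource;
import javax.media.protocol.PushBufferDataSource;

public class RawOutputHelper
{
    /**
     * With the "raw" content descriptor, JMF picks the RawBufferMux and
     * the processor's output remains push-buffered, so our custom
     * DataSink can rewrite Buffer timestamps as it reads.
     */
    static PushBufferDataSource openRawOutput(DataSource receiveStreamSource)
        throws Exception
    {
        Processor processor = Manager.createProcessor(receiveStreamSource);

        processor.configure();
        // ... block until the processor reaches Configured (omitted) ...
        processor.setContentDescriptor(
            new ContentDescriptor(ContentDescriptor.RAW));

        processor.realize();
        // ... block until Realized (omitted) ...
        return (PushBufferDataSource) processor.getDataOutput();
    }
}
```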
Another theory is that there are synchronization issues, either causing inadvertent blocking or allowing threads to corrupt each other's data. This doesn't seem likely, however, since we really haven't changed the general flow of RecorderRtpImpl (it is essentially the same code, with just a few small changes related to the DataSink and timestamp post-processing).
My last theory is that RecorderRtpImpl is no longer compatible with the current Jitsi codebase, and that these significant audio issues are somehow related to the older JMF code. We mentioned using RecorderRtpImpl on the Jitsi community call, and the feedback we got was that it should generally still work in an audio-only context. We would use Jibri or Jigasi instead, but we need access to the RTPTranslator to intercept RTCP SenderReport packets, since that timing data drives the custom synchronization and audio post-processing requirements in our application.
If there really is a significant issue with the now-deprecated RecorderRtpImpl, is there perhaps a workaround? For instance, would it be possible to still intercept RTCP SenderReports but use a different approach for capturing the streams? Could we use the AudioMixerMediaDevice or AudioSilenceMediaDevice to capture individual streams (via the ReceiveStreamBufferListener) while still getting access to the RTCP SR packets for timing information? A rough sketch of what we have in mind follows.
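This is only a sketch of the idea: one raw PCM file per SSRC, fed by the ReceiveStreamBufferListener, with SR timing captured separately (e.g. via the parsing shown earlier). We have not verified the exact listener interface, so the bufferReceived signature below is an assumption, and the output paths are illustrative:

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.media.Buffer;
import javax.media.rtp.ReceiveStream;
import org.jitsi.service.neomedia.ReceiveStreamBufferListener;

public class PerSsrcCapture implements ReceiveStreamBufferListener
{
    // One output file per SSRC, created lazily on first buffer.
    private final Map<Long, FileOutputStream> outputs =
        new ConcurrentHashMap<>();

    @Override
    public void bufferReceived(ReceiveStream stream, Buffer buffer)
    {
        long ssrc = stream.getSSRC() & 0xFFFFFFFFL;
        try
        {
            FileOutputStream out =
                outputs.computeIfAbsent(ssrc, PerSsrcCapture::newOutput);

            // Write only the valid region of the decoded buffer.
            byte[] data = (byte[]) buffer.getData();
            out.write(data, buffer.getOffset(), buffer.getLength());
        }
        catch (IOException ioe)
        {
            // handle/log in real code
        }
    }

    private static FileOutputStream newOutput(long ssrc)
    {
        try
        {
            return new FileOutputStream("/tmp/ssrc-" + ssrc + ".pcm");
        }
        catch (IOException ioe)
        {
            throw new RuntimeException(ioe);
        }
    }
}
```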
We have spent weeks trying to get this to work reliably, so any help or feedback you can provide would be greatly appreciated!
Thanks in advance for all your help!