I plan to integrate the Mozilla DeepSpeech project into jigasi to enable automatic speech-to-text. I am aware of the work @Nik_V has done on enabling automatic speech-to-text with the Google Speech API.
My plan is to use the Java bindings provided by Mozilla DeepSpeech (https://deepspeech.readthedocs.io/en/v0.7.0/Java-API.html) and implement the transcription service interface that already exists in jigasi. The bindings offer a streaming interface, which broadly matches what jigasi needs.
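For reference, the v0.7 Java API linked above feeds audio into a stream as a `short[]` of 16-bit samples (via `feedAudioContent()`), while audio in jigasi tends to move around as `byte[]` buffers, so a small conversion step would likely be needed in the glue code. A minimal sketch, assuming little-endian 16-bit PCM input (the class and method names here are my own, not from either codebase):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper for turning jigasi byte[] buffers into the short[] the bindings expect. */
public final class PcmUtil {

    /** Converts little-endian 16-bit linear PCM bytes into an array of 16-bit samples. */
    public static short[] bytesToShorts(byte[] pcm) {
        short[] samples = new short[pcm.length / 2];
        ByteBuffer.wrap(pcm)
                  .order(ByteOrder.LITTLE_ENDIAN)
                  .asShortBuffer()
                  .get(samples);
        return samples;
    }

    public static void main(String[] args) {
        // 0x0010 = 16 and 0x7FFF = 32767 in little-endian byte order
        short[] out = bytesToShorts(new byte[]{0x10, 0x00, (byte) 0xFF, 0x7F});
        System.out.println(out[0] + " " + out[1]); // prints "16 32767"
    }
}
```

The resulting `short[]` could then be passed to the bindings' streaming feed call chunk by chunk as jigasi delivers audio.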
Any comments, feedback or ideas to help with this?
I also have a more specific question: the Java bindings of Mozilla DeepSpeech need 16-bit mono raw audio sampled at 16 kHz (I assume raw means linear PCM encoding). I guess the audio arriving here https://github.com/jitsi/jigasi/blob/master/src/main/java/org/jitsi/jigasi/transcription/TranscriptionRequest.java is Opus-encoded. Is there an established way in the Jitsi codebase to get raw output from the Opus audio? The audio might also need to be resampled to 16 kHz.
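On the resampling point: decoded Opus audio will typically be at 48 kHz, which is an exact multiple of 16 kHz, so one crude option is to average each group of three consecutive samples and keep one. A real implementation should use a proper low-pass filter or a resampling library to avoid aliasing, but as a sketch of the idea (the class and method names are mine):

```java
/** Hypothetical 48 kHz -> 16 kHz downsampler: averages groups of 3 samples (very rough anti-aliasing + decimation). */
public final class Downsampler {

    public static short[] downsample48to16(short[] in) {
        short[] out = new short[in.length / 3];
        for (int i = 0; i < out.length; i++) {
            // Averaging three consecutive samples acts as a crude low-pass filter
            // before keeping every third sample (48 kHz / 3 = 16 kHz).
            int sum = in[3 * i] + in[3 * i + 1] + in[3 * i + 2];
            out[i] = (short) (sum / 3);
        }
        return out;
    }

    public static void main(String[] args) {
        short[] out = downsample48to16(new short[]{3, 6, 9, 12, 15, 18});
        System.out.println(out[0] + " " + out[1]); // prints "6 15"
    }
}
```

This only works because 48 kHz divides evenly by 3; an arbitrary-rate conversion would need interpolation.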