Transcription and translation functionality


This is directed to the developers of the transcription and translation features. I hope you can find the time to answer a few questions.

  1. I hosted Jitsi on a server, and I built Jitsi Meet and Jigasi from the master branch source. If I only want to modify the transcription and translation modules, do I also need to build other projects such as Jicofo, the JVB, or lib-jitsi-meet from source?
  2. Was a stable version of Jigasi with working transcription and translation ever released? If so, where is the source code? I could not find it.
  3. The Speech-to-Text in Google Search, for example, is quite accurate, but in the Jitsi implementation the accuracy is noticeably lower. Why is that? I read in Nik Vaessen’s blog that this could be because the audio is sent as raw 48 kHz rather than 16 kHz with FLAC encoding. However, the Google Speech-to-Text API documentation says that a sample rate higher than 16 kHz does not affect transcription accuracy.

Thank you!


  1. You will never need to re-build jitsi-videobridge for transcription/translation changes. Depending on what you want to change, in most cases you will not need to re-build lib-jitsi-meet or jicofo either.

  2. The translation was never released, if I remember correctly. The transcription in the current master branch of jigasi works, but we haven’t been actively using it in a long time. We don’t have any other measure of it being “stable”. We will be using it again in the near future, so expect some updates.

  3. Google has different models and APIs which give different results. For example, we know that whatever is used in YouTube works better than what is available via the Speech API, but there isn’t much we can do about that. I don’t think that sending 48 kHz audio will negatively impact the results. Enabling the “video” model[0] might help, but note that it may also have different pricing.
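For reference, selecting the “video” model is just a field in the Speech-to-Text `RecognitionConfig`. Below is a minimal sketch of a v1 REST request body that does this; the bucket URI is a placeholder, and the exact config Jigasi sends may differ:

```python
import json

# Sketch of a Google Speech-to-Text v1 "recognize" request body that selects
# the enhanced "video" model instead of the default. Field names follow the
# public RecognitionConfig schema; the audio URI below is a placeholder.
request_body = {
    "config": {
        "encoding": "FLAC",        # lossless, so no transcoding quality hit
        "sampleRateHertz": 16000,  # the rate Nik Vaessen's blog suggests
        "languageCode": "en-US",
        "model": "video",          # enhanced model; may be billed differently
        "useEnhanced": True,
    },
    "audio": {"uri": "gs://your-bucket/your-audio.flac"},  # placeholder
}

print(json.dumps(request_body, indent=2))
```

Whether the enhanced model actually improves accuracy for conference audio would need to be measured; pricing for enhanced models is also different, as noted above.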




@Boris_Grozev thank you very much for your help!