Strategy for implementing transcription services other than the Google API (VOSK and Whispering)


I am working on some fixes for using VOSK with Jigasi and Jitsi. Since the new stable version of Jitsi (2.0.7830), there has been an issue when selecting the transcription language.

A potential fix involves editing the frontend so that it shows only “Enable subtitles” when transcription alone (without translation) is enabled, and shows the whole language menu when translation is enabled. The drawback is that users would no longer be able to select a transcription language for VOSK (a VOSK instance does not support multiple languages).

This becomes a bigger problem if we want to support multiple transcription languages with VOSK. If we run multiple VOSK instances, one per language, the user will still have no way to choose the transcription language.
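To make the multi-instance idea concrete, here is a minimal sketch of how a selected language could be routed to one VOSK instance per language. Everything in it is an assumption for illustration: the `voskEndpoints` map, the placeholder URLs, and the `pickVoskEndpoint` helper are hypothetical and are not part of Jigasi or Jitsi Meet.

```typescript
// Hypothetical mapping from language tag to the WebSocket URL of the
// VOSK instance serving that language. The URLs are placeholders.
const voskEndpoints = new Map<string, string>([
    ["en", "ws://vosk-en.example.internal:2700"],
    ["fr", "ws://vosk-fr.example.internal:2700"],
    ["de", "ws://vosk-de.example.internal:2700"],
]);

// Pick the endpoint for the language the user selected.
// Tries the full tag first ("fr-ca"), then the primary subtag ("fr"),
// then a fallback language. Throws if nothing matches.
function pickVoskEndpoint(
    languageTag: string,
    endpoints: Map<string, string>,
    fallback: string = "en",
): string {
    const tag = languageTag.toLowerCase();
    const dash = tag.indexOf("-");
    const primary = dash === -1 ? tag : tag.slice(0, dash);

    const url =
        endpoints.get(tag) ?? endpoints.get(primary) ?? endpoints.get(fallback);
    if (url === undefined) {
        throw new Error(`No VOSK endpoint configured for "${languageTag}"`);
    }
    return url;
}
```

With a lookup like this, the missing piece is exactly the one described above: the frontend still has to let the user pick the language tag that is passed in.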

So my question is: how can we decide which solution works best for the user? My end goal is to have a self-hosted transcription solution (using VOSK or Whispering), which supports multiple languages.

Thank you very much in advance for any help or directions.

So if you don't enable translation languages in config.js, what happens? (Sorry, I have not tested any of this recently.)
I suppose it should not give you that option and leave you with just start and stop captions…

Hello. You could add a config option for enabling translation. When both transcription and translation are enabled, everything works as it does now, but when translation is disabled, the language selector dialog can be used to set the desired transcription language. That gives you the flexibility to configure which languages you want to show.
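A rough sketch of that decision logic might look like the following. The config field names (`transcriptionEnabled`, `translationEnabled`, `transcriptionLanguages`, `translationLanguages`) and the `resolveSubtitleUi` function are hypothetical, chosen only to illustrate the idea; they are not the actual config.js keys or Jitsi Meet code.

```typescript
// Hypothetical config shape for this sketch; not Jitsi's real config.js keys.
interface TranscriptionUiConfig {
    transcriptionEnabled: boolean;
    translationEnabled: boolean;      // the proposed new option
    transcriptionLanguages: string[]; // shown when translation is off
    translationLanguages: string[];   // shown when translation is on
}

// What the subtitles control should show for a given config.
type SubtitleUi =
    | { kind: "hidden" }
    | {
          kind: "languageDialog";
          purpose: "translation" | "transcription";
          languages: string[];
      };

function resolveSubtitleUi(cfg: TranscriptionUiConfig): SubtitleUi {
    if (!cfg.transcriptionEnabled) {
        // No transcription at all: nothing to show.
        return { kind: "hidden" };
    }
    if (cfg.translationEnabled) {
        // Current behaviour: the dialog selects a translation language.
        return {
            kind: "languageDialog",
            purpose: "translation",
            languages: cfg.translationLanguages,
        };
    }
    // Translation disabled: reuse the same dialog to choose the
    // transcription language instead.
    return {
        kind: "languageDialog",
        purpose: "transcription",
        languages: cfg.transcriptionLanguages,
    };
}
```

The point of reusing one dialog is that the list of languages stays configurable in both modes, so a VOSK deployment can expose only the languages it actually has models for.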