Transcription and real-time captions


I would like to know whether Jitsi has an integrated

  • real-time closed captioning service?
  • transcription service?
  • automated translation service for captions and transcripts?

I could only find some limited info about transcription (and that it is a paid feature).


In order to have transcriptions on your hosted deployment you need to configure jigasi. More info:

There is also an option to enable translations for the transcriptions.

Thank you for the quick answer.

I understand from the resource you linked that I need to use an external provider and have some programming knowledge. Is that so? In that case, can I only use Google Cloud Speech and Vosk, or are there others as well?
Or is there also a simple “press some buttons” solution available?

Some more questions…

  • If I need to create the integration, is there any resource (instructions) I can follow to do it?
  • If I want each participant to be able to see the captions (and to turn them on or off), is it enough if I set it up, or does each participant need to configure it on their own access point?
  • Can I save the captions with timestamps, so that I can easily publish my video with closed captions?

You can subscribe for and integrate it in your app.

What do you mean by that? The document provided is how to integrate jigasi with your jitsi-meet deployment.

The moderators in the meeting will be able to turn on the captions. Once that is on, the guests in the meeting can turn the captions on or off locally.

There is an option for jigasi to save captions, but without timestamps. Timestamps are tricky in this situation because there is no way to sync jibri (which does the recording) and jigasi (which produces the captions) in order to produce a file with timestamps.
You would be better off implementing captions as part of the follow-me feature: the moderator could turn it on, which would enable captions for everyone including jibri, so the captions would end up in the recording. Any PRs are welcome 🙂

Thanks again!

I thought there was more to it than what was in the jigasi integration doc you sent before.

I think I can start with this info and get back to you if I have difficulties or new questions…

Some more Qs…

Does the service use its own speech recognition to create captions, or does it use other services (e.g. Google Cloud Speech)? If it has its own service: 1. does it support Swedish; 2. does it offer any special features, like profanity filtering?

Can I, or a meeting participant set how the captions should appear? (E.g. change font size / type / colour, number of lines displayed, line length, background behind captions…)

Do I understand correctly that captioning/transcription is a paid feature ($0.06/min) only if I use the service, and that I don't have to pay if I do the jigasi integration?

It uses Google, with jigasi configured as described in the document. Translation to Swedish is supported. There is no profanity filtering.

Nope, sorry.

Yep. offers you the full video experience. In order to use jigasi you need to have a deployment of jitsi-meet: Self-Hosting Guide - Debian/Ubuntu server | Jitsi Meet. JaaS, on the other hand, offers you the deployment with all the other components integrated, such as recording, dialing in and out, and transcriptions. If you host it yourself, you need all the infrastructure and shenanigans around it; if you need high availability, you need even more complexity; and if you need multiple regions …
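To get a feel for the per-minute pricing quoted in the question ($0.06/min), here is a minimal cost estimate sketch. It assumes the rate is charged per minute of transcribed meeting time; whether billing is per meeting-minute or per participant-minute is not stated in this thread, so check the provider's pricing page. `estimate_transcription_cost` is a hypothetical helper, not part of any Jitsi or JaaS API.

```python
# Rate quoted in this thread for JaaS transcription, in USD per minute.
# Assumption: charged per minute of transcribed meeting time.
JAAS_TRANSCRIPTION_RATE_USD_PER_MIN = 0.06


def estimate_transcription_cost(minutes: float,
                                rate: float = JAAS_TRANSCRIPTION_RATE_USD_PER_MIN) -> float:
    """Hypothetical helper: estimated transcription cost in USD, rounded to cents."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return round(minutes * rate, 2)


if __name__ == "__main__":
    # A 90-minute meeting at the quoted rate.
    print(estimate_transcription_cost(90))  # 5.4
```

Note that this only covers the JaaS side; on a self-hosted jigasi setup the speech-to-text provider (e.g. Google) bills separately at its own rates.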

Everything @damencho said, PLUS even if you host your own server(s), you’ll still have to pay Google for the speech-to-text conversion.