High capacity recording and closed captioning (Jibri and Jigasi)? Best practice?

I have found plenty of information on creating high-capacity nginx/Jitsi-meet + jicofo + jvb setups, and on some of the challenges with prosody’s single-threading.
I am having trouble finding any information on how to design, plan, and implement high-capacity recording and closed captioning with Jibri and Jigasi.
There is plenty of information about running multiple JVBs to handle more demand/capacity. Is there information on a similar approach for Jibri and Jigasi?
For example, say we are supporting 20,000 simultaneous participants, averaging 20 participants per room, with 1-2 video/audio streams per room. It is pretty clear how to plan for those 1,000 rooms.

But what if 500 rooms want to record their audio/video sessions, and another 500 want to have closed captioning running? How does one prepare for that?

For example, we are looking at using Vosk for CC (no info yet on how many Vosk servers we’ll need; we will have to test that unless someone has information I can look at). We are to expect at least 2 million minutes of closed captioning per month. This would be over $500,000 USD per month minimum if using Google, so we are trying to set up our own Vosk CC instead.
We want to prepare for this high volume of CC demand, though, and I am having trouble finding many conversations at this scale.
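
As a rough starting point, the monthly volume can be converted into an average number of concurrent caption streams. This is only a sketch under assumptions not stated above: a 30-day month, demand spread evenly around the clock, and a hypothetical peak-to-average ratio that would have to come from real usage data.

```python
# Sketch: convert monthly captioning minutes into concurrent streams.
# Assumes a 30-day month and perfectly even demand (unrealistic, but a baseline).
MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200 minutes


def average_concurrent_streams(cc_minutes_per_month: float) -> float:
    """Average number of caption streams running at any given moment."""
    return cc_minutes_per_month / MINUTES_PER_MONTH


def peak_concurrent_streams(cc_minutes_per_month: float,
                            peak_to_average: float) -> float:
    """Scale the average by a peak-to-average ratio (hypothetical input)."""
    return average_concurrent_streams(cc_minutes_per_month) * peak_to_average


if __name__ == "__main__":
    avg = average_concurrent_streams(2_000_000)
    print(f"average concurrent streams: {avg:.1f}")
    # The ratio of 10 below is a placeholder, not a measured value.
    print(f"peak at 10x average: {peak_concurrent_streams(2_000_000, 10):.0f}")
```

The average is the floor for capacity; the fleet actually has to be sized for the peak, which is why the ratio matters more than the monthly total.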

One resource suggested that for each room recording its session (Jibri), we should have a dedicated Jibri instance, especially because of the challenges with the ALSA audio loopback. Can this be a very low-end instance like a t3.small, or does it need to be a little stronger, something like 2 CPUs and 4 GB RAM (t4g.large)? Is this accurate? Overkill? Underpowered? Unknown? How does that get resourced on demand?

For closed captioning with Vosk, so far I can’t find any information on how many simultaneous real-time closed caption streams it could handle from Jitsi at any level of hardware. Is there any information you are aware of? Is this another situation where we will need to spin up a bunch of small instances, one for every room that enables CC? Is there a ratio it can handle, for example 4 rooms per server instance?

Thank you for thinking about this and any suggestions or resources you could point me to for scaling these components.

It’s possible and easy to create multiple audio loopback devices on a single server, so that is not the challenge. The challenge is that each Jibri needs its own desktop/browser. So you can run multiple Jibri instances on a single server using containers.

But when there is only one Jibri instance on each server, it’s easy to implement an auto scale-down scenario.

I think Jigasi can handle more than 4 simultaneous room transcriptions per instance. As for how Vosk would work and scale, I have no idea; we have never used it. You can check on GitHub for the people who contributed there and ask them.

Jibri instances need to have enough power: 4 cores and 4 GB of RAM as a minimum. If you are doing 1080p (the default at the moment), you need even more.
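
To illustrate why one Jibri per server makes scale-down simple: each server is either busy recording or idle, so idle servers can be removed without interrupting anything. The sketch below is hypothetical; the instance names, the busy flags, and the `spare_idle` parameter are illustrative, and how you obtain each instance's busy state depends on your own monitoring.

```python
from typing import Dict, List


def instances_to_terminate(busy_by_instance: Dict[str, bool],
                           spare_idle: int) -> List[str]:
    """Return idle instances that can be shut down.

    busy_by_instance maps an instance name to True if its single Jibri
    is currently recording. spare_idle is how many idle instances to keep
    warm so new recordings can start immediately.
    """
    idle = sorted(name for name, busy in busy_by_instance.items() if not busy)
    excess = len(idle) - spare_idle
    return idle[:excess] if excess > 0 else []


if __name__ == "__main__":
    fleet = {"jibri-a": True, "jibri-b": False,
             "jibri-c": False, "jibri-d": False}
    # Keep one idle instance as a buffer; the other idle ones can go.
    print(instances_to_terminate(fleet, spare_idle=1))
```

With several Jibris sharing one server, the server can only be removed once all of them are idle at the same time, which is harder to arrange.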

Thank you for the additional clarification, that is very helpful.
Is there any approximate formula to estimate the number of recording sessions that can be handled per instance? For example, with a 4 CPU, 16 GB instance? What would be a reasonable ratio of concurrent recordings before it would be recommended to scale up another instance?

RE Vosk, that is about what I figured. I will spin up some conversations with them shortly. Thanks.

RE: Jigasi, (ignoring any unknown limitations from the Vosk side) do you (or anyone else) have a rough idea about an approximate ratio of how many simultaneous room transcriptions Jigasi could be expected to handle, since you think it is more than 4? Any basic formula I could use as a starting point?
Thanks kindly!

Not really … but I will guess in the range of 50-100 concurrent sessions …
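
Taking that guess at face value, a back-of-the-envelope instance count for the 500 captioned rooms from the original question could look like the sketch below. The 50-100 range is only a guess and ignores any limit on the Vosk side, so treat the output as a starting point for testing.

```python
import math


def jigasi_instances(rooms: int, sessions_per_instance: int) -> int:
    """Number of Jigasi instances needed, rounding up to whole instances."""
    return math.ceil(rooms / sessions_per_instance)


if __name__ == "__main__":
    for capacity in (50, 100):  # low and high end of the guessed range
        count = jigasi_instances(500, capacity)
        print(f"{capacity} sessions/instance -> {count} instances")
```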


Each instance can only record one session and each instance needs ~4 cores and ~4 GB RAM as @damencho mentioned.
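
For planning, that turns into simple multiplication. A minimal sketch, using the 500 recording rooms from the original question and the approximate 4 cores / 4 GB per instance as defaults (both figures are rough minimums, and 1080p needs more):

```python
from typing import Dict


def jibri_fleet(concurrent_recordings: int,
                cores_per_instance: int = 4,
                ram_gb_per_instance: int = 4) -> Dict[str, int]:
    """Total resources when each Jibri instance records exactly one session."""
    return {
        "instances": concurrent_recordings,
        "cores": concurrent_recordings * cores_per_instance,
        "ram_gb": concurrent_recordings * ram_gb_per_instance,
    }


if __name__ == "__main__":
    print(jibri_fleet(500))
```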

Ouch! No chance, with performance tuning, to get that down to say 1-2 cores and 1-2 GB RAM? That’s going to get painfully expensive fast during expected peaks of around 10,000+ simultaneous room recordings.

Virtually impossible - not even with Jibri running at 720p. FFmpeg and Chromedriver alone will devour all that.

Oh, that is going to be fun to explain.
Alright, I’ll work that into the scope, there will definitely be some client grumbling about that, they were already a little grumbly about the JVB quantity. I’ll have to see how much the recording rooms are going to add up to in AWS costs.
Thanks everyone, as usual you are all so very helpful. I greatly appreciate it.
When I get the testing environments for these components set up (soon), I’ll follow up with additional questions on this topic here, as I try to figure out the scaling of these two areas.
(Unless you would prefer I close this out and spin up a new thread for each?)

I’d be careful here in the Jibri planning. Check “Ffmpeg eats all the memory and crash within a minute - recording or streaming” (Issue #269, jitsi/jibri on github.com). This is a long-running issue on various hypervisors.
As far as Jigasi / closed captioning is concerned, it also depends on the options you want to enable, as you can record the audio stream (WAV), create a transcript, etc. So scalability and instance counts will need some testing based on your requirements.


Yep, that’s the plan: have multiple testing environments set up to try to figure out firmer numbers for capacity planning.
Thanks to everyone for the suggestions. I’ll post updates as the testing proceeds.
Happy Jitsi-ing!

Following up on this capacity question. Does anybody have any advice on which kind of AWS instance is best suited to run the Vosk server (to be used with Jigasi)? Like C5? R5? The M family?
Thanks in advance


You’d do better to ask the developer (Vosk) directly. Check out his GitHub page and contact him there. He usually responds.


Yes @Freddie
I got a response from them: a C5 large should be OK for a few parallel requests.
