Using Jitsi + Jibri for a social psychology study: is it possible?

Hi everyone,

After COVID-19 shut down all in-person data collection at my university, a lot of researchers have been scrambling to figure out whether they can move their studies online. A friend in the social psych department had planned a project looking at changes in vocal pitch and facial expression during two-person interactions, and is now looking into whether she can collect her data via video conferencing.

The catch is that she needs passable-quality recordings of each participant's video and audio streams individually, so they can be run through facial-feature and vocal algorithms before analysis. She also can't easily rely on participants to install screen-recording software, use it properly, and then send over the resulting files. As such, she's looking for a solution where the video conferencing server records the separate video/audio streams. In my searches, Jitsi and Jibri were among the first tools that came up, but I want to be sure I understand them and their capabilities properly before looking too far into setting this up:

  1. When Jibri records a meeting, is it possible to isolate the video/audio for each participant in the call?

  2. If not, is it possible to configure two separate Jibri instances so each focuses on only one participant in the call (i.e. doesn’t keep switching between the video feeds for them both)?

  3. Given the way Jitsi is written, would it be feasible to modify the code so that it records the video/audio streams broadcast by each participant to a file? That seems like a much easier solution, with better-quality results, than having a separate server record the framebuffer of a virtual Chrome session logged into the call, but it doesn't seem to be an existing feature. Since there will only ever be two participants in a call and there likely won't be any concurrent calls, the CPU and disk I/O demands of this aren't a big concern.

In other words, is Jitsi (+ Jibri) a viable solution for my friend’s use case, or should we look elsewhere for a solution?

Thanks in advance!

  • Austin

Hello,

This sounds like a really cool project.

  1. Unfortunately not. Jibri just grabs the video from the framebuffer and the audio from the audio device after it has been mixed, so the individual participants are no longer separate at that point.

  2. Not out of the box, but it is doable. The jitsi-meet client (running inside Jibri) already has per-participant volume control, so splitting the audio should be a matter of finding the right API. Of course, you will need to run your own jitsi-meet instance and modify both jicofo (so that it invites more than one Jibri) and Jibri itself.

  3. We used to have a project that recorded this way. While it has its advantages, the solution is not as easy as one might think. The server has to know how to reassemble RTP packets into video frames, handle packet loss, and handle audio/video synchronization as well as synchronization between participants. A separate post-processing step is required to merge the streams (which means the recording is not immediately available for streaming). Because of this, we moved away from that solution.

You may be able to get it to work using jitsi-videobridge 1.0 [0], but I expect there would be hurdles along the way (e.g. the signaling needed to enable the feature is no longer part of jitsi-meet/jicofo), and we won't be able to provide any help. I would advise you to go with your dual-Jibri idea instead.
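For anyone weighing the record-the-RTP-streams route anyway, here is a rough Python sketch of two of the chores mentioned above: spotting lost packets from 16-bit RTP sequence numbers, and placing RTP timestamps on the sender's wall clock using the NTP/RTP timestamp pair from an RTCP Sender Report, so that audio and video (and, assuming the senders' clocks agree, different participants) can be lined up. It follows the generic rules in RFC 3550 rather than anything in jitsi-videobridge, assumes the RTP headers and Sender Reports have already been parsed, and every class and function name in it is hypothetical.

```python
"""Hypothetical sketch of two RTP chores (RFC 3550 rules), not jitsi-videobridge code."""
from dataclasses import dataclass

SEQ_MOD = 1 << 16               # RTP sequence numbers are 16-bit
TS_MOD = 1 << 32                # RTP timestamps are 32-bit
NTP_UNIX_OFFSET = 2208988800    # seconds from 1900-01-01 (NTP) to 1970-01-01 (Unix)


class SequenceTracker:
    """Hypothetical helper: extends 16-bit sequence numbers and counts lost packets."""

    def __init__(self) -> None:
        self.started = False
        self.base = 0        # lowest extended sequence number seen
        self.highest = 0     # highest extended sequence number seen
        self.received = 0

    def push(self, seq: int) -> int:
        """Record one packet and return its extended (wrap-free) sequence number."""
        self.received += 1
        if not self.started:
            self.started = True
            self.base = self.highest = seq
            return seq
        # Same 16-bit cycle as the highest packet so far, then pick the
        # neighbouring cycle if that is closer (handles wrap and reordering).
        ext = self.highest - (self.highest % SEQ_MOD) + seq
        if ext - self.highest > SEQ_MOD // 2:
            ext -= SEQ_MOD
        elif self.highest - ext > SEQ_MOD // 2:
            ext += SEQ_MOD
        self.highest = max(self.highest, ext)
        self.base = min(self.base, ext)
        return ext

    @property
    def lost(self) -> int:
        """Cumulative loss as in RFC 3550: expected minus received (clamped at 0)."""
        if not self.started:
            return 0
        expected = self.highest - self.base + 1
        return max(0, expected - self.received)


def ntp_to_seconds(msw: int, lsw: int) -> float:
    """64-bit NTP timestamp (integer seconds, 32-bit fraction) to float seconds."""
    return msw + lsw / 2**32


@dataclass
class SenderReport:
    ntp_seconds: float  # SR NTP timestamp, seconds since 1900
    rtp_ts: int         # RTP timestamp sampled at the same instant
    clock_rate: int     # e.g. 90000 for video, 48000 for Opus audio


def rtp_to_wallclock(rtp_ts: int, sr: SenderReport) -> float:
    """Map an RTP timestamp to the sender's wall clock, in Unix seconds."""
    # Signed 32-bit difference, so timestamps just before the SR or across a wrap work.
    diff = (rtp_ts - sr.rtp_ts) % TS_MOD
    if diff >= TS_MOD // 2:
        diff -= TS_MOD
    return sr.ntp_seconds - NTP_UNIX_OFFSET + diff / sr.clock_rate


if __name__ == "__main__":
    tracker = SequenceTracker()
    for seq in [65534, 65535, 1, 0, 2, 5]:   # wraps, one reordered, 3 and 4 missing
        tracker.push(seq)
    print("lost packets:", tracker.lost)

    # One participant's audio and video, each with its own (made-up) Sender Report.
    audio_sr = SenderReport(ntp_to_seconds(3_900_000_000, 0), rtp_ts=1_000, clock_rate=48_000)
    video_sr = SenderReport(ntp_to_seconds(3_900_000_000, 1 << 31), rtp_ts=5_000, clock_rate=90_000)
    a = rtp_to_wallclock(1_000 + 48_000, audio_sr)
    v = rtp_to_wallclock(5_000 + 45_000, video_sr)
    print(f"audio/video offset: {a - v:+.3f} s")
```

Frame reassembly, jitter buffering, and actually concealing the lost packets are all left out here, which is part of why this route ends up being more work than it first appears.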

Regards,
Boris

[0] jitsi/jitsi-videobridge on GitHub, branch jvb_1.0

Hi @Boris_Grozev
I have a similar project, and I would like to capture the webcam stream from each participant on the client side (so that I can add my own JS library and rebuild). Do you think this is possible?