[jitsi-dev] How to "Capture" JITSI call sound from an external App (and stream it)


#1

Hi Guys,

I am looking to capture the sound of a VoIP call made with Jitsi on my desktop and stream it in real time to a cloud speech-to-text WebSocket API. Any idea if this is possible?

I found this enticing sentence: "*Speech-to-text in Jitsi Meet* - Integrate one of the available speech-to-text APIs with Jitsi Meet to create a transcript of a conference"
on https://jitsi.org/GSOC16/Web

More details:
For a proof-of-concept project, we need real-time capture of a VoIP phone conversation made with Jitsi. The necessary steps are:
- Call event recognition (made, received or - best - both)
- Start capturing the call sound, both sides of the conversation (not interrupting or disrupting the call)
- Direct the sound stream to the Websocket API
- Collect the API text result back

Ideally, we would rather use JavaScript code in an open web UI. Alternatively, Java, C++ or ActionScript (in Adobe AIR).

Thanks for any info...


#2

Hi Franck,

> Hi Guys,
>
> I am looking to capture the sound of a VoIP call made with JITSI on my
> desktop, to stream it (real time) to a Cloud Speech-to-text Websocket
> API. Any idea if this is possible ?

We don't have any plans to support anything like this in the Jitsi desktop client. The project you refer to is based on our web client -- Jitsi Meet. See below for more details.

> I found this enticing sentence: "*Speech-to-text in Jitsi Meet* -
> Integrate one of the available speech-to-text APIs with Jitsi Meet to
> create a transcript of a conference"
> on https://jitsi.org/GSOC16/Web
>
> More details:
> For a Proof-of-Concept project, we need real time capture of a VoIP
> phone conversation made with JITSI. The necessary steps are:
> - Call event recognition (made, received or - best - both)
> - Start capturing the call sound, both sides of the conversation (not
> interrupting or disrupting the call)
> - Direct the sound stream to the Websocket API
> - Collect the API text result back
>
> Ideally, we would rather use Javascript code in an opened Web UI.
> Alternatively, Java, C++ or ActionScript (in Adobe AIR)

Last year Nik Vaessen worked on speech-to-text in Jitsi Meet, and while a complete solution isn't available yet, you might be able to use parts of his work. The transcription module in lib-jitsi-meet[0] records audio for the conference and, after the conference ends, calls into a custom transcription server[1] that does the speech-to-text conversion. So real-time transcription is not supported.

This summer Nik will continue his work, but so far our plan is to move in a different direction and implement it as part of jigasi[2] (so in Java). We aim to support real-time transcription.

Depending on your needs, you can either work with what we have right now (which we don't plan to support in the future) or wait a few months for the new version to be out. We would like to hear more about your use case, since we are still in the design stage of the new system and would ideally like to support different use cases.
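
For the real-time case described in the original question, the audio-preparation step that sits between the capture and the WebSocket could look roughly like the sketch below. It is only an illustration under stated assumptions: both sides of the call are assumed to be available as mono float samples in the range [-1, 1], the speech-to-text service is assumed to accept 16-bit PCM frames, and the helper names (mixMono, floatTo16BitPcm, splitIntoFrames) are hypothetical and not part of Jitsi, lib-jitsi-meet or jigasi.

```typescript
// Hypothetical helpers for illustration; none of these names come from Jitsi.

/** Mix both sides of the call (mono floats in [-1, 1]) by averaging them. */
function mixMono(local: Float32Array, remote: Float32Array): Float32Array {
  const length = Math.max(local.length, remote.length);
  const mixed = new Float32Array(length);
  for (let i = 0; i < length; i++) {
    // Out-of-range reads are treated as silence for the shorter input.
    const a = local[i] ?? 0;
    const b = remote[i] ?? 0;
    mixed[i] = (a + b) / 2;
  }
  return mixed;
}

/** Convert float samples to signed 16-bit PCM, clamping to [-1, 1] first. */
function floatTo16BitPcm(samples: Float32Array): Int16Array {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i] ?? 0));
    // Negative values scale to -32768, positive values to 32767.
    pcm[i] = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
  }
  return pcm;
}

/** Split PCM data into fixed-size frames; the last frame may be shorter. */
function splitIntoFrames(pcm: Int16Array, frameSize: number): Int16Array[] {
  if (!Number.isInteger(frameSize) || frameSize <= 0) {
    throw new RangeError("frameSize must be a positive integer");
  }
  const frames: Int16Array[] = [];
  for (let start = 0; start < pcm.length; start += frameSize) {
    // slice() copies, so every frame owns its own buffer.
    frames.push(pcm.slice(start, start + frameSize));
  }
  return frames;
}
```

In a browser, each frame returned by splitIntoFrames could then be passed to the standard WebSocket send() method, which accepts typed arrays. The message format the transcription service expects (raw PCM, sample rate, any handshake) is service-specific and is not covered here.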

Regards,
Boris

[0] https://github.com/jitsi/lib-jitsi-meet/tree/master/modules/transcription
[1] https://github.com/jitsi/Sphinx4-HTTP-server
[2] https://github.com/jitsi/jigasi


On 20/05/2017 19:15, Franck MIKULECZ wrote: