I am looking to capture the sound of a VoIP call made with Jitsi on my
desktop and stream it in real time to a Cloud Speech-to-text WebSocket
API. Any idea if this is possible?
We don't have any plans to support anything like this in the Jitsi desktop client. The project you refer to is based on our web client -- Jitsi Meet. See below for more details.
I found this enticing sentence: "*Speech-to-text in Jitsi Meet* -
Integrate one of the available speech-to-text APIs with Jitsi Meet to
create a transcript of a conference"
For a proof-of-concept project, we need real-time capture of a VoIP
phone conversation made with Jitsi. The necessary steps are:
- Detect call events (outgoing, incoming or, ideally, both)
- Start capturing the call audio, both sides of the conversation (without
interrupting or disrupting the call)
- Direct the audio stream to the WebSocket API
- Collect the text result returned by the API
Alternatively, Java, C++ or ActionScript (in Adobe AIR)
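The capture-and-stream steps listed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not something taken from Jitsi or from any particular speech API: it assumes both sides of the call are already available as 16 kHz mono 16-bit PCM buffers, and the names mix_pcm16, frames, stream_call, send and receive are hypothetical. Python is used purely for brevity; the send and receive callables stand in for whatever WebSocket client would actually be used.

```python
import array
from typing import Callable, Iterator, List, Optional

SAMPLE_RATE = 16000  # assumption: 16 kHz mono, 16-bit signed PCM
FRAME_MS = 20        # assumption: one 20 ms frame per WebSocket message


def mix_pcm16(local: bytes, remote: bytes) -> bytes:
    """Mix both sides of the call sample by sample, clipping to int16."""
    a = array.array("h")
    b = array.array("h")
    a.frombytes(local)
    b.frombytes(remote)
    n = max(len(a), len(b))
    # Pad the shorter side with silence so neither speaker is cut off.
    a.extend([0] * (n - len(a)))
    b.extend([0] * (n - len(b)))
    mixed = array.array(
        "h", (max(-32768, min(32767, x + y)) for x, y in zip(a, b))
    )
    return mixed.tobytes()


def frames(pcm: bytes, frame_ms: int = FRAME_MS,
           rate: int = SAMPLE_RATE) -> Iterator[bytes]:
    """Split a PCM buffer into fixed-duration frames (the last may be short)."""
    frame_bytes = rate * frame_ms // 1000 * 2  # 2 bytes per 16-bit sample
    for i in range(0, len(pcm), frame_bytes):
        yield pcm[i:i + frame_bytes]


def stream_call(local: bytes, remote: bytes,
                send: Callable[[bytes], None],
                receive: Callable[[], Optional[str]]) -> List[str]:
    """Send mixed audio frame by frame and collect any text that comes back.

    send/receive are hypothetical hooks wrapping a WebSocket connection.
    """
    transcript: List[str] = []
    for frame in frames(mix_pcm16(local, remote)):
        send(frame)
        text = receive()  # may return None when no result is ready yet
        if text:
            transcript.append(text)
    return transcript
```

In a real client the two buffers would be filled continuously while the call is active, and results would normally be read asynchronously rather than once per frame; the sketch only shows the data flow.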
Last year Nik Vaessen worked on speech-to-text in Jitsi Meet, and while a complete solution isn't available yet, you might be able to use parts of his work. This module in lib-jitsi-meet records audio for the conference and, after the conference ends, calls into a custom transcription server that does the speech-to-text conversion. Real-time transcription is therefore not supported.
This summer Nik will continue his work, but our current plan is to move in a different direction and implement it as part of Jigasi (so in Java). We aim to support real-time transcription.
Depending on your needs, you can either work with what we have right now (which we don't plan to support in the future) or wait a few months for the new version to come out. We would like to hear more about your use case, since we are still in the design stage of the new system and would ideally like to support different use cases.
On 20/05/2017 19:15, Franck MIKULECZ wrote: