[android] Headless voice call with Jitsi Meet SDK


#1

So… I asked about this in the Jitsi Community Call a few weeks ago. We have a use case like this (it’s a hypothetical use case, but bear with me):

  • It is an app targeted at elderly people, who (according to the defined Persona) are not tech savvy.
  • There is a feature in the app that lets the user press a button and get a voice call established with someone who can assist them.
  • The user should be able to continue with what they are doing (without having to switch to a dedicated call UI).
  • The call needs to continue if the app goes into the background (e.g. when the user presses the home button).

After a bit of experimentation, I got something to work, although it is a bit hacky.

First, I noted that:

  • The Jitsi Meet View (or React Native View) does not need to be attached to an activity in order to function (i.e. setContentView is not necessary). In fact, as soon as the JitsiMeetView instance is created, the JS code starts executing.
  • However, React Native does need an Activity context to initialize (which is not a problem, since when we initiate the call, the user will always be on some activity in the app). The lifecycle methods (onHostResume, onHostPause etc.) also need an active Activity context, and will crash if they are called without a valid one, which is a bit trickier. In other words, both the creation and destruction of JitsiMeetView need to happen when there is an active Activity.
  • The only reason why Jitsi Meet leaves the conference when the app goes into the background is that onHostPause is called. If it doesn’t get called, React Native will continue to run, as long as the Android application process doesn’t get killed. The official Jitsi Meet Android app solves this by using Picture-in-Picture mode to keep the app in the foreground, but this is not the only way.
  • Additionally, simply creating the JitsiMeetView instance (with URL set to "") will occupy the camera and microphone resources. Because our app has other features that need access to the camera and microphone, a JitsiMeetView instance should only be created when there is a need for a call, and should be destroyed after the call.

Then:

  • I created a class, JitsiMeetHolder, a singleton initialized on Application start, that creates and holds the only Jitsi Meet View instance, and serves as an interceptor of lifecycle events.
  • The Activities pass their onStart/onStop lifecycle events to JitsiMeetHolder. JitsiMeetHolder will suppress the onPause event if there is an ongoing call. It will also note that the activity is “dead”, so any further requests (from the business logic of the container app) to start/stop a Jitsi Meet call will be “queued” until a new Activity becomes active.
  • When any Activity of the app comes into the foreground, JitsiMeetHolder checks for any pending requests, and then creates/destroys the JitsiMeetView instance using the new Activity context.
  • Whenever there is an ongoing Jitsi Meet call, I launch a Service (Application.startForegroundService()) that does not perform any real work; it is only there to tell the Android system not to kill our app.
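The queueing and pause-suppression logic in the list above can be modelled in plain Java, without any Android or Jitsi dependencies. This is only a rough sketch and not the code from the gist: CallHolderSketch, FakeCallView and all method names are hypothetical, and a plain string stands in for the Activity context.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical stand-in for JitsiMeetView; it only records how it was used. */
final class FakeCallView {
    final String createdWith; // name of the activity whose context was used
    final String url;
    boolean disposed;

    FakeCallView(String createdWith, String url) {
        this.createdWith = createdWith;
        this.url = url;
    }
}

/** Hypothetical model of the holder: queues requests while no activity is active. */
final class CallHolderSketch {
    private final Queue<Runnable> pending = new ArrayDeque<>();
    private String activeActivity; // null while no activity is in the foreground
    private FakeCallView view;     // non-null only while a call is ongoing
    private int suppressedPauses;

    void requestStartCall(String url) {
        runWhenActive(() -> {
            if (view == null) view = new FakeCallView(activeActivity, url);
        });
    }

    void requestEndCall() {
        runWhenActive(() -> {
            if (view != null) { view.disposed = true; view = null; }
        });
    }

    /** Called by every activity when it comes to the foreground. */
    void onActivityResumed(String activity) {
        activeActivity = activity;
        // Replay queued create/destroy requests with the new activity as context.
        while (!pending.isEmpty()) pending.poll().run();
    }

    /** Called by every activity when it leaves the foreground. */
    void onActivityPaused() {
        activeActivity = null;
        // During a call the pause is swallowed instead of being forwarded to the
        // view, so the conference keeps running. Without a call there is no view.
        if (view != null) suppressedPauses++;
    }

    private void runWhenActive(Runnable request) {
        if (activeActivity != null) request.run(); else pending.add(request);
    }

    FakeCallView currentView() { return view; }
    int suppressedPauses() { return suppressedPauses; }
}
```

In this model, a requestEndCall() issued while the app is in the background does nothing until the next onActivityResumed(), which matches the constraint that the view may only be created or destroyed while an Activity is active.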

The (partial) code is available at https://gist.github.com/ztl8702/7e1d2ec455727ef8d4127dc8ef53ccbf

I am sure that if I looked deeper into the React Native source code, I might be able to come up with a more elegant solution. But the approach I described above has worked pretty well for us.


Update: Here is a screen recording demonstrating this “headless Jitsi call” feature in action:

The initiation (accepting incoming calls) is handled by our application logic. Our application server generates a unique URL for each call.

The initiator of the call (emulator on the left) joins the Jitsi Meet conference while waiting for the other party to accept the call; the other client receives the Jitsi conference URL from the application server, and joins it after accepting the call.

The browser on the left is in the same Jitsi conference. You can see the two clients joining and leaving the conference.
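How the application server builds the per-call URL is not part of this post. One simple way to do it, assuming a random UUID as the room name, could look like the sketch below. CallUrlSketch, newCallUrl and the "call-" prefix are made up for illustration.

```java
import java.util.UUID;

/** Hypothetical helper: one fresh, hard-to-guess conference URL per call. */
final class CallUrlSketch {
    static String newCallUrl(String baseUrl) {
        // Make sure there is exactly one slash between server and room name.
        String base = baseUrl.endsWith("/") ? baseUrl : baseUrl + "/";
        return base + "call-" + UUID.randomUUID();
    }
}
```

The server would hand the same URL to both clients: the initiator joins immediately, and the other side joins once the call is accepted.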


#2

One glitch we have encountered so far is that the Jitsi Meet SDK always turns full-screen mode on when the conference is joined, even though I have added the #config.startAudioOnly=true override to the URL passed to JitsiMeetView::loadURLString. (According to the code here, it should not behave like this: https://github.com/jitsi/jitsi-meet/blob/bfdfb5321c5744b10f9b75411a5530f27e74282b/react/features/mobile/full-screen/middleware.js#L53)
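For reference, this is roughly how such an override ends up in the URL: config overrides go into the URL fragment as config.<name>=<value> pairs, joined with "&" when there is more than one. The helper below is only a sketch; ConferenceUrlSketch and withConfigOverrides are hypothetical names, not part of the SDK, and it assumes the base URL has no fragment yet.

```java
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper that appends config overrides to a conference URL. */
final class ConferenceUrlSketch {
    static String withConfigOverrides(String baseUrl, Map<String, String> overrides) {
        if (overrides.isEmpty()) return baseUrl;
        // Builds "#config.a=1&config.b=2" from the map entries, in map order.
        StringJoiner fragment = new StringJoiner("&", "#", "");
        for (Map.Entry<String, String> e : overrides.entrySet()) {
            fragment.add("config." + e.getKey() + "=" + e.getValue());
        }
        return baseUrl + fragment;
    }
}
```

The resulting string is what gets passed to loadURLString, e.g. withConfigOverrides(callUrl, Map.of("startAudioOnly", "true")).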

I will investigate if this is a bug in Jitsi Meet, and will lodge a bug report if it is.


Another minor issue is that the Jitsi Meet SDK sometimes doesn’t trigger the Android permission request dialogs (it just silently fails, and joins the conference muted).

We decided to handle microphone and camera permissions in the container app. However, it would be nice if there were a Java API for the container app to query the audio/camera status (to see which tracks are currently active).


#3

And I was just thinking, if I were to implement this headless voice chat in a production app, I should probably wrap lib-jitsi-meet in a React Native instance directly, to lower the overhead. Then it might be possible to run it in a background service.

But for now I am happy with what the Jitsi Meet Android SDK can achieve.