Audio and video analysis for individual meeting participant streams

I am trying to isolate the audio and video tracks of each individual participant in a meeting so that each stream can be sent to external audio and video analytics in real time. Note: these analytics already exist externally (not at Google), but we need access to each stream. Transcription appears to have been covered in a previous question (Jitsi API for accessing streaming data), but that answer seems to assume the Google API and does not cover all the types of analytics we have in mind.

Example usage: sentiment analysis and visual recognition for each user individually.

It looks like there are a few places where this could be done (videobridge, SIP, Jigasi, or even an API client), but I am seeking a recommendation on the best/easiest approach.

Eventually we will assume participants are on Android clients, but browser or desktop clients can be included as well.

What you are looking for is Jigasi. You can create a custom transcription implementation and side-load it into your installed Jigasi.
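
To make the per-participant idea concrete, here is a rough sketch of the routing such a custom implementation might do internally. It is not Jigasi code and uses no Jigasi classes: ParticipantAudioRouter, AnalyticsSink, onAudio and onParticipantLeft are all hypothetical names, and it assumes your implementation receives audio chunks already tagged with a participant identifier. It covers audio only.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical sketch (not part of Jigasi): keeps one analytics sink per
 * participant and forwards each audio chunk to the sink of its owner.
 */
final class ParticipantAudioRouter {

    /** Hypothetical stand-in for a client of your external analytics service. */
    interface AnalyticsSink {
        void accept(String participantId, byte[] pcmChunk);
    }

    // One sink per participant, created lazily on the first chunk.
    private final Map<String, AnalyticsSink> sinks = new ConcurrentHashMap<>();
    private final Function<String, AnalyticsSink> sinkFactory;

    ParticipantAudioRouter(Function<String, AnalyticsSink> sinkFactory) {
        this.sinkFactory = sinkFactory;
    }

    /** Assumed entry point: called for every audio chunk of a participant. */
    void onAudio(String participantId, byte[] pcmChunk) {
        // Copy the buffer so the sink is unaffected if the caller reuses it.
        sinks.computeIfAbsent(participantId, sinkFactory)
             .accept(participantId, pcmChunk.clone());
    }

    /** Drops the sink when a participant leaves the meeting. */
    void onParticipantLeft(String participantId) {
        sinks.remove(participantId);
    }

    /** Number of participants that currently have a sink. */
    int activeStreams() {
        return sinks.size();
    }
}
```

The factory lets each participant get its own connection to the analytics backend (for example, one sentiment-analysis session per user), so streams are never mixed.
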
Here is how to load that custom implementation: