I intend to run a machine learning model, every few minutes, on frames captured from the participants who have their camera on. Can anyone point me in a direction that would make this possible? It does not have to be a readily available feature. Is there a way I can extract the frames while the meeting is going on?
Do you want to get the frames of each participant separately, or only the frames of the dominant speaker?
I need frames from every participant separately. As soon as a participant starts their video, I need to send the frames to the ML model. If it helps, I am making two separate apps: one requires video and the other allows no video at all. So I already know which users will start their camera, and there is no need to keep waiting and listening for someone to turn it on.
gst-meet may help
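Whichever tool ends up handing you the decoded frames, the "one frame per participant every few minutes" part could look roughly like the sketch below. This is only a minimal Python illustration: the class `PerParticipantSampler`, its `offer` and `forget` methods, and the participant IDs are hypothetical names made up here and are not part of gst-meet. It assumes you already receive frames tagged with a participant ID and a timestamp.

```python
from typing import Any, Callable, Dict


class PerParticipantSampler:
    """Forward at most one frame per participant per interval (hypothetical helper)."""

    def __init__(self, interval_s: float, on_sample: Callable[[str, Any], None]) -> None:
        self.interval_s = interval_s
        self.on_sample = on_sample  # e.g. a function that calls your ML model
        self._last_sent: Dict[str, float] = {}  # participant ID -> time of last forwarded frame

    def offer(self, participant_id: str, frame: Any, now: float) -> bool:
        """Offer a frame; return True if it was forwarded to on_sample."""
        last = self._last_sent.get(participant_id)
        if last is not None and now - last < self.interval_s:
            return False  # too soon since this participant's last sample
        self._last_sent[participant_id] = now
        self.on_sample(participant_id, frame)
        return True

    def forget(self, participant_id: str) -> None:
        """Drop state for a participant who left or stopped their camera."""
        self._last_sent.pop(participant_id, None)


if __name__ == "__main__":
    # Stand-in for the model call: just print which frame would be sent.
    sampler = PerParticipantSampler(120.0, lambda pid, f: print("send", pid, f))
    sampler.offer("alice", "frame-0", now=0.0)    # forwarded (first frame)
    sampler.offer("alice", "frame-1", now=60.0)   # skipped (within 120 s)
    sampler.offer("bob", "frame-0", now=60.0)     # forwarded (tracked separately)
```

The first frame from each participant is forwarded immediately, which matches the "as soon as a participant starts their video" requirement. After that, frames are dropped until the interval has passed for that participant.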
Thank you! I will look into this!