We are running the latest jitsi in kubernetes using docker-jitsi-meet, with several jibri instances joined to it.
When prosody is restarted, the jibris try to reconnect to it, but some of them receive a “not-authorized” response from prosody during the reconnect. Those instances stop retrying and stay in that state indefinitely, while still reporting their health status as HEALTHY and IDLE, so querying the jibri health status doesn’t reveal the problem. All we can do is go through every jibri log and check whether there is a “Jibri connected” message. Even then, jicofo sometimes doesn’t see the jibri in the brewery, but that might be a separate issue.
Out of 8 jibris on one shard, about 5 are OK and 3 have not joined; on another shard, out of 4 jibris, 3 are OK and 1 is not. This happens almost every time. Restarting an affected jibri makes it join without any problem.
We now have an alert that counts the HEALTHY jibri instances and compares that number with jicofo’s jibri_detector_count metric. If the numbers differ, it alerts someone to manually find the jibris that are not connected and restart them, which is painful.
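For reference, the comparison behind that alert boils down to roughly the following. This is a simplified sketch, not our actual alert rule: it assumes each jibri's health response is JSON with the status nested under `status.health.healthStatus`, and the function names are made up for illustration.

```python
import json


def count_healthy(health_payloads):
    """Count health responses that report HEALTHY.

    Assumption: each payload is a JSON string with the status nested
    under status -> health -> healthStatus.
    """
    healthy = 0
    for raw in health_payloads:
        doc = json.loads(raw)
        status = doc.get("status", {}).get("health", {}).get("healthStatus")
        if status == "HEALTHY":
            healthy += 1
    return healthy


def missing_jibris(health_payloads, jibri_detector_count):
    """Return how many HEALTHY jibris jicofo does not see.

    jibri_detector_count is the value of jicofo's metric of the same name.
    A result above zero means at least one jibri claims to be healthy
    but has not joined the brewery.
    """
    return max(0, count_healthy(health_payloads) - jibri_detector_count)
```

The weakness is visible here: the result only says how many jibris are missing, not which ones, so somebody still has to go through the logs.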
I think the problem is that somewhere during prosody startup there is a moment when prosody is already accepting requests but is probably not fully started yet (maybe the user DB is not loaded, or something like that), so it sends “not-authorized” responses. Again, we are using docker-jitsi-meet, where the init scripts set up all accounts and then start prosody, so everything should be prepared.
My question: is there a way to verify on the jibri side whether it is connected to prosody? We would then run a regular check and restart jibri if it is not connected. Of course, the best solution would be to fix the “not-authorized” responses and remove the problem altogether.
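Until there is a proper way to do this, the only stopgap I can think of is automating the log check, roughly like the sketch below. It assumes that the “Jibri connected” and “not-authorized” strings appear literally in the jibri log and that the most recent of the two reflects the current state; the function name and the idea of wiring it into a periodic check are hypothetical.

```python
def is_connected(log_lines,
                 connected_marker="Jibri connected",
                 failure_marker="not-authorized"):
    """Guess from the log whether jibri is currently connected to prosody.

    Walks the log in order; the last marker seen wins. Both marker
    strings are assumptions based on what we see in our own logs.
    """
    connected = False
    for line in log_lines:
        if connected_marker in line:
            connected = True
        elif failure_marker in line:
            connected = False
    return connected
```

A periodic check could feed the jibri log file into `is_connected` and restart the container when it returns `False`. This is fragile (log rotation or a change in the message text would break it), which is why a real connection status on the jibri side would be much better.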