[jitsi-dev] Jicofo SPoF


#1

Hello All,

I am fairly new to jitsi and trying to understand various pros and cons of
the architecture. Specifically, I am trying to understand how to run jitsi
in a production environment. It's great that jicofo can load balance across
multiple videobridges, but jicofo itself seems to be a single point of
failure. Has there been any work to remedy this issue?

Thanks,
-Ghulam


#2

Hey Ghulam,

Indeed. We run a number of shards (shard = 1 Prosody + 1 Jicofo +
N bridges) in production and we load balance between them with HAProxy.
The HAProxy instances use Jicofo's REST API to run health checks and
take dead shards out of rotation when necessary.
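A minimal sketch of such a setup (all hostnames, addresses, and ports here are invented for illustration; the health check assumes Jicofo's REST API answers on `/about/health`, and the stick table is just one way to pin a conference to a shard):

```
backend jitsi_shards
    mode http
    option httpchk GET /about/health
    # keep all requests for the same room (URL path) on the same shard
    stick-table type string len 128 size 200k expire 6h
    stick on path
    # health checks go to Jicofo's REST port; failing shards leave rotation
    server shard1 10.0.1.10:80 check port 8888
    server shard2 10.0.2.10:80 check port 8888
```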

Emil


--
https://jitsi.org


#3

Running multiple shards is not a feasible solution for us. What I want is something like a “clustered” Jicofo, and I would like to run this idea by the community. Currently Jicofo is implemented as an XMPP component, so no more than one instance can run against a given XMPP server. We are thinking of using Hazelcast to form a cluster group. All Jicofo instances connected to one XMPP server would share data among themselves via Hazelcast, but only the “senior” member of the cluster group would have its component started. If the senior instance dies, Hazelcast promotes the next instance in the group to senior, and the new senior then starts its component. I understand that the code may not be structured in a way that lets us separate the component startup easily. But would this approach work? If the approach is reasonable, how should I proceed? Any comments are welcome.
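The senior-member promotion described above can be sketched abstractly like this (all class and method names are invented for this example; in a real deployment the membership list, join order, and promotion would come from Hazelcast, not from local objects):

```python
# Illustrative sketch only: models the "senior member starts its component"
# idea, not Hazelcast itself.

class FocusMember:
    """One Jicofo instance: its XMPP component runs only while senior."""

    def __init__(self, member_id):
        self.member_id = member_id
        self.component_started = False

    def on_promoted(self):
        self.component_started = True      # start FocusComponent

    def on_demoted_or_stopped(self):
        self.component_started = False     # stop FocusComponent


class ClusterGroup:
    """Keeps members in join order; index 0 is the senior member."""

    def __init__(self):
        self.members = []

    def join(self, member):
        self.members.append(member)
        if self.members[0] is member:      # first joiner becomes senior
            member.on_promoted()

    def leave(self, member):
        was_senior = bool(self.members) and self.members[0] is member
        self.members.remove(member)
        member.on_demoted_or_stopped()
        if was_senior and self.members:    # promote the next member in line
            self.members[0].on_promoted()


cluster = ClusterGroup()
a, b, c = FocusMember("a"), FocusMember("b"), FocusMember("c")
for m in (a, b, c):
    cluster.join(m)

assert a.component_started and not b.component_started

cluster.leave(a)                           # senior dies...
assert b.component_started                 # ...next member is promoted
```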

Another idea is not to use a component at all, but to have the “focus” user log in from each Jicofo instance with a different resource part in the JID (i.e. focus@mydomain.com/some-uuid). All IQs to Jicofo would be addressed to the bare JID of the “focus” user (to="focus@mydomain.com"). This way all “focus” sessions get the request, but only the senior member of the group (determined, perhaps, via a MUC) would handle the IQ. All other “focus” sessions stay dormant until one of them becomes the senior member. This approach relies on XMPP routing in cluster mode; however, it requires changes to the current architecture, so if it is viable it would probably be a longer-term goal.
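The bare-JID idea above, as described, could be modeled like this (a sketch of the proposed behavior only; the names are invented, and the fan-out and senior selection stand in for what the XMPP server and a MUC occupant list would actually provide):

```python
# Hypothetical sketch: every "focus" session sees each IQ, but only the
# senior session actually handles it; the rest drop the stanza.

class FocusSession:
    def __init__(self, resource):
        self.resource = resource           # e.g. the UUID in focus@domain/<uuid>
        self.handled = []

    def on_iq(self, iq, is_senior):
        if is_senior:
            self.handled.append(iq)        # only the senior replies
        # dormant sessions ignore the stanza


def route_to_bare_jid(sessions, iq):
    """Stand-in for server fan-out to all resources of focus@mydomain.com."""
    senior = sessions[0]                   # e.g. the first occupant of a MUC
    for s in sessions:
        s.on_iq(iq, s is senior)


sessions = [FocusSession("uuid-1"), FocusSession("uuid-2")]
route_to_bare_jid(sessions, {"type": "conference", "room": "demo"})

assert len(sessions[0].handled) == 1       # senior handled the request
assert len(sessions[1].handled) == 0       # dormant session stayed idle
```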

Please bear with my naive ideas.


#4

I know clustering is something we were discussing at some point, but those were just discussions and there is no such functionality on the roadmap.
Why do you see running multiple shards as a problem? This is how meet.jit.si currently runs in multiple regions, and it runs just fine. Failover works well, as does auto-scaling.


#5

We are aware that shards can handle multiple regions, but we try to keep conference sessions within a single shard (XMPP, JVB, Jibri, Jicofo). We will use clustered XMPP servers for high availability, multiple JVB instances for failover and load balancing, and multiple Jibri instances (we still need a scalable solution there) to support hundreds of concurrent recordings. With this design, Jicofo becomes a single point of failure within a shard, which we would like to avoid. Does that make sense?


#6

Yep. When Jicofo fails, the shard is marked as unhealthy, everything is moved to another shard, and no new conferences go to the unhealthy shard until it recovers one way or another. That point of failure is taken care of.
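The routing behavior described above could be sketched as follows (an illustrative model only, with invented names; in practice HAProxy's health checks and stickiness do this, as noted later in the thread):

```python
# Minimal sketch: new conferences go only to healthy shards, a room sticks
# to its shard while that shard is healthy, and a room whose shard turns
# unhealthy is reassigned to a healthy one.

def pick_shard(room, shards, health, assignments):
    """Return the shard for `room`, preferring its existing assignment."""
    current = assignments.get(room)
    if current is not None and health[current]:
        return current                      # sticky while the shard is healthy
    for shard in shards:                    # otherwise: first healthy shard
        if health[shard]:
            assignments[room] = shard
            return shard
    raise RuntimeError("no healthy shard available")


shards = ["shard1", "shard2"]
health = {"shard1": True, "shard2": True}
assignments = {}

assert pick_shard("demo", shards, health, assignments) == "shard1"
health["shard1"] = False                    # Jicofo on shard1 fails its check
assert pick_shard("demo", shards, health, assignments) == "shard2"
```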


#7

I was not aware that a shard would be marked unhealthy and everything moved automatically. Is there another module we need to install for this shard migration? With it, can a clustered XMPP server still be used? Where can I read more about this shard capability? I really appreciate the information.


#8

There is no documentation about that, sorry; I am just explaining how meet.jit.si currently works. The shard healthy/unhealthy handling is done with HAProxy and its features.
I have no idea how clustering would work: how you would implement clustering with Prosody, or what changes would be needed in jitsi-meet, Jicofo, and the JVB. With all of that you risk diverging a lot from upstream, making your deployment unmaintainable and unable to take upstream upgrades.


#9

That is good information, thank you.

We plan to use a clustered Openfire and nginx as the web server hosting our “meet” UI code. We don't plan to make any changes to the JVB. Before learning about HAProxy's capabilities, we had originally planned to add a “cluster manager” to Jicofo, such that only the senior member of the cluster has its FocusComponent started and handles IQs. The rest of the members stay dormant until one of them becomes the senior member. The senior member uses the cluster manager to broadcast its internal state to all dormant members, and if the senior member exits, another member becomes senior. The cluster manager could possibly be implemented with a MUC room. We are trying to minimize changes to the existing Jitsi code and architecture, but where changes are necessary we will isolate them as best we can.
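The state-broadcast part of this cluster-manager idea can be sketched like so (illustration only: the transport, which would be a MUC room in the proposal, is abstracted away, and every name here is invented rather than taken from Jicofo's code):

```python
# Sketch: the senior member replicates its internal state to dormant
# members, so a newly promoted member can resume with that state.

class ManagedFocus:
    def __init__(self, name):
        self.name = name
        self.senior = False
        self.state = {}                     # replicated conference state

    def broadcast(self, members):
        for m in members:
            if m is not self:
                m.state = dict(self.state)  # dormant members mirror the state


members = [ManagedFocus("f1"), ManagedFocus("f2")]
senior, standby = members
senior.senior = True

senior.state["conferences"] = ["room-a", "room-b"]
senior.broadcast(members)                   # e.g. via a MUC groupchat message

# senior exits; the standby is promoted and already holds the state
standby.senior = True
assert standby.state["conferences"] == ["room-a", "room-b"]
```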