JVB fails on load test

Yeah, already on the latest version, no changes. I even moved the JVB to the same instance where meet is running - same error.

Then maybe the JVB is timing out when replying…

Yes, I noticed that we get timeout errors even with 30 users in 10 rooms - in that case the users spawn and start producing video before the first error appears in the log, so conferences seem to keep working after that error. But with more users this eventually leads to “unable to place” user errors and so on.

Well, I switched to a Hetzner bare-metal instance and Jitsi instantly started working as expected. It was able to handle 500 participants with audio & video, running meet & JVB on a single bare-metal server (4 cores / 8 threads, 64 GB RAM). Now I'm wondering what is wrong with my AWS setup, where an instance with the same specs, even with dedicated tenancy, couldn't handle even 50 users in 10 rooms…

Well, after spinning up the Docker setup on AWS, it worked like a charm and handled a stable 100 users with video without any tweaking. So something is still wrong with my configs - they work on bare metal with a public IP, but not on AWS with a private one… I tried switching kernels and adding org.ice4j.ice.harvest.NAT_HARVESTER_LOCAL_ADDRESS/PUBLIC_ADDRESS, but no luck.
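The mapping I tried looked roughly like this in /etc/jitsi/videobridge/sip-communicator.properties (both addresses below are placeholders for the instance's private IP and its public/Elastic IP):

# static NAT mapping for a bridge behind AWS NAT (placeholder addresses)
org.ice4j.ice.harvest.NAT_HARVESTER_LOCAL_ADDRESS=172.31.7.156
org.ice4j.ice.harvest.NAT_HARVESTER_PUBLIC_ADDRESS=203.0.113.10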
I also noticed that both on bare metal and with Docker on AWS the clients' video starts instantly, but in the regular setup it doesn't load for some clients and we get this error in the log: no jingle session

Jicofo 2021-10-25 17:44:08.774 WARNING: [16] [room=loadtestroom8@conference.example.com] JitsiMeetConferenceImpl.lambda$propagateNewSources$7#1444: No jingle session yet for a89215e5
Jicofo 2021-10-25 17:44:08.774 WARNING: [16] [room=loadtestroom8@conference.example.com] JitsiMeetConferenceImpl.lambda$propagateNewSources$7#1444: No jingle session yet for 3046fe41
Jicofo 2021-10-25 17:44:10.806 WARNING: [16] [room=loadtestroom2@conference.example.com] JitsiMeetConferenceImpl.lambda$propagateNewSources$7#1444: No jingle session yet for 682b2794
Jicofo 2021-10-25 17:44:10.806 WARNING: [16] [room=loadtestroom2@conference.example.com] JitsiMeetConferenceImpl.lambda$propagateNewSources$7#1444: No jingle session yet for fc343629
Jicofo 2021-10-25 17:44:10.806 WARNING: [16] [room=loadtestroom2@conference.example.com] JitsiMeetConferenceImpl.lambda$propagateNewSources$7#1444: No jingle session yet for e2b88834
Jicofo 2021-10-25 17:44:11.815 WARNING: [16] [room=loadtestroom2@conference.example.com] JitsiMeetConferenceImpl.lambda$propagateNewSources$7#1444: No jingle session yet for fc343629
Jicofo 2021-10-25 17:44:11.816 WARNING: [16] [room=loadtestroom2@conference.example.com] JitsiMeetConferenceImpl.lambda$propagateNewSources$7#1444: No jingle session yet for e2b88834
Jicofo 2021-10-25 17:44:11.825 WARNING: [16] [room=loadtestroom7@conference.example.com] JitsiMeetConferenceImpl.lambda$propagateNewSources$7#1444: No jingle session yet for 65f627c9
Jicofo 2021-10-25 17:44:11.825 WARNING: [16] [room=loadtestroom7@conference.example.com] JitsiMeetConferenceImpl.lambda$propagateNewSources$7#1444: No jingle session yet for f051ab76
Jicofo 2021-10-25 17:44:11.825 WARNING: [16] [room=loadtestroom7@conference.example.com] JitsiMeetConferenceImpl.lambda$propagateNewSources$7#1444: No jingle session yet for daac8493
Jicofo 2021-10-25 17:44:12.600 WARNING: [58] FocusManager.conferenceRequest#244: Exception while trying to start the conference
Jicofo 2021-10-25 17:44:14.832 WARNING: [16] [room=loadtestroom10@conference.example.com] JitsiMeetConferenceImpl.lambda$propagateNewSources$7#1444: No jingle session yet for 8c552146
Jicofo 2021-10-25 17:44:14.832 WARNING: [16] [room=loadtestroom10@conference.example.com] JitsiMeetConferenceImpl.lambda$propagateNewSources$7#1444: No jingle session yet for 3262619c
Jicofo 2021-10-25 17:44:15.617 WARNING: [59] AbstractParticipant.setChannelAllocator#349: Canceling ParticipantChannelAllocator[BridgeSession[id=80902840_10a53b, bridge=Bridge[jid=jvbbrewery@internal.auth.example.com/d01d3526-b216-5958-be76-7dde6f2465ae, relayId=172.31.7.156:4096, region=eu-west-1, stress=0.26]]@793459891, Participant[loadtestroom3@conference.example.com/68b52329]@65310048]@596049510
Jicofo 2021-10-25 17:44:15.617 WARNING: [59] AbstractParticipant.setChannelAllocator#349: Canceling ParticipantChannelAllocator[BridgeSession[id=80902840_10a53b, bridge=Bridge[jid=jvbbrewery@internal.auth.example.com/d01d3526-b216-5958-be76-7dde6f2465ae, relayId=172.31.7.156:4096, region=eu-west-1, stress=0.26]]@793459891, Participant[loadtestroom3@conference.example.com/063b37ac]@1820071842]@317668379
Jicofo 2021-10-25 17:44:15.617 WARNING: [59] AbstractParticipant.setChannelAllocator#349: Canceling ParticipantChannelAllocator[BridgeSession[id=80902840_10a53b, bridge=Bridge[jid=jvbbrewery@internal.auth.example.com/d01d3526-b216-5958-be76-7dde6f2465ae, relayId=172.31.7.156:4096, region=eu-west-1, stress=0.26]]@793459891, Participant[loadtestroom3@conference.example.com/6b7e8bd9]@749084309]@1472766072
Jicofo 2021-10-25 17:44:15.618 WARNING: [59] AbstractParticipant.setChannelAllocator#349: Canceling ParticipantChannelAllocator[BridgeSession[id=80902840_10a53b, bridge=Bridge[jid=jvbbrewery@internal.auth.example.com/d01d3526-b216-5958-be76-7dde6f2465ae, relayId=172.31.7.156:4096, region=eu-west-1, stress=0.26]]@793459891, Participant[loadtestroom3@conference.example.com/003c6d03]@1187033117]@1135149353
Jicofo 2021-10-25 17:44:15.618 WARNING: [59] BridgeSelectionStrategy.select#111: Failed to select initial bridge for participantRegion=eu-west-1
Jicofo 2021-10-25 17:44:15.619 WARNING: [59] BridgeSelectionStrategy.select#111: Failed to select initial bridge for participantRegion=eu-west-1
Jicofo 2021-10-25 17:44:15.619 WARNING: [59] BridgeSelectionStrategy.select#111: Failed to select initial bridge for participantRegion=eu-west-1

Any idea what it could be?

So you tried Docker on AWS and bare metal and you see the difference, and the AWS and bare-metal machines have similar CPU and RAM specifications - what AWS instance type is that? Maybe it's something that needs to be tuned about Docker running on a VM? I have zero experience with Docker, but it seems like that might be where the problem is.
And is the OS version the same on both?

The OS was Ubuntu 18.04 in all cases. Yes, it seems the Docker setup has something enabled by default that is missing in my default non-Docker setup. I was using a c5a.xlarge for the Docker setup and it was able to handle 100 connections, even at 100% CPU load - not a single conference went down.
I also connected the JVB from Docker to the main jicofo/prosody on the instance, but got the same error as always. So it looks like some prosody misconfiguration…? I've checked my config multiple times and don't see anything special - maybe you will notice something.

root@jitsi-meet-dev:~# cat /etc/prosody/conf.avail/example.com.cfg.lua 
plugin_paths = { "/usr/share/jitsi-meet/prosody-plugins/" }

-- domain mapper options, must at least have domain base set to use the mapper
muc_mapper_domain_base = "example.com";

external_service_secret = "3...N";

external_services = {
  { type = "stun", host = "example.com", port = 3478 },
  { type = "turn", host = "example.com", port = 3478, transport = "udp", secret = true, ttl = 86400, algorithm = "turn" },
  { type = "turns", host = "example.com", port = 5349, transport = "tcp", secret = true, ttl = 86400, algorithm = "turn" }
};

asap_accepted_issuers = { "example_app_id" };
asap_accepted_audiences = { "example_app_id" };

cross_domain_bosh = false;
consider_bosh_secure = true;
component_interface = "0.0.0.0";
-- https_ports = { }; -- Remove this line to prevent listening on port 5284

-- https://ssl-config.mozilla.org/#server=haproxy&version=2.1&config=intermediate&openssl=1.1.0g&guideline=5.4
ssl = {
  protocol = "tlsv1_2+";
  ciphers = "ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384"
}

VirtualHost "example.com"
        -- enabled = false -- Remove this line to enable this host
        authentication = "token"        -- Properties below are modified by jitsi-meet-tokens package config
        -- and authentication above is switched to "token"
        app_id = "example_app_id";
        app_secret = "jkj...Hs";
        allow_empty_token = false;
        -- Assign this host a certificate for TLS, otherwise it would use the one
        -- set in the global section (if any).
        -- Note that old-style SSL on port 5223 only supports one certificate, and will always
        -- use the global one.
        ssl = {
                key = "/etc/letsencrypt/live/example.com/privkey.pem";
                certificate = "/etc/letsencrypt/live/example.com/fullchain.pem";
        }
        speakerstats_component = "speakerstats.example.com"
        conference_duration_component = "conferenceduration.example.com"
        -- we need bosh
        modules_enabled = {
            "bosh";
            "pubsub";
            "ping"; -- Enable mod_ping
            "speakerstats";
            "external_services";
            "conference_duration";
            "muc_lobby_rooms";
            "muc_size";
        }
        c2s_require_encryption = false
        lobby_muc = "lobby.example.com"
        main_muc = "conference.example.com"
        muc_lobby_whitelist = { "recorder.example.com" } -- Here we can whitelist jibri to enter lobby enabled rooms

VirtualHost "recorder.example.com"
  modules_enabled = {
    "ping";
  }
  authentication = "internal_plain"

Component "conference.example.com" "muc"
    storage = "memory"
    modules_enabled = {
        "muc_meeting_id";
        "muc_domain_mapper";
        "token_verification";
        "token_moderation";
    }
    admins = { "focus@auth.example.com" }
    muc_room_locking = false
    muc_room_default_public_jids = true

-- internal muc component
Component "internal.auth.example.com" "muc"
    storage = "memory"
    modules_enabled = {
      "ping";
    }
    admins = { "focus@auth.example.com", "jvb@auth.example.com" }
    muc_room_locking = false
    muc_room_default_public_jids = true

VirtualHost "guest.example.com"
    authentication = "anonymous"
    c2s_require_encryption = false

VirtualHost "auth.example.com"
    ssl = {
        key = "/etc/letsencrypt/live/example.com/privkey.pem";
        certificate = "/etc/letsencrypt/live/example.com/fullchain.pem";
    }
    authentication = "internal_plain"

Component "focus.example.com" "client_proxy"
    target_address = "focus@auth.example.com"

Component "speakerstats.example.com" "speakerstats_component"
    muc_component = "conference.example.com"

Component "conferenceduration.example.com" "conference_duration_component"
    muc_component = "conference.example.com"

Component "lobby.example.com" "muc"
    storage = "memory"
    restrict_room_creation = true
    muc_room_locking = false
    muc_room_default_public_jids = true

What about the main prosody config? Do you have epoll enabled and tcp_backlog adjusted like in the Docker setup?
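Those settings go in the global section of prosody.cfg.lua and look roughly like this (a sketch - the option names are Prosody 0.11+'s, the exact defaults in the Docker image may differ):

-- global section of /etc/prosody/prosody.cfg.lua
network_backend = "epoll"      -- or "event" for libevent, which needs luaevent installed
network_settings = {
    tcp_backlog = 511;         -- allow a larger queue of pending TCP connections under load
}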

I had tested with epoll enabled previously, but only now did I notice in the prosody log that it's not actually being loaded in my case:

Oct 27 08:19:49 net.server	error	libevent not found, falling back to select()

I tried to fix it with
apt install libevent-dev lua-event
but it didn't help. Not many hits on Google either…

Ah, I installed it with luarocks install luaevent and that fixed the error.
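In case anyone hits the same thing, the full sequence was roughly this (luarocks needs the libevent headers to build luaevent; the restart step and log path are the usual Debian defaults, adjust as needed):

# install build deps and the luaevent rock, then restart Prosody
apt install libevent-dev luarocks
luarocks install luaevent
systemctl restart prosody
# the "libevent not found, falling back to select()" error should no longer appear
grep -i libevent /var/log/prosody/prosody.*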
And it also seems that I finally don't have the problem with the bridges anymore - my meet instance now handles 50 users without problems.
Thanks a lot for your help, Damian!