Trying to make sense of audio bandwidth usage (up/down)

Using Jitsi, I’ve been monitoring data usage via Firefox’s about:networking page.
The websockets gave the following results for a 1h20 audio conference with between 2 and 5 participants:

Websocket                          Bytes sent   Bytes received
meet.jit.si                             69675           342368
rtcstats-server.jitsi.net             1742529                0
meet-jitsi-eu-west-xxx.jitsi.net       154619            27294

Over the 80 minutes this works out to an upload of roughly 0.02 MB per minute (about 410 bytes/s), which seems huge (roughly 5 times what I would expect), whereas the download is around 0.005 MB per minute, which is what I would anticipate if only audio from a single user was downloaded at each point in time.
Does this mean that the outgoing audio is sent as a high-quality (or poorly encoded) stream to the server, where it is then re-encoded more efficiently and forwarded, one stream at a time, to all users?
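As a quick check of the arithmetic, the totals from the table can be averaged over the call. This is a minimal sketch that assumes the counters cover the full 1h20 (4800 s) and that 1 MB means 10^6 bytes; the variable and function names are purely illustrative.

```python
# Byte counters copied from the about:networking table above:
# host -> (bytes sent, bytes received).
websockets = {
    "meet.jit.si": (69675, 342368),
    "rtcstats-server.jitsi.net": (1742529, 0),
    "meet-jitsi-eu-west-xxx.jitsi.net": (154619, 27294),
}

DURATION_S = 80 * 60  # 1h20, assuming the counters span the whole call

total_sent = sum(sent for sent, _ in websockets.values())
total_received = sum(received for _, received in websockets.values())


def rates(total_bytes, seconds=DURATION_S):
    """Return (bytes per second, MB per minute), with 1 MB = 10**6 bytes."""
    per_second = total_bytes / seconds
    return per_second, per_second * 60 / 1e6


up_bps, up_mb_min = rates(total_sent)
down_bps, down_mb_min = rates(total_received)

# Share of all sent bytes that went to the rtcstats-server socket.
rtcstats_sent = websockets["rtcstats-server.jitsi.net"][0]
rtcstats_share = rtcstats_sent / total_sent

print(f"sent:     {total_sent} B, {up_bps:.0f} B/s, {up_mb_min:.4f} MB/min")
print(f"received: {total_received} B, {down_bps:.0f} B/s, {down_mb_min:.4f} MB/min")
print(f"sent/received ratio: {total_sent / total_received:.1f}")
print(f"share of sent bytes going to rtcstats-server: {rtcstats_share:.0%}")
```

On these assumptions the websockets sent about 0.025 MB per minute and received about 0.005 MB per minute, a ratio of a little over 5. Nearly 90% of the sent bytes went to the rtcstats-server.jitsi.net socket alone, so the upload total is dominated by that one connection.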

If I’m correct, would it be possible to improve initial audio encoding so that the upload bandwidth is reduced?

The server cannot decode or encode media. Every participant sends media to the bridge, which routes it to the rest of the participants.

Thanks for the reply, I did not know the bridge worked that way.
Could you explain, then, why more data is sent upstream than is received downstream? I’m really not sure what’s happening there…

I am curious too about which factors the upstream and downstream bandwidth depend on.

This is my understanding based on tests when you have:

    disableSimulcast: false,
    enableLayerSuspension: true,

Jitsi upload and download vary based on your video quality setting and on the size of the window/tile that you are viewing, which itself depends on your screen resolution and/or the number of participants.

For each video quality at which a meeting participant is being viewed, that participant sends one data stream of that resolution. For example, if your video quality is set to HD and one person is watching you in HD, then you send one HD video stream; if someone is watching in SD, then you send an SD stream; and if someone is watching in LD, then you send an LD stream. If three people are watching you, one at each of these resolutions (HD, SD, and LD), then you would be sending three video streams, one of each resolution.
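The sending side of this description can be sketched as follows. It is only a model of the behaviour described above (one upload stream per distinct quality being watched), not Jitsi's actual code. The function name `streams_sent` is hypothetical, and capping each viewer at the sender's own quality setting is an added assumption.

```python
# Qualities ordered from lowest to highest, using the labels from the post.
QUALITIES = ["LD", "SD", "HD"]


def streams_sent(sender_quality, viewer_qualities):
    """Return the set of distinct layers the sender uploads.

    sender_quality: highest quality the sender is configured to send.
    viewer_qualities: the quality each viewer is currently watching at.
    """
    cap = QUALITIES.index(sender_quality)
    sent = set()
    for quality in viewer_qualities:
        # Assumption: a viewer never gets more than the sender's setting allows.
        effective = min(QUALITIES.index(quality), cap)
        sent.add(QUALITIES[effective])
    return sent


# Three viewers, one at each resolution: three upload streams.
print(sorted(streams_sent("HD", ["HD", "SD", "LD"])))
# Many viewers who all watch in LD: a single LD upload stream.
print(sorted(streams_sent("HD", ["LD", "LD", "LD", "LD"])))
```

In this model the upload cost depends on how many different qualities are being watched, not on how many people are watching.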

However for the viewer, they will only receive data streams that can be displayed in their window/tile.

  1. If they are viewing in Presentation mode (one large window and the others in small tiles), they could be receiving one HD stream and the rest in LD.
  2. For Tile view, how many HD tiles they can display at one time depends on their screen resolution. If they don’t have enough screen space for the tiles to be in HD, then SD could be used, and as the number of meeting participants increases and all tiles become smaller, they will be displayed as LD video streams, since the viewer’s Jitsi web browser only needs to receive LD streams.
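The Tile view case can be illustrated with a rough sketch. Everything here is an assumption made for illustration: the near-square grid layout, the helper names, and the 180 and 360 pixel heights for LD and SD (only 720p for HD is taken from this thread). It is not how Jitsi actually lays out tiles or selects layers.

```python
import math

# Layer heights in pixels. 720 for HD is from the thread;
# 180 (LD) and 360 (SD) are assumed values.
LAYER_HEIGHTS = {"LD": 180, "SD": 360, "HD": 720}


def tile_height(screen_height, participants):
    """Height of one tile in a hypothetical near-square grid."""
    columns = math.ceil(math.sqrt(participants))
    rows = math.ceil(participants / columns)
    return screen_height // rows


def layer_for_tile(height_px):
    """Highest layer that fits in the tile, falling back to LD."""
    best = "LD"
    for name, height in sorted(LAYER_HEIGHTS.items(), key=lambda kv: kv[1]):
        if height <= height_px:
            best = name
    return best


for screen in (768, 1080, 2160):
    for n in (2, 4, 9, 20):
        h = tile_height(screen, n)
        print(f"screen {screen}px, {n} tiles: {h}px per tile -> {layer_for_tile(h)}")
```

Under these assumptions a 1080-pixel-high screen drops to LD by 20 tiles, while a 2160-pixel-high screen still fits 9 tiles at HD, which matches the general idea that larger screens keep higher-quality streams for longer.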

But does this mean that if you are viewing 20 people in Tile view, your download bandwidth is 20 times the bandwidth of a single LD stream? If a single LD video is 230 Kbps, then 20 x 230 Kbps is 4.6 Mbps. I do not have more than 20 instances to test with, so I don’t know what happens to the Tile view when, say, 50 people are in the meeting.
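The estimate in question is a simple multiplication, sketched here under the assumption that every visible tile really is received as its own LD stream at the 230 Kbps quoted above. Whether that holds for larger meetings is exactly the open question, so the 50-tile figure is an extrapolation, not a measurement. The function name is hypothetical.

```python
LD_KBPS = 230  # per-stream LD bitrate quoted in the post


def tile_view_download_kbps(visible_tiles, per_stream_kbps=LD_KBPS):
    """Download estimate if each visible tile is a separate stream."""
    return visible_tiles * per_stream_kbps


for tiles in (5, 20, 50):
    kbps = tile_view_download_kbps(tiles)
    # 1 Mbps is taken as 1000 Kbps.
    print(f"{tiles} tiles: {kbps} Kbps = {kbps / 1000:.1f} Mbps")
```

For 20 tiles this gives 4600 Kbps, i.e. 4.6 Mbps with 1 Mbps taken as 1000 Kbps.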

In Tile view, 4K screens should be able to display more HD video streams (720p) than a 1K HD screen or a 1366x768 screen could.

I hope someone with more experience/understanding of Jitsi might comment further.

These are interesting questions, but maybe they belong in a separate thread: my post was explicitly about audio-only usage, so the behavior of Tile view and video resolutions is somewhat unrelated :wink: