[jitsi-dev] [jitsi-videobridge] Preserving stream quality statistics (#79)


#1

This is long. Sorry. Bear with me. Feel free to tell me, "you're wrong" about any part of this.

## Background

We're rolling out a deployment of jitsi (awesome stuff!). We want to be able to troubleshoot audio/video quality problems post-mortem. In other words, we want users to be able to report problems after the conference is over, and we as engineers can look at detailed statistics to determine what the problem is, and hopefully, craft a solution.

## Possible sources of statistics

* Statistics reported to InfluxDB after each channel is closed (i.e. the channel_expired series). -- This is good to spot big / general problems, but it's difficult to reconstruct what actually happened. For instance, we want to be able to see things like packet loss _over time_. We hope this can distinguish between things like a continually crappy network, and a network that was fine and then at some point got measurably worse.
* RTCP data broadcast to participants in the conference. -- This isn't currently preserved anywhere, but if we were to save off a pcap file (or similar) of the RTCP data stream, we should be able to reconstruct most, if not all, of the information we need from that.
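To make the second option concrete, here is a minimal Python sketch of pulling the loss fields out of one captured RTCP receiver report. The field layout follows RFC 3550; the function name and the dictionary keys are made up for illustration, and a real capture would contain compound packets (several RTCP packets back to back) that need to be split first.

```python
import struct

RTCP_RR = 201  # RTCP packet type for a receiver report (RFC 3550)

def parse_receiver_report(packet: bytes):
    """Return (reporter_ssrc, report_blocks) for a single RTCP RR packet."""
    first, packet_type, _length_words = struct.unpack_from("!BBH", packet, 0)
    if first >> 6 != 2 or packet_type != RTCP_RR:
        raise ValueError("not an RTCP receiver report")
    block_count = first & 0x1F  # low 5 bits: number of report blocks
    (reporter_ssrc,) = struct.unpack_from("!I", packet, 4)

    blocks = []
    for i in range(block_count):
        offset = 8 + 24 * i  # 8-byte header, then 24 bytes per report block
        ssrc, lost_word, ext_seq, jitter, _lsr, _dlsr = struct.unpack_from(
            "!IIIIII", packet, offset
        )
        cumulative_lost = lost_word & 0xFFFFFF
        if cumulative_lost & 0x800000:  # the 24-bit field is signed
            cumulative_lost -= 1 << 24
        blocks.append({
            "ssrc": ssrc,                            # stream being reported on
            "fraction_lost": (lost_word >> 24) / 256.0,
            "cumulative_lost": cumulative_lost,
            "ext_highest_seq": ext_seq,
            "jitter": jitter,
        })
    return reporter_ssrc, blocks
```

Each report block describes one SSRC as seen by the reporting receiver, so a sequence of these over a conference gives the raw material for loss over time.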

We're currently investigating adding support to Jitsi-videobridge for saving off the complete set of RTCP packets for a conference.

## Possible solutions

* Add another hidden participant to the conference, ala JiCoFo, that records these statistics. As I understand, all participants should generally still get all the RTCP data, so this should work. (Correct me if that's not the case)
* Record the statistics client-side (via browser APIs) and report them separately to another server. This has the downside of duplicating data that, as I understand it, is already reported to the videobridge via RTCP.
* Inject code into jitsi-videobridge itself to save off the RTCP data to a file.

We've tentatively gone down that third path. Two questions:
* Is this of general interest, and thus likely to be accepted as a PR; OR
* Is there a good way of doing this without modifying jitsi-videobridge itself?

## Implementation

What follows is *very* hacky, largely untested, and not at all final.

So far, I've enabled `BasicBridgeRTCPTerminationStrategy` via configuration, and added a new `DetailedStatsSerializer implements Transformer<RTCPCompoundPacket>`, which I add to the list of RTCP transformers (via `setTransformerChain`).

The `DetailedStatsSerializer` saves all of the rtcp packets it sees to a file, one file per conference. We have a separate system monitoring the directory in question and uploading files to Amazon S3 for storage.
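The on-disk format is not described here, so the following is only an assumed layout, sketched in Python rather than in the bridge's Java: each record is an arrival timestamp and a length prefix followed by the raw packet bytes. The names `write_record` and `read_records` are hypothetical and are not part of `DetailedStatsSerializer`.

```python
import io
import struct

# Assumed record header: arrival time (float64 seconds) and payload length.
HEADER = struct.Struct("!dI")

def write_record(stream, timestamp: float, packet: bytes) -> None:
    """Append one captured packet to an open binary stream."""
    stream.write(HEADER.pack(timestamp, len(packet)))
    stream.write(packet)

def read_records(stream):
    """Yield (timestamp, packet) pairs until the stream ends or is truncated."""
    while True:
        header = stream.read(HEADER.size)
        if len(header) < HEADER.size:
            return
        timestamp, length = HEADER.unpack(header)
        payload = stream.read(length)
        if len(payload) < length:
            return  # partial final record, e.g. the writer was interrupted
        yield timestamp, payload
```

A length-prefixed layout like this lets an analysis tool read a file that is still being appended to, since a partial final record is simply skipped. A pcap file, as mentioned earlier, would serve the same purpose and work with existing tools.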

We'll build a separate analysis tool to summarize and visualize the information when needed.

···

-----

I'm looking for general feedback, of any sort.

* Am I way off track?
* Is there interest from other people in this sort of functionality?
* Are there better ways to get this information?

Sorry for the novel. Thanks for reading.

So. Thoughts?

---
Reply to this email directly or view it on GitHub:
https://github.com/jitsi/jitsi-videobridge/issues/79


#2

Hi Joshua,

Our mailing list (dev@jitsi.org) would be a better place for this discussion. Some quick replies below.

    Add another hidden participant to the conference, ala JiCoFo, that records these statistics. As I understand, all participants should generally still get all the RTCP data, so this should work. (Correct me if that's not the case)

This won't necessarily work in all cases, because of the bridge terminating RTCP.

    Record the statistics client-side (via browser APIs) and report them separately to another server. This has the downside of duplicating data that, as I understand it, is already reported to the videobridge via RTCP.

This option is more flexible than the two others, since you can get more information than what is contained in RTCP and you can control the timing yourself. We already have some work towards such an approach. Two approaches, actually:
1. Including logs from the clients to influxdb: https://github.com/jitsi/jitsi-meet/blob/master/doc/influxdb.md
This mostly works, but the user interface is not really in a useable state.
2. Using callstats.io: http://www.callstats.io/

    Inject code into jitsi-videobridge itself to save off the RTCP data to a file.

    We've tentatively gone down that third path. Two questions:

    Is this of general interest, and thus likely to be accepted as a PR; OR

I don't think we are likely to use these for the purpose of gathering statistics, but I think it would be a useful thing to have for debugging. So, I'm not sure if we can merge this, but a PR would be welcome.

We have some debugging code which essentially does the same thing (although it is plugged in in another place).

    Is there a good way of doing this without modifying jitsi-videobridge itself?

I can't think of one. You need access to the decrypted RTCP packets.

---
Reply to this email directly or view it on GitHub:
https://github.com/jitsi/jitsi-videobridge/issues/79#issuecomment-129481341


#3

Thanks!

    Our mailing list (dev@jitsi.org) would be a better place for this discussion.

Yep, historically, that's how I'd interact. However, I've personally found that managing mailing list subscriptions for the tens of open-source projects I've occasionally contributed to is a huge pain in the ass, and this is a low-barrier-to-entry solution for me. I don't know what the "right" solution is, or even if such a thing exists. Thoughts?

    This [record stats via client-side browser APIs] option is more flexible than the two others, since you can get more information than what is contained in RTCP and you can control the timing yourself.

Huh. From what I've read of the RTCP spec, we can fairly easily extrapolate all of the stats that are available client-side using only the RTCP data, with the caveat that we can only do so at whatever frequency the clients report RTCP stats, which we don't control. All the senders should be reporting packets sent for each SSRC, and all receivers should be reporting packets dropped for each SSRC they receive (as of a particular last-packet-id). I made the intuitive leap (which I haven't verified) that we should be able to basically just subtract those to get cumulative packets received, from which we can compute all the stats we could want. Is that not the case?
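As a rough sketch of that subtraction in Python: RFC 3550 receiver reports carry both the extended highest sequence number received and the cumulative number of packets lost, so the difference between two consecutive reports for the same SSRC gives packets expected and lost in that interval (the same idea as the interval calculation in RFC 3550 Appendix A.3). The `Report` class and `interval_loss` function are illustrative names, and the numbers in the usage example are invented.

```python
from dataclasses import dataclass

@dataclass
class Report:
    time: float            # seconds, when the report was captured
    ext_highest_seq: int   # extended highest sequence number received
    cumulative_lost: int   # cumulative number of packets lost

def interval_loss(reports):
    """Return (time, expected, lost, loss_fraction) per pair of consecutive reports."""
    results = []
    for prev, cur in zip(reports, reports[1:]):
        expected = cur.ext_highest_seq - prev.ext_highest_seq
        lost = cur.cumulative_lost - prev.cumulative_lost
        # Duplicates can make `lost` negative; clamp the fraction at zero.
        fraction = max(lost, 0) / expected if expected > 0 else 0.0
        results.append((cur.time, expected, lost, fraction))
    return results

reports = [
    Report(time=0.0, ext_highest_seq=1000, cumulative_lost=0),
    Report(time=5.0, ext_highest_seq=1500, cumulative_lost=5),
    Report(time=10.0, ext_highest_seq=2000, cumulative_lost=105),
]
for row in interval_loss(reports):
    print(row)
```

Packets received in an interval is then `expected - lost`. The resolution is limited to however often each receiver sends reports, which matches the caveat above.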

    [With the client-side solution] you can control the timing yourself.

In some ways, I think not being able to control the timing of received stats is a _plus_, since lots of work has already gone into making sure that the RTCP data doesn't overwhelm the much more important RTP data. Implementing it myself, either over the datachannel or perhaps websockets, I'd be worried about accidentally choking off the RTP data. Not that I can't reason my way through that pretty easily (i.e. convince myself that my solution won't choke off the RTP) - just that, with RTCP, I don't even have to think about that. Does that make sense?
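For a sense of how that built-in throttling works, here is a simplified Python sketch of the deterministic part of the RTCP interval calculation from RFC 3550 (section 6.2 and Appendix A.7): RTCP gets roughly 5% of the session bandwidth, split between senders and receivers. The function name and parameters are illustrative. The sketch leaves out the randomization step, and WebRTC endpoints typically negotiate a feedback profile with different timing rules, so treat the numbers as illustrative only.

```python
def rtcp_interval(members, senders, session_bw_bps, avg_rtcp_size,
                  we_sent, t_min=5.0):
    """Deterministic RTCP report interval in seconds (no randomization)."""
    rtcp_bw = 0.05 * session_bw_bps / 8.0  # RTCP share, in octets per second
    n = members
    if senders <= members * 0.25:
        # Few senders: they share 25% of the RTCP bandwidth, receivers 75%.
        if we_sent:
            rtcp_bw *= 0.25
            n = senders
        else:
            rtcp_bw *= 0.75
            n = members - senders
    # The more participants share the budget, the longer each one waits.
    return max(t_min, avg_rtcp_size * n / rtcp_bw)
```

The point is that the reporting interval stretches as the number of participants grows, so the total RTCP traffic stays bounded without any extra effort from the application.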

    I'm not sure if we can merge this, but a PR would be welcome.

I'll try to keep it as light-weight as possible. I assume that's the best approach to actually getting it merged. I'm not a huge fan of maintaining internal (or public) forks of open-source projects, so hopefully that's not necessary.

    I can't think of one [a way to do this without modifying jitsi-videobridge]

One thing that briefly crossed my mind was to add a new RTCPTerminationStrategy - but one that's not in jitsi-videobridge. We just add that to the classpath via the startup command (which we're customizing right now _anyway_), and configure it via sip-communicator.properties. Does that sound like it's feasible? Is that a better strategy, in your opinion?

---
Reply to this email directly or view it on GitHub:
https://github.com/jitsi/jitsi-videobridge/issues/79#issuecomment-129524360


#4

Two things I missed:

    1. Including logs from the clients to influxdb: https://github.com/jitsi/jitsi-meet/blob/master/doc/influxdb.md

    This mostly works, but the user interface is not really in a useable state.

Yeah, I've been playing with the influx integration a bit (which we already have configured internally), and I even have some jitsi-videobridge patches locally to upgrade to 0.9.2 (the latest). However, I've thus far been disappointed in its stability and queryability (i.e. the sorts of queries it'll let you run). We're not getting rid of it, though. I'm hopeful it'll get better with time.

    2. Using callstats.io: http://www.callstats.io/

Is this viable, given the RTCP data is encrypted? Won't we have the same problem you alluded to with adding another pseudo-participant?

Thanks for the pointer, though. I'll definitely take a look.

---
Reply to this email directly or view it on GitHub:
https://github.com/jitsi/jitsi-videobridge/issues/79#issuecomment-129528500


#5

Callstats doesn't use RTCP. It plugs into the webrtc statistics the browser provides and publishes reports to their servers.

---
Reply to this email directly or view it on GitHub:
https://github.com/jitsi/jitsi-videobridge/issues/79#issuecomment-129554185


#6

Sigh. It looks like the RTCP data does not, in fact, have all the data we care about. In particular, it doesn't distinguish between the audio and video stream of a single participant.

We'll be looking into some alternate solutions (including callstats). Thanks for the pointers!

---
Reply to this email directly or view it on GitHub:
https://github.com/jitsi/jitsi-videobridge/issues/79#issuecomment-130335342


#7

Closed #79.

---
Reply to this email directly or view it on GitHub:
https://github.com/jitsi/jitsi-videobridge/issues/79#event-380408664


#8

The SSRCs for each participant (including the content type, audio or video) are signaled in one way or another to both jicofo and videobridge, so you can combine logs from there with the RTCP dumps.
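A minimal Python sketch of that combination, assuming you have already extracted an SSRC-to-(participant, media type) mapping from the jicofo or videobridge logs and per-SSRC statistics from the RTCP dumps. The function name, the data shapes, and the sample values are all hypothetical; the actual log format is not shown here.

```python
def label_stats(stats_by_ssrc, ssrc_info):
    """Group per-SSRC stats under (participant, media_type) keys."""
    labeled = {}
    for ssrc, stats in stats_by_ssrc.items():
        # SSRCs missing from the signaling logs are kept, but marked unknown.
        participant, media = ssrc_info.get(ssrc, ("unknown", "unknown"))
        labeled.setdefault((participant, media), []).append(stats)
    return labeled

# Hypothetical inputs: mapping from signaling logs, loss from the RTCP dump.
ssrc_info = {0x1111: ("alice", "audio"), 0x2222: ("alice", "video")}
stats_by_ssrc = {
    0x1111: {"loss_fraction": 0.01},
    0x2222: {"loss_fraction": 0.20},
    0x3333: {"loss_fraction": 0.00},
}
print(label_stats(stats_by_ssrc, ssrc_info))
```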

---
Reply to this email directly or view it on GitHub:
https://github.com/jitsi/jitsi-videobridge/issues/79#issuecomment-130356106