I’ve also been thinking about this and I came to the opposite conclusion
How does the Conference approach make implementing a jitter buffer awkward?
I was thinking that the problem would be simplified if each jitter buffer were entirely independent, living in its own receive pipeline, as opposed to a group of jitter buffers in Conference which would have to be indexed by endpoint. Putting them in the receive pipeline also means you get the scope of a single endpoint for free, though it’s not clear to me how important that would be. Maybe more important is the ease with which they could access other stats about the transmitting endpoint (RTT, for example). I don’t think doing this with the Conference approach would be impossible, it’s just not as clear to me how it would flow.
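To sketch what I mean (all names here are hypothetical, and the Node/PacketInfo classes are minimal stand-ins to keep the snippet self-contained, not the real JMT types):

    // Minimal stand-ins so the sketch is self-contained (not the real JMT types).
    class PacketInfo(val rtpSeqNum: Int, val payload: ByteArray)

    abstract class Node {
        private var next: Node? = null
        fun attach(node: Node) { next = node }
        protected fun forward(packetInfo: PacketInfo) { next?.processPacket(packetInfo) }
        abstract fun processPacket(packetInfo: PacketInfo)
    }

    // One jitter buffer per receive pipeline: no indexing by endpoint needed,
    // because the pipeline is already scoped to a single endpoint.
    class JitterBufferNode : Node() {
        private val buffer = sortedMapOf<Int, PacketInfo>()

        override fun processPacket(packetInfo: PacketInfo) {
            buffer[packetInfo.rtpSeqNum] = packetInfo
            // Keep a fixed number of packets buffered; a real implementation
            // would decide based on timing (and could use stats like RTT that
            // live in the same receiver).
            while (buffer.size > TARGET_DEPTH) {
                forward(buffer.remove(buffer.firstKey())!!)
            }
        }

        companion object { private const val TARGET_DEPTH = 4 } // placeholder policy
    }

The point being that the buffer never has to know which endpoint it belongs to; it just sits next to the rest of that endpoint’s receive state.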
The receive pipeline is part of JMT, and inserting Nodes into it from the bridge violates encapsulation.
I don’t see it this way, because I actually think the pipeline construction probably should live in the bridge, with JMT just providing the ‘building blocks’ (the Nodes) to put a pipeline together. It’s just that in practice this is easier to do in JMT (both because Kotlin makes it easier and because constructing the pipeline from the higher level in the bridge might involve painful plumbing). I see the point, though, given the current state.
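Something like this is the split I’m imagining (again hypothetical, reusing the Node stand-in from the sketch above):

    // Hypothetical building blocks that JMT would provide.
    class RtpParserNode : Node() {
        override fun processPacket(packetInfo: PacketInfo) = forward(packetInfo)
    }
    class SrtpDecryptNode : Node() {
        override fun processPacket(packetInfo: PacketInfo) = forward(packetInfo)
    }

    // The bridge, not JMT, decides the shape of the pipeline.
    fun buildReceivePipeline(): Node {
        val parser = RtpParserNode()
        val decrypt = SrtpDecryptNode()
        val jitterBuffer = JitterBufferNode()
        parser.attach(decrypt)
        decrypt.attach(jitterBuffer)
        return parser
    }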
It is error prone and might be broken by changes in JMT (when we work on JMT in the future, I would rather not have to consider how other users of the library might be modifying the pipeline). Also note that in order to tap into audio, video, and RTCP, at least 3 Nodes will need to be inserted.
I wasn’t too worried about this because the Nodes are so self-contained and the basic premise of a node’s function (packets passing through it) is so unlikely to change that I figured, at worst, they’d be looking at only superficial tweaks to their node implementations and perhaps moving where it’s inserted. Also, we don’t need a Node for RTCP as we already have the RTCP event notifier to access incoming RTCP.
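For the RTCP part, I mean something along these lines (the names are from memory and only approximate the real notifier, so treat this as illustrative rather than the exact JMT API):

    // Approximate shape of the notifier hook; not the exact JMT API.
    class RtcpPacket // stand-in

    interface RtcpListener {
        fun rtcpPacketReceived(packet: RtcpPacket, receivedTimeMs: Long)
    }

    class RtcpEventNotifier {
        private val listeners = mutableListOf<RtcpListener>()
        fun addRtcpEventListener(listener: RtcpListener) { listeners += listener }
        fun notifyRtcpReceived(packet: RtcpPacket, receivedTimeMs: Long) {
            listeners.forEach { it.rtcpPacketReceived(packet, receivedTimeMs) }
        }
    }

    // The recorder just subscribes; no Node insertion required for RTCP.
    fun hookRecorderRtcp(notifier: RtcpEventNotifier) {
        notifier.addRtcpEventListener(object : RtcpListener {
            override fun rtcpPacketReceived(packet: RtcpPacket, receivedTimeMs: Long) {
                // e.g. grab SR timestamps for A/V sync in the recording.
            }
        })
    }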
For the recording use-case there is no reason to plug into the packets in the middle of the receive pipeline. Receiving the packets after they have been processed by JMT is sufficient, and this can easily be done in Conference, which receives all of the packets after they have been processed. It is definitely cleaner this way, in the sense that it will be obvious what the code does.
I’m not sure I agree that it’s more obvious, but I generally think what you’ve said here is true. Again, I’m a little worried that they may need access to more low-level data that would be more readily accessible in the receiver, but this is really speculation, as I’m not sure of everything that’s needed. If it comes down to nothing but SRs (sender reports, which we already forward) then it should be fine. If other stuff is needed, then it gets a bit trickier.
Implementing an actual AbstractEndpoint might indeed be awkward, but it is not necessary. The recorder could be implemented as a PotentialPacketHandler, the same way that the OctoTentacle is:
Yes, that’s a good idea.
That is, Conference would have a Recorder instance and would feed packets to it like it does to the tentacle. This would be the only modification to existing bridge code that is necessary; everything else would be separate recording code.
This solution will also just work with Octo.
Also a good point. The Octo case is worth thinking about in terms of how you guys want to handle it. Would a single bridge be in charge of recording all participants (and what if that bridge is removed from the call because all of its local participants left, while other bridges are still in the call?), or should each bridge handle recording its own local participants?
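Going back to the Recorder idea for a second, here’s roughly what that shape could look like (the interface below is a stand-in with an approximate signature, and PacketInfo is again just a placeholder):

    // Stand-ins; the real PotentialPacketHandler signature may differ.
    class PacketInfo(val payload: ByteArray, val sourceEndpointId: String)

    interface PotentialPacketHandler {
        fun wants(packetInfo: PacketInfo): Boolean
        fun send(packetInfo: PacketInfo)
    }

    // Conference would hold one of these and feed it packets the same way
    // it feeds the OctoTentacle.
    class Recorder : PotentialPacketHandler {
        override fun wants(packetInfo: PacketInfo) = true // record everything

        override fun send(packetInfo: PacketInfo) {
            // Hand off to the actual recording code (muxer, file writer, ...),
            // which lives entirely outside the bridge's packet path.
        }
    }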
Hey Brian and Boris,
Thank you both so much for all your feedback on these two different approaches! It’s really helpful to get a clearer sense of how these strategies differ and what their trade-offs are. I think the best approach would be for us to prototype both solutions and then see which is cleaner/more stable for our use-case.
To be clear, the Conference approach has, without a doubt, a much smaller ‘surface’ and is less likely to be affected by changes in the bridge/JMT, so I think Boris’ idea is good, even though I’m less concerned about the brittleness of the other approach than he is. I think your plan here is a good one. Try both a bit, or even just try the Conference one and see if it works out OK; if so, that’d be great (I’d imagine most of the core recording code would be reusable, so there wouldn’t be too much throwaway work if it didn’t pan out).
A couple of quick questions: are there different performance characteristics between these two strategies? For instance, do we need to synchronize operations in the Conference approach vs. the JMT approach? Also, how might the jitter buffer work differ between the two?
What is the end result you guys are looking for? Is it a file with all the audio and video? If so, then you’re going to have multiple threads at play at one point or another. I’d probably look at using a queue somewhere you can safely add items to from N threads, and then define your own threading model on the consumer side of it.
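Concretely, I’m thinking of something like this (names hypothetical; LinkedBlockingQueue from java.util.concurrent handles the multi-producer side safely):

    import java.util.concurrent.LinkedBlockingQueue

    // N pipeline threads enqueue; one consumer thread owns all file I/O,
    // so the writer itself needs no further synchronization.
    class RecordingWriter {
        private val queue = LinkedBlockingQueue<ByteArray>()
        private val writerThread = Thread(::writeLoop).apply { isDaemon = true }

        fun start() = writerThread.start()
        fun stop() = writerThread.interrupt()

        // Safe to call from any pipeline thread.
        fun enqueue(packet: ByteArray) {
            queue.offer(packet)
        }

        private fun writeLoop() {
            try {
                while (true) {
                    writeToFile(queue.take()) // blocks until a packet arrives
                }
            } catch (e: InterruptedException) {
                // shutting down
            }
        }

        private fun writeToFile(packet: ByteArray) {
            // mux/write into the recording file
        }
    }

That also mostly answers the synchronization question: whichever approach feeds the queue, the contention is confined to the queue itself.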