Interesting and very exciting topic indeed, at least for me as a Voice/Collab guy.
Things are evolving fast, and from what I can tell softphones and WebRTC have a significant impact on the QoS architecture of existing and future networks.
The pandemic has accelerated WebRTC adoption, and with it the need to shift the trust boundary for prioritized traffic away from the access layer switch towards the client.
A little history for the rookies that will find this posting in the future
First hardphones turned into softphones on the client
Back in the days when most employees still had hardphones on their desks in the office, it was rather easy to set the trust boundary, i.e. to decide which traffic markings can be trusted and which cannot. In the enterprise networks I’ve seen we usually set the trust boundary at the access layer switch. The hardphones and desktop clients would each reside in their own VLAN (one data VLAN and one VoIP VLAN), and sometimes we’d extend the trust boundary to the VoIP hardphone if conditional trust was configured, so VoIP traffic would always be trusted and get priority treatment.
But we’d hardly ever trust the “dirty” data traffic coming from the desktop clients. Except for a handful of clients that did some time-sensitive database requests, there usually was no need to trust any of their DSCP markings.
Check page 20 (Trust Boundaries) if you’re interested in this topic.
Well, times have changed, and over time the hardphones slowly disappeared while softclients such as MSN Messenger, Skype, Jabber, Discord, etc. started to appear on the client machines. We didn’t have to move our trust boundary just yet, as we could do the DSCP marking by means of ACLs that match on a specific (RTP) port range used only by those dedicated VoIP applications. So we didn’t have to touch/break too much on the network when we moved away from hardphones. Usually we could define a dedicated UDP port range for each realtime application, which allowed us to distinguish prioritized traffic from the other traffic on the network.
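To make the port-range idea a bit more concrete, here is a minimal Python sketch of what such an ACL-based marking policy boils down to. The application names and port ranges are made up for illustration (they are not our real config and not the defaults of any product); only the DSCP values EF (46) and AF41 (34) are the usual ones for voice and interactive video.

```python
from typing import NamedTuple, Sequence


class PortRule(NamedTuple):
    app: str          # label only, for readability
    first_port: int   # first UDP port of the range (inclusive)
    last_port: int    # last UDP port of the range (inclusive)
    dscp: int         # DSCP value to write into matching packets


EF = 46           # Expedited Forwarding, commonly used for voice
AF41 = 34         # Assured Forwarding 41, commonly used for interactive video
BEST_EFFORT = 0   # default, unprioritized traffic

# ASSUMPTION: hypothetical per-application RTP port ranges.
RULES = [
    PortRule("softphone-audio", 20000, 20999, EF),
    PortRule("softphone-video", 21000, 21999, AF41),
]


def classify(udp_port: int, rules: Sequence[PortRule] = RULES) -> int:
    """Return the DSCP value for a packet, first matching rule wins."""
    for rule in rules:
        if rule.first_port <= udp_port <= rule.last_port:
            return rule.dscp
    # Anything outside the known ranges stays best effort.
    return BEST_EFFORT
```

This only works as long as each realtime application sticks to its own range; the client's own markings are never consulted, which is why the trust boundary could stay at the switch.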
Then video moved from dedicated hardware to the client (WebRTC) as well
Most videoconferences, however, were still carried out through what Cisco calls “collaboration endpoints”. That’s dedicated hardware (camera/microphone) in conference rooms specifically built for conferences. People would physically gather in conference rooms and, if needed, a videoconference would be set up between two or more sites through this dedicated hardware. Since those physical devices would sit in their own VLAN, we could still rely on the old QoS architecture where we’d have our trust boundary at the access layer. In our case this was the standard until early 2020, when suddenly all the offices were empty from one day to the next due to Covid-19.
More recently we saw a sharp increase in individual, client-based video traffic (users doing WebRTC-based conferences on their own machines), while those conference rooms where a group of people would gather in one room quickly lost a lot of their former popularity.
This move away from dedicated hardware for videoconferences to software on the client has interesting implications for the network - in particular the QoS architecture - as well.
First we saw phones moving away from dedicated hardware to software on the client. For VoIP we rely on dedicated applications that make use of specific port ranges for the media streams, and those ranges can be configured individually for each application. We don’t do voice calls through the browser (just yet); we still do most of the voice calling through dedicated applications such as Cisco Jabber.
Now video has very quickly evolved from dedicated hardware into software on the client as well. In our case video has skipped the stage of dedicated software (there is no such thing for video as there is a softphone for VoIP) and has jumped directly into the browser thanks to WebRTC. This means that we no longer have several dedicated applications that we can assign separate port ranges to, but only one application - the browser - which uses the UDP port range 49152 to 65535 regardless of which WebRTC service has established the media connection(s). So, unlike with voice traffic, we can’t tell from the port range which traffic should get prioritized treatment and which should not.
The old approach of having dedicated VLANs for VoIP or video endpoints doesn’t work anymore, and dedicated port ranges don’t do the trick anymore either.
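A tiny Python sketch of why the port range stops being useful once everything runs in the browser. The service names are hypothetical; the only thing taken from the text above is that every browser media flow comes out of the same 49152 to 65535 UDP range.

```python
# The single UDP range the browser uses for all WebRTC media (49152..65535).
BROWSER_MEDIA_PORTS = range(49152, 65536)

# ASSUMPTION: hypothetical services a user might run in browser tabs.
# They all share the same range because the browser, not the service,
# picks the port.
BROWSER_FLOWS = {
    "onprem-video-consultation": BROWSER_MEDIA_PORTS,
    "some-browser-game": BROWSER_MEDIA_PORTS,
    "datachannel-file-transfer": BROWSER_MEDIA_PORTS,
}


def candidates(udp_port, flows=BROWSER_FLOWS):
    """Return every service that could have sent a packet from this port."""
    return sorted(name for name, ports in flows.items() if udp_port in ports)
```

Calling `candidates(50000)` returns all three names, so a port-based ACL has nothing left to decide on: it would either prioritize all browser media or none of it.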
We need new solutions to prioritize traffic, now that we have a whole lot of people doing realtime videocalls through their browsers to get their work done.
What’s on the horizon / What network engineers need to know
Luckily the brilliant guys over at Google and the W3C have already thought about solutions.
Google has already implemented its proprietary googDscp feature to mark RTP traffic, and the W3C is currently working on a solution to give WebRTC traffic a DSCP marking as well.
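For reference, the standards side of this is written down in RFC 8837 (DSCP packet markings for WebRTC QoS), which maps four application priority levels to recommended DSCP values per flow type. Below is a trimmed Python version of that table as I understand it: where the RFC lists two code points in a cell I only kept the first one, and the dictionary layout and function name are my own. Please double-check the values against the RFC before building a policy on them.

```python
# Recommended DSCP per flow type and priority, trimmed from RFC 8837.
# LE = 1 (Lower Effort), DF = 0 (Default Forwarding), EF = 46,
# AF41 = 34, AF42 = 36, AF31 = 26, AF32 = 28, AF21 = 18, AF11 = 10.
RECOMMENDED_DSCP = {
    "audio": {
        "very-low": 1, "low": 0, "medium": 46, "high": 46,
    },
    "interactive-video": {
        "very-low": 1, "low": 0, "medium": 36, "high": 34,
    },
    "non-interactive-video": {
        "very-low": 1, "low": 0, "medium": 28, "high": 26,
    },
    "data": {
        "very-low": 1, "low": 0, "medium": 10, "high": 18,
    },
}


def recommended_dscp(flow_type: str, priority: str) -> int:
    """Look up the DSCP value for a flow type and priority level.

    Raises KeyError for unknown flow types or priorities, so a typo
    does not silently end up as best effort.
    """
    return RECOMMENDED_DSCP[flow_type][priority]
```

So a videocall that the application rates as "high" would ask for AF41 on its video and EF on its audio, without the network having to guess anything from ports.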
Both approaches mean that the trust boundary likely needs to be moved away from the access layer switch towards the client itself as the individual applications/services will set their DSCP markings in the future.
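What "the client sets the marking" means in practice can be sketched with a plain UDP socket in Python. This is not how a browser does it internally, just the same idea at socket level: the application writes the DSCP value into the IP header itself, and the access switch then only decides whether to believe it. The allow-list and the function names are assumptions for illustration, and some operating systems ignore or override IP_TOS unless a QoS policy permits it.

```python
import socket

EF = 46     # voice
AF41 = 34   # interactive video


def dscp_to_tos(dscp: int) -> int:
    """DSCP is the upper 6 bits of the old TOS byte, so shift left by 2."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0..63")
    return dscp << 2


def open_marked_udp_socket(dscp: int = EF) -> socket.socket:
    """Client side: open a UDP socket whose packets request this DSCP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # The OS may ignore this (platform and policy dependent).
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock


# ASSUMPTION: hypothetical set of markings the network agrees to honor
# from client machines.
TRUSTED_FROM_CLIENTS = frozenset({EF, AF41})


def police_at_edge(client_dscp: int, trusted=TRUSTED_FROM_CLIENTS) -> int:
    """Network side: keep an allowed marking, re-mark everything else to 0."""
    return client_dscp if client_dscp in trusted else 0
```

The second half is the part the network engineers will care about: trusting the client does not have to mean trusting every value it sends.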
So Jitsi being able to mark its media streams with (user-defined?) DSCP values is something very exciting that’s definitely going to be used in the future, if it gets implemented.
Also forgive me if these things aren’t described with 100% technical accuracy. Take it with a grain of salt. Our way of doing things might not be best practice. Always check the reference guides of your network partner to learn about best practices; do not trust random guys on the internet. Always get your information from the primary source whenever possible, it will save you a lot of hassle.
I’m at home in the upper layers of the network stack, so this obviously is the VoIP/Collab guy’s perspective on things. I only occasionally have to deal with the layers below my VoIP/video applications, so I’m not too familiar with all the nitty-gritty details of QoS. I usually only show up at the network engineers’ office when the media quality isn’t acceptable. We’ve got VoIP working pretty nicely by now, but WebRTC is giving me interesting challenges to solve to get the quality I want for those videocalls launched from a web browser.
Jitsi was planned as a quick and dirty solution for a problem that appeared during the Covid pandemic but it seems like Jitsi is here to stay. The users like it and they want to use it.
(The problem that needed to be solved during the pandemic: clients being scared to show up at the hospital, so doctors needed a simple way to communicate with them while they stayed at home. Cloud-based solutions like Zoom were completely out of the question due to legal/regulatory concerns. In healthcare we’re dealing with highly sensitive personal data, so it was clear that it needed to be an on-prem solution, and Jitsi was the best at hand at the time.)
Looking forward. Interesting times ahead.
Cheers, have a great weekend everybody!
For people interested in the history and traditional QoS designs for voice/video up until now, have a look at this paper. On page 44 you can see that not even Cisco trusted corporate PCs but I think with WebRTC we need to rethink how we want to do QoS in the future.
QoS Strategies and Smart Media Techniques for Collaboration Deployments