I am leaving for a few days (until Monday), and before I leave, I would
like to share a few thoughts on another
feature that could possibly be useful for SIP Communicator. Or useless.
Depends on users and their connection...
In the current state of things, once the SDP session is established, SC
produces outgoing audio packets in the following way: a frame is produced,
wrapped into an RTP packet, this RTP packet is possibly encrypted with an
SRTP suite, and then sent.
Though simple, this is not the only way of sending audio over RTP. Many
codecs, including my favorite Speex and less-favorite AMR, have specific
provisions on how to pack N frames into a single RTP packet. The resulting
RTP packets therefore carry not 20 ms of speech (or whatever the base
frame size of the particular codec is) but N*20 ms.
Such an approach has its positives and negatives. Let us start with the
negatives. Mainly, the code is more complex, and an additional audio delay
of (N-1)*20 ms is introduced, because the first frame in a packet has to
wait for the remaining N-1 frames.
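
To make the packing and the delay concrete, here is a minimal sketch in
Python. It assumes fixed-size frames that are simply concatenated after a
plain 12-byte RTP header, and an 8 kHz RTP clock; the function names and
the payload type 97 are made up for illustration, and real payload formats
may add their own per-frame framing.

```python
import struct

FRAME_MS = 20        # base frame duration used in the text
SAMPLE_RATE = 8000   # assumption: narrowband audio with an 8 kHz RTP clock


def pack_frames(frames, seq, timestamp, ssrc, payload_type=97):
    """Build one RTP packet whose payload is the given frames back to back."""
    # 12-byte fixed RTP header: V=2, no padding/extension/CSRC, marker bit clear.
    header = struct.pack("!BBHII", 0x80, payload_type & 0x7F,
                         seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc)
    return header + b"".join(frames)


def extra_delay_ms(frames_per_packet):
    """The first frame waits for the other N-1 frames before the packet leaves."""
    return (frames_per_packet - 1) * FRAME_MS


def timestamp_step(frames_per_packet):
    """RTP timestamp increment between consecutive packets, in clock ticks."""
    return frames_per_packet * FRAME_MS * SAMPLE_RATE // 1000
```

With two 20-byte frames, pack_frames() returns a 52-byte packet,
extra_delay_ms(2) is 20 and timestamp_step(2) is 320.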
But the positives may well outweigh the negatives in some circumstances,
especially over low-bandwidth networks. First, bandwidth is saved, because
in the multi-frame-per-packet approach only one RTP header and possibly
one SRTP authentication code is needed per N frames.
Let us work with some real numbers. With a codec bitrate of 8 kbps and a
frame size of 20 ms, every RTP packet contains 20 bytes of payload,
12 bytes of RTP header, and possibly 4 more bytes of SRTP authentication
code, which means that headers and footers account for roughly 40% of the
total RTP traffic volume (12 of 32 bytes without SRTP, 16 of 36 bytes with
it). Let us consider the encrypted case, with the SRTP auth code.
If 2 audio frames are packed into a single SRTP packet instead of one, you
save 16*25 = 400 bytes (3200 bits) per second on the unneeded (S)RTP
headers and footers, which enables you to raise the codec bitrate from
8 kbps to 11 kbps with zero increase in traffic volume. And 11 kbps
usually gives noticeably higher voice quality than 8 kbps. This at the
cost of 20 ms of extra delay - well, count me a buyer for such a tradeoff.
Or, for people on strict fair usage policy (FUP) regimes (for example,
Czech mobile operators usually provide 150 MB per month as FUP), if you
keep the old 8 kbps bitrate, about 22% of the RTP traffic is saved (400 of
1800 bytes per second), and around a third once the 28 bytes of IPv4 and
UDP headers on every packet are counted, which may be quite significant
for their FUP limit.
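
The arithmetic above can be checked with a few lines of Python. This is
only a sketch using the figures from this mail (12-byte RTP header, 4-byte
SRTP auth code, 20 ms frames); stream_stats() is a made-up helper name,
and IP/UDP headers are deliberately left out.

```python
RTP_HEADER = 12   # bytes, fixed RTP header
SRTP_TAG = 4      # bytes, SRTP authentication code as assumed in the text
FRAME_MS = 20     # frame duration in milliseconds


def stream_stats(codec_bps, frames_per_packet, srtp=True):
    """Return (bytes per second, header/footer share) at the RTP level."""
    frame_bytes = codec_bps * FRAME_MS // 8000          # bits/s -> bytes per frame
    overhead = RTP_HEADER + (SRTP_TAG if srtp else 0)   # per-packet overhead
    packet_bytes = overhead + frames_per_packet * frame_bytes
    packets_per_second = 1000 / (FRAME_MS * frames_per_packet)
    return packet_bytes * packets_per_second, overhead / packet_bytes


one_frame, share = stream_stats(8000, 1)    # 1800.0 bytes/s, overhead share 16/36
two_frames, _ = stream_stats(8000, 2)       # 1400.0 bytes/s
upgraded, _ = stream_stats(11200, 2)        # 1800.0 bytes/s at 11.2 kbps
print(one_frame - two_frames, round(share, 3), upgraded)
```

So two frames per packet at 11.2 kbps cost the same 1800 bytes per second
as one frame per packet at 8 kbps.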
From the encryption point of view - calculation of the SRTP authentication
code is an HMAC operation with SHA-256 or something similar. These
operations have a high overhead cost and are usually more time consuming
than the encryption itself - up to 3-4 times, if we compare AES-256 and
SHA-256 on the same RTP packet (I have measured it repeatedly). On the
other hand, SHA-256 has a block size of 512 bits (64 bytes), so
calculating the SRTP auth code over 20 or 40 bytes of data costs exactly
the same time (the same number of SHA iterations is needed). So, in the
2-frames-per-packet mode, half of the heaviest crypto work is saved,
without any negative effects on security.
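
The "same number of SHA iterations" point can be illustrated by counting
compression-function calls. The sketch below assumes HMAC-SHA-256 with a
key no longer than one block and looks only at the message lengths from
this mail; it does not model which bytes of a real SRTP packet are
authenticated, and the helper names are made up.

```python
BLOCK = 64    # SHA-256 block size in bytes (512 bits)
DIGEST = 32   # SHA-256 digest size in bytes


def sha256_blocks(message_len):
    """Blocks processed by SHA-256: message + 0x80 byte + 8-byte length, rounded up."""
    return (message_len + 9 + BLOCK - 1) // BLOCK


def hmac_sha256_blocks(message_len):
    """Blocks processed by HMAC-SHA-256 for a key of at most one block."""
    inner = sha256_blocks(BLOCK + message_len)   # (key XOR ipad) block + message
    outer = sha256_blocks(BLOCK + DIGEST)        # (key XOR opad) block + inner digest
    return inner + outer


for n in (20, 40, 60):
    print(n, hmac_sha256_blocks(n))
```

Under these assumptions 20 and 40 bytes both take 4 blocks per packet,
while 60 bytes would take 5, so two packed frames cost one HMAC instead of
two of the same size.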
For people who are connected over GPRS networks, there is an extra
benefit. Mobile networks are usually good for download, because it is
tacitly expected that people will mostly use them for mobile web surfing.
They are bad for upload; the bandwidth and behavior are unreliable. In our
corporate experience, 2G and 2.5G networks start behaving strangely when
more than some 20-30 packets per second are sent. In such a case, the
remote party experiences strange pauses (for example, 400 ms with no
network activity), followed by sudden surges of delayed packets, which
arrive at very tight intervals (say, 20 packets 1-2 ms apart from each
other). This is quite hard to counter with jitter buffering, and usually a
high jitter buffer delay must be used to smooth such occurrences out.
We've had the best experience using 2 to 4 frames per packet. SC can
already receive multiple frames per packet, at least for Speex (the Speex
decoder is designed to handle them well). Have you thought of sending
multiple frames per packet as well?