I've recently run into a discrepancy between how I understand RFC 3711 and how at least two implementations (Jitsi and libsrtp) perform SRTCP key derivation.
Specifically, section 4.3.1 of RFC 3711 explains how SRTP key derivation works. The key is derived by running "x" through the PRF, where "x" is "key_id", right-aligned to the length of the 112-bit salt and XORed with that salt. The "key_id" in turn is the concatenation of the 8-bit label and "r". "r" is defined as the 48-bit packet index divided by the key derivation rate; since the key derivation rate is zero, "r" ends up being a 48-bit zero. "key_id" is therefore 56 bits (7 octets) long, so in an array of 14 octets for the padded and aligned "key_id", the label lands in the 8th octet. All implementations I've looked at do it this way as well.
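To make the layout concrete, here is a minimal sketch of how I read the construction of "x" for SRTP. The function names and the all-zero salt are mine, purely for illustration; label 0x01 is chosen only so that the label byte is visible in the output.

```python
def padded_key_id(label: int, r: int, r_bits: int, width: int = 14) -> bytes:
    """Return label || r, right-aligned (zero-padded on the left) to `width` octets."""
    key_id = (label << r_bits) | r
    return key_id.to_bytes(width, "big")


def derive_x(label: int, r: int, r_bits: int, salt: bytes) -> bytes:
    """XOR the padded key_id with the salt; the result is the input to the PRF."""
    padded = padded_key_id(label, r, r_bits, len(salt))
    return bytes(a ^ b for a, b in zip(padded, salt))


# SRTP case: r is a 48-bit zero, so key_id is 7 octets long.
padded = padded_key_id(label=0x01, r=0, r_bits=48)
print(padded.hex())         # 0000000000000001000000000000
print(padded.index(1) + 1)  # 8 (1-based position of the label octet)

# Hypothetical all-zero salt, so x equals the padded key_id.
x = derive_x(label=0x01, r=0, r_bits=48, salt=bytes(14))
```

With a real salt, the label is XORed into the salt's 8th octet and every other octet of the salt is left unchanged.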
The next section, 4.3.2, explains how SRTCP key derivation works. It states that it's the same as for SRTP, except that different labels are used and the SRTP packet index is replaced by the 32-bit SRTCP packet index. So where the SRTP packet index was 48 bits long, we now have only 32 bits of packet index. Since "key_id" is the concatenation of the label with "r", and "r" is the same length as the packet index, "key_id" is now only 40 bits (5 octets) long. In my understanding this would put the label in a different spot in the padded "key_id" -- not in the 8th octet, but rather in the 10th. The result is a different set of SRTCP session keys.
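The arithmetic behind the two positions can be sketched as follows. This only illustrates my reading of the RFC, not what any implementation does; the helper name is made up.

```python
def label_octet(r_bits: int, width: int = 14, label_bits: int = 8) -> int:
    """1-based position of the label octet when label || r is right-aligned in `width` octets."""
    key_id_octets = (label_bits + r_bits) // 8
    return width - key_id_octets + 1


print(label_octet(48))  # 8  (48-bit "r", as for SRTP)
print(label_octet(32))  # 10 (32-bit "r", my reading for SRTCP)
```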
However, the implementations disagree with this reading, and I'm not sure why. Am I missing something that would explain this discrepancy? Or am I simply misunderstanding the RFC?