[sip-comm-dev] Race condition in evCommitSent()


#1

Now we actually understand what happens. This is one helluva race condition in the ZRTP4J lib.
You couldn't have seen it anywhere on faster connections.

I have attached the offending source to this message.

The initial condition is as follows:

Alice, on SIP Communicator, sends a Commit.
Later, another Commit arrives from Bob.

Now, the method evCommitSent() is invoked and the execution goes to the part denoted by "We have a Commit clash".
Note the line denoted by <<<<!!!! 1. On that line, the variable sentPacket is set to NULL.
Now, if Alice won the Commit tie and became an Initiator, the execution goes to the line denoted by <<<<<!!!! 2.
On this line, the Commit resend timer is restarted (the comment says "restart the Commit timer, gives peer more time to react").

BUT

the sentPacket variable is already NULL, so when the timer goes off on the next occassion, it will not know what to send, and it will invoke
sendFailed() method. BOOM goes the entire handshake process.

Why does not this happen more often? Because on faster networks than 2G, and with faster peers, DHPart1 arrives quite soon, most often before
the restarted timer goes off. Such early arrival of DHPart1 cancels the timer definitely.

On slow mobile network, with a sluggish peer, DHPart1 will arrive quite late, surely more than half a second after the preceding Commit.
In the meantime, the Commit resend timer has expired, found that it has nothing to send, and killed the ZRTP handshake.

This pattern repeats itself multiple times in the class. Therefore, it will need some repairs.

Best regards

Marian Kechlibar

protected void evCommitSent() {
        char first, last;
        byte[] pkt;
        ZrtpCodes.ZrtpErrorCodes[] errorCode = new ZrtpCodes.ZrtpErrorCodes[1];

        /*
         * First check the general event type, then discrimnate the real event.
         */
        switch (event.type) {
        
        case ZrtpPacket:
            pkt = event.packet;

            first = (char) pkt[MESSAGE_OFFSET];
            first = Character.toLowerCase(first);
            last = (char) pkt[MESSAGE_OFFSET + 7];
            last = Character.toLowerCase(last);

            /*
             * HelloAck or Hello:
             * - delayed "HelloAck" or "Hello", maybe due to network latency, just
             * ignore it
             * - no switch in state, leave timer as it is
             */
            if (first == 'h' && (last =='k' || last == ' ')) {
                return;
            }

            /*
             * Commit:
             * We have a "Commit" clash. Resolve it.

···

*
             * - switch off resending Commit
             * - compare my hvi with peer's hvi
             * - if my hvi is greater
             * - we are Initiator, stay in state, wait for peer's DHPart1 packet
             * - else
             * - we are Responder, stop timer
             * - prepare and send DH1Packt,
             * - switch to state WaitDHPart2, implies Responder path
             */
            if (first == 'c' && last == ' ') {
                ZrtpPacketCommit zpCo = new ZrtpPacketCommit(pkt);

                if (!parent.verifyH2(zpCo)) {
                    return;
                }
                sentPacket = null; <<<<!!!! 1
                cancelTimer(); // this cancels the Commit timer T2

                // if our hvi is less than peer's hvi: switch to Responder mode and
                // send DHPart1 packet. Peer (as Initiator) will retrigger if
                // necessary
                //
                if (parent.compareCommit(zpCo) < 0) {
                    if (!multiStream) {
                        ZrtpPacketDHPart dhPart1 = parent.prepareDHPart1(zpCo,
                                errorCode);

                        // Something went wrong during processing of the Commit
                        // packet
                        if (dhPart1 == null) {
                            if (errorCode[0] != ZrtpCodes.ZrtpErrorCodes.IgnorePacket) {
                                sendErrorPacket(errorCode[0]);
                            }
                            return;
                        }
                        inState = ZrtpStates.WaitDHPart2;
                        sentPacket = dhPart1;
                    } else {
                        ZrtpPacketConfirm confirm = parent.prepareConfirm1MultiStream(zpCo, errorCode);

                        // Something went wrong during processing of the Commit packet
                        if (confirm == null) {
                            if (errorCode[0] != ZrtpCodes.ZrtpErrorCodes.IgnorePacket) {
                                sendErrorPacket(errorCode[0]);
                            }
                            return;
                        }
                        sentPacket = confirm;
                        inState = ZrtpStates.WaitConfirm2;

                    }
                    if (!parent.sendPacketZRTP(sentPacket)) {
                        sendFailed(); // returns to state Initial
                    }
                }
                // Stay in state, we are Initiator, wait for DHPart1 packet from peer.
                // Resend Commit after timeout until we get a DHPart1
                else {
                       <<<<<!!!! 2
                    if (startTimer(t2) <= 0) { // restart the Commit timer, gives peer more time to react
                        timerFailed(ZrtpCodes.SevereCodes.SevereNoTimer);
                    }
                }
                return;
            }

            /*
             * DHPart1:
             * - switch off resending Commit
             * - Prepare and send DHPart2
             * - switch to WaitConfirm1
             * - start timer to resend DHPart2 if necessary, we are Initiator
             */
            if (first == 'd') {
                
                ZrtpPacketDHPart dpkt = new ZrtpPacketDHPart(pkt);
                ZrtpPacketDHPart dhPart2 = parent.prepareDHPart2(dpkt, errorCode);

                // Something went wrong during processing of the DHPart1 packet
                if (dhPart2 == null) {
                    if (errorCode[0] != ZrtpCodes.ZrtpErrorCodes.IgnorePacket) {
                        sendErrorPacket(errorCode[0]);
                    }
                    return;
                }
                cancelTimer();
                sentPacket = dhPart2;
                inState = ZrtpStates.WaitConfirm1;

                if (!parent.sendPacketZRTP(sentPacket)) {
                    sendFailed(); // returns to state Initial
                    return;
                }
                if (startTimer(t2) <= 0) {
                    timerFailed(ZrtpCodes.SevereCodes.SevereNoTimer); // returns to state Initial
                }
            }
            
            if (multiStream && (first == 'c' && last == '1')) {
                
                cancelTimer();
                ZrtpPacketConfirm cpkt = new ZrtpPacketConfirm(pkt);
                sentPacket = null;

                ZrtpPacketConfirm confirm = parent.prepareConfirm2MultiStream(cpkt, errorCode);

                // Something went wrong during processing of the Confirm1 packet
                if (confirm == null) {
                    sendErrorPacket(errorCode[0]);
                    return;
                }
                inState = ZrtpStates.WaitConfAck;

                sentPacket = confirm;
                if (!parent.sendPacketZRTP(sentPacket)) {
                    sendFailed(); // returns to state Initial
                    return;
                }
                if (startTimer(t2) <= 0) {
                    timerFailed(ZrtpCodes.SevereCodes.SevereNoTimer); // returns to state Initial
                }
                if (!parent.srtpSecretsReady(ZrtpCallback.EnableSecurity.ForReceiver)) {
                    parent.sendInfo(ZrtpCodes.MessageSeverity.Severe, EnumSet
                            .of(ZrtpCodes.SevereCodes.SevereSecurityException));
                    sendErrorPacket(ZrtpCodes.ZrtpErrorCodes.CriticalSWError);
                    return;
                }
            }
            break;
            
        // Timer event triggered, resend the Commit packet
        case Timer:
            if (!parent.sendPacketZRTP(sentPacket)) {
                    sendFailed(); // returns to state Initial
                    return;
            }
            if (nextTimer(t2) <= 0) {
                timerFailed(ZrtpCodes.SevereCodes.SevereTooMuchRetries); // returns to state Initial
            }
            break;

        default: // unknown Event type for this state (covers Error and ZrtpClose)
            if (event.type != EventDataType.ZrtpClose) {
                parent.zrtpNegotiationFailed(ZrtpCodes.MessageSeverity.Severe,
                        EnumSet.of(ZrtpCodes.SevereCodes.SevereProtocolError));
            }
            sentPacket = null;
            inState = ZrtpStates.Initial;
        }
    }

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@sip-communicator.dev.java.net
For additional commands, e-mail: dev-help@sip-communicator.dev.java.net


#2

Hi Marian,

thanks alot for this detailled analysis. I'll dig into the problem an fix
it. As you said, it appears on several parts thus I need to think a little
bit about the timing an race conditions here. Together with testing I think
I can provide a ptached version during Thursday.

Best regards,
Werner

···

Am 26.10.2010 15:47, schrieb Marian Kechlibar:

Now we actually understand what happens. This is one helluva race condition in the ZRTP4J lib.
You couldn't have seen it anywhere on faster connections.

I have attached the offending source to this message.

The initial condition is as follows:

Alice, on SIP Communicator, sends a Commit.
Later, another Commit arrives from Bob.

Now, the method evCommitSent() is invoked and the execution goes to the part denoted by "We have a Commit clash".
Note the line denoted by <<<<!!!! 1. On that line, the variable sentPacket is set to NULL.
Now, if Alice won the Commit tie and became an Initiator, the execution goes to the line denoted by <<<<<!!!! 2.
On this line, the Commit resend timer is restarted (the comment says "restart the Commit timer, gives peer more time to react").

BUT

the sentPacket variable is already NULL, so when the timer goes off on the next occassion, it will not know what to send, and it will invoke
sendFailed() method. BOOM goes the entire handshake process.

Why does not this happen more often? Because on faster networks than 2G, and with faster peers, DHPart1 arrives quite soon, most often before
the restarted timer goes off. Such early arrival of DHPart1 cancels the timer definitely.

On slow mobile network, with a sluggish peer, DHPart1 will arrive quite late, surely more than half a second after the preceding Commit.
In the meantime, the Commit resend timer has expired, found that it has nothing to send, and killed the ZRTP handshake.

This pattern repeats itself multiple times in the class. Therefore, it will need some repairs.

Best regards

Marian Kechlibar

protected void evCommitSent() {
        char first, last;
        byte[] pkt;
        ZrtpCodes.ZrtpErrorCodes[] errorCode = new ZrtpCodes.ZrtpErrorCodes[1];

        /*
         * First check the general event type, then discrimnate the real event.
         */
        switch (event.type) {
        
        case ZrtpPacket:
            pkt = event.packet;

            first = (char) pkt[MESSAGE_OFFSET];
            first = Character.toLowerCase(first);
            last = (char) pkt[MESSAGE_OFFSET + 7];
            last = Character.toLowerCase(last);

            /*
             * HelloAck or Hello:
             * - delayed "HelloAck" or "Hello", maybe due to network latency, just
             * ignore it
             * - no switch in state, leave timer as it is
             */
            if (first == 'h' && (last =='k' || last == ' ')) {
                return;
            }

            /*
             * Commit:
             * We have a "Commit" clash. Resolve it.
             *
             * - switch off resending Commit
             * - compare my hvi with peer's hvi
             * - if my hvi is greater
             * - we are Initiator, stay in state, wait for peer's DHPart1 packet
             * - else
             * - we are Responder, stop timer
             * - prepare and send DH1Packt,
             * - switch to state WaitDHPart2, implies Responder path
             */
            if (first == 'c' && last == ' ') {
                ZrtpPacketCommit zpCo = new ZrtpPacketCommit(pkt);

                if (!parent.verifyH2(zpCo)) {
                    return;
                }
                sentPacket = null; <<<<!!!! 1
                cancelTimer(); // this cancels the Commit timer T2

                // if our hvi is less than peer's hvi: switch to Responder mode and
                // send DHPart1 packet. Peer (as Initiator) will retrigger if
                // necessary
                //
                if (parent.compareCommit(zpCo) < 0) {
                    if (!multiStream) {
                        ZrtpPacketDHPart dhPart1 = parent.prepareDHPart1(zpCo,
                                errorCode);

                        // Something went wrong during processing of the Commit
                        // packet
                        if (dhPart1 == null) {
                            if (errorCode[0] != ZrtpCodes.ZrtpErrorCodes.IgnorePacket) {
                                sendErrorPacket(errorCode[0]);
                            }
                            return;
                        }
                        inState = ZrtpStates.WaitDHPart2;
                        sentPacket = dhPart1;
                    } else {
                        ZrtpPacketConfirm confirm = parent.prepareConfirm1MultiStream(zpCo, errorCode);

                        // Something went wrong during processing of the Commit packet
                        if (confirm == null) {
                            if (errorCode[0] != ZrtpCodes.ZrtpErrorCodes.IgnorePacket) {
                                sendErrorPacket(errorCode[0]);
                            }
                            return;
                        }
                        sentPacket = confirm;
                        inState = ZrtpStates.WaitConfirm2;

                    }
                    if (!parent.sendPacketZRTP(sentPacket)) {
                        sendFailed(); // returns to state Initial
                    }
                }
                // Stay in state, we are Initiator, wait for DHPart1 packet from peer.
                // Resend Commit after timeout until we get a DHPart1
                else {
                       <<<<<!!!! 2
                    if (startTimer(t2) <= 0) { // restart the Commit timer, gives peer more time to react
                        timerFailed(ZrtpCodes.SevereCodes.SevereNoTimer);
                    }
                }
                return;
            }

            /*
             * DHPart1:
             * - switch off resending Commit
             * - Prepare and send DHPart2
             * - switch to WaitConfirm1
             * - start timer to resend DHPart2 if necessary, we are Initiator
             */
            if (first == 'd') {
                
                ZrtpPacketDHPart dpkt = new ZrtpPacketDHPart(pkt);
                ZrtpPacketDHPart dhPart2 = parent.prepareDHPart2(dpkt, errorCode);

                // Something went wrong during processing of the DHPart1 packet
                if (dhPart2 == null) {
                    if (errorCode[0] != ZrtpCodes.ZrtpErrorCodes.IgnorePacket) {
                        sendErrorPacket(errorCode[0]);
                    }
                    return;
                }
                cancelTimer();
                sentPacket = dhPart2;
                inState = ZrtpStates.WaitConfirm1;

                if (!parent.sendPacketZRTP(sentPacket)) {
                    sendFailed(); // returns to state Initial
                    return;
                }
                if (startTimer(t2) <= 0) {
                    timerFailed(ZrtpCodes.SevereCodes.SevereNoTimer); // returns to state Initial
                }
            }
            
            if (multiStream && (first == 'c' && last == '1')) {
                
                cancelTimer();
                ZrtpPacketConfirm cpkt = new ZrtpPacketConfirm(pkt);
                sentPacket = null;

                ZrtpPacketConfirm confirm = parent.prepareConfirm2MultiStream(cpkt, errorCode);

                // Something went wrong during processing of the Confirm1 packet
                if (confirm == null) {
                    sendErrorPacket(errorCode[0]);
                    return;
                }
                inState = ZrtpStates.WaitConfAck;

                sentPacket = confirm;
                if (!parent.sendPacketZRTP(sentPacket)) {
                    sendFailed(); // returns to state Initial
                    return;
                }
                if (startTimer(t2) <= 0) {
                    timerFailed(ZrtpCodes.SevereCodes.SevereNoTimer); // returns to state Initial
                }
                if (!parent.srtpSecretsReady(ZrtpCallback.EnableSecurity.ForReceiver)) {
                    parent.sendInfo(ZrtpCodes.MessageSeverity.Severe, EnumSet
                            .of(ZrtpCodes.SevereCodes.SevereSecurityException));
                    sendErrorPacket(ZrtpCodes.ZrtpErrorCodes.CriticalSWError);
                    return;
                }
            }
            break;
            
        // Timer event triggered, resend the Commit packet
        case Timer:
            if (!parent.sendPacketZRTP(sentPacket)) {
                    sendFailed(); // returns to state Initial
                    return;
            }
            if (nextTimer(t2) <= 0) {
                timerFailed(ZrtpCodes.SevereCodes.SevereTooMuchRetries); // returns to state Initial
            }
            break;

        default: // unknown Event type for this state (covers Error and ZrtpClose)
            if (event.type != EventDataType.ZrtpClose) {
                parent.zrtpNegotiationFailed(ZrtpCodes.MessageSeverity.Severe,
                        EnumSet.of(ZrtpCodes.SevereCodes.SevereProtocolError));
            }
            sentPacket = null;
            inState = ZrtpStates.Initial;
        }
    }

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@sip-communicator.dev.java.net
For additional commands, e-mail: dev-help@sip-communicator.dev.java.net

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@sip-communicator.dev.java.net
For additional commands, e-mail: dev-help@sip-communicator.dev.java.net


#3

Hi Marian,

just a quick fix to enable further testing on your side:

you may just delete the offending line marked wit <<<<!!!! 1.

The following code in the if - part sets the sentPacket variable
to a correct value, thus it's save to delete the line.

In case a timeout happens (timer restarted at <<<<!!!! 2) then
the Timer handling will resend the Commit packet if necessary.

Nevertheless I'll check other parts of the code as well :slight_smile: .

Best regards,
Werner

···

Am 26.10.2010 15:47, schrieb Marian Kechlibar:

Now we actually understand what happens. This is one helluva race condition in the ZRTP4J lib.
You couldn't have seen it anywhere on faster connections.

I have attached the offending source to this message.

The initial condition is as follows:

Alice, on SIP Communicator, sends a Commit.
Later, another Commit arrives from Bob.

Now, the method evCommitSent() is invoked and the execution goes to the part denoted by "We have a Commit clash".
Note the line denoted by <<<<!!!! 1. On that line, the variable sentPacket is set to NULL.
Now, if Alice won the Commit tie and became an Initiator, the execution goes to the line denoted by <<<<<!!!! 2.
On this line, the Commit resend timer is restarted (the comment says "restart the Commit timer, gives peer more time to react").

BUT

the sentPacket variable is already NULL, so when the timer goes off on the next occassion, it will not know what to send, and it will invoke
sendFailed() method. BOOM goes the entire handshake process.

Why does not this happen more often? Because on faster networks than 2G, and with faster peers, DHPart1 arrives quite soon, most often before
the restarted timer goes off. Such early arrival of DHPart1 cancels the timer definitely.

On slow mobile network, with a sluggish peer, DHPart1 will arrive quite late, surely more than half a second after the preceding Commit.
In the meantime, the Commit resend timer has expired, found that it has nothing to send, and killed the ZRTP handshake.

This pattern repeats itself multiple times in the class. Therefore, it will need some repairs.

Best regards

Marian Kechlibar

protected void evCommitSent() {
        char first, last;
        byte[] pkt;
        ZrtpCodes.ZrtpErrorCodes[] errorCode = new ZrtpCodes.ZrtpErrorCodes[1];

        /*
         * First check the general event type, then discrimnate the real event.
         */
        switch (event.type) {
        
        case ZrtpPacket:
            pkt = event.packet;

            first = (char) pkt[MESSAGE_OFFSET];
            first = Character.toLowerCase(first);
            last = (char) pkt[MESSAGE_OFFSET + 7];
            last = Character.toLowerCase(last);

            /*
             * HelloAck or Hello:
             * - delayed "HelloAck" or "Hello", maybe due to network latency, just
             * ignore it
             * - no switch in state, leave timer as it is
             */
            if (first == 'h' && (last =='k' || last == ' ')) {
                return;
            }

            /*
             * Commit:
             * We have a "Commit" clash. Resolve it.
             *
             * - switch off resending Commit
             * - compare my hvi with peer's hvi
             * - if my hvi is greater
             * - we are Initiator, stay in state, wait for peer's DHPart1 packet
             * - else
             * - we are Responder, stop timer
             * - prepare and send DH1Packt,
             * - switch to state WaitDHPart2, implies Responder path
             */
            if (first == 'c' && last == ' ') {
                ZrtpPacketCommit zpCo = new ZrtpPacketCommit(pkt);

                if (!parent.verifyH2(zpCo)) {
                    return;
                }
                sentPacket = null; <<<<!!!! 1
                cancelTimer(); // this cancels the Commit timer T2

                // if our hvi is less than peer's hvi: switch to Responder mode and
                // send DHPart1 packet. Peer (as Initiator) will retrigger if
                // necessary
                //
                if (parent.compareCommit(zpCo) < 0) {
                    if (!multiStream) {
                        ZrtpPacketDHPart dhPart1 = parent.prepareDHPart1(zpCo,
                                errorCode);

                        // Something went wrong during processing of the Commit
                        // packet
                        if (dhPart1 == null) {
                            if (errorCode[0] != ZrtpCodes.ZrtpErrorCodes.IgnorePacket) {
                                sendErrorPacket(errorCode[0]);
                            }
                            return;
                        }
                        inState = ZrtpStates.WaitDHPart2;
                        sentPacket = dhPart1;
                    } else {
                        ZrtpPacketConfirm confirm = parent.prepareConfirm1MultiStream(zpCo, errorCode);

                        // Something went wrong during processing of the Commit packet
                        if (confirm == null) {
                            if (errorCode[0] != ZrtpCodes.ZrtpErrorCodes.IgnorePacket) {
                                sendErrorPacket(errorCode[0]);
                            }
                            return;
                        }
                        sentPacket = confirm;
                        inState = ZrtpStates.WaitConfirm2;

                    }
                    if (!parent.sendPacketZRTP(sentPacket)) {
                        sendFailed(); // returns to state Initial
                    }
                }
                // Stay in state, we are Initiator, wait for DHPart1 packet from peer.
                // Resend Commit after timeout until we get a DHPart1
                else {
                       <<<<<!!!! 2
                    if (startTimer(t2) <= 0) { // restart the Commit timer, gives peer more time to react
                        timerFailed(ZrtpCodes.SevereCodes.SevereNoTimer);
                    }
                }
                return;
            }

            /*
             * DHPart1:
             * - switch off resending Commit
             * - Prepare and send DHPart2
             * - switch to WaitConfirm1
             * - start timer to resend DHPart2 if necessary, we are Initiator
             */
            if (first == 'd') {
                
                ZrtpPacketDHPart dpkt = new ZrtpPacketDHPart(pkt);
                ZrtpPacketDHPart dhPart2 = parent.prepareDHPart2(dpkt, errorCode);

                // Something went wrong during processing of the DHPart1 packet
                if (dhPart2 == null) {
                    if (errorCode[0] != ZrtpCodes.ZrtpErrorCodes.IgnorePacket) {
                        sendErrorPacket(errorCode[0]);
                    }
                    return;
                }
                cancelTimer();
                sentPacket = dhPart2;
                inState = ZrtpStates.WaitConfirm1;

                if (!parent.sendPacketZRTP(sentPacket)) {
                    sendFailed(); // returns to state Initial
                    return;
                }
                if (startTimer(t2) <= 0) {
                    timerFailed(ZrtpCodes.SevereCodes.SevereNoTimer); // returns to state Initial
                }
            }
            
            if (multiStream && (first == 'c' && last == '1')) {
                
                cancelTimer();
                ZrtpPacketConfirm cpkt = new ZrtpPacketConfirm(pkt);
                sentPacket = null;

                ZrtpPacketConfirm confirm = parent.prepareConfirm2MultiStream(cpkt, errorCode);

                // Something went wrong during processing of the Confirm1 packet
                if (confirm == null) {
                    sendErrorPacket(errorCode[0]);
                    return;
                }
                inState = ZrtpStates.WaitConfAck;

                sentPacket = confirm;
                if (!parent.sendPacketZRTP(sentPacket)) {
                    sendFailed(); // returns to state Initial
                    return;
                }
                if (startTimer(t2) <= 0) {
                    timerFailed(ZrtpCodes.SevereCodes.SevereNoTimer); // returns to state Initial
                }
                if (!parent.srtpSecretsReady(ZrtpCallback.EnableSecurity.ForReceiver)) {
                    parent.sendInfo(ZrtpCodes.MessageSeverity.Severe, EnumSet
                            .of(ZrtpCodes.SevereCodes.SevereSecurityException));
                    sendErrorPacket(ZrtpCodes.ZrtpErrorCodes.CriticalSWError);
                    return;
                }
            }
            break;
            
        // Timer event triggered, resend the Commit packet
        case Timer:
            if (!parent.sendPacketZRTP(sentPacket)) {
                    sendFailed(); // returns to state Initial
                    return;
            }
            if (nextTimer(t2) <= 0) {
                timerFailed(ZrtpCodes.SevereCodes.SevereTooMuchRetries); // returns to state Initial
            }
            break;

        default: // unknown Event type for this state (covers Error and ZrtpClose)
            if (event.type != EventDataType.ZrtpClose) {
                parent.zrtpNegotiationFailed(ZrtpCodes.MessageSeverity.Severe,
                        EnumSet.of(ZrtpCodes.SevereCodes.SevereProtocolError));
            }
            sentPacket = null;
            inState = ZrtpStates.Initial;
        }
    }

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@sip-communicator.dev.java.net
For additional commands, e-mail: dev-help@sip-communicator.dev.java.net

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@sip-communicator.dev.java.net
For additional commands, e-mail: dev-help@sip-communicator.dev.java.net


#4

Dear Werner et al.,

we were again able to detect a problem in ZRTP4J, this time probably
quite a serious one.

Let us say that Alice is SIP Communicator and Bob is our client. Let us
presume that Alice and Bob have had a single successful
call before, and therefore a single retained secret RS1 is present in
their respective ZRTP caches.

Now Alice and Bob are calling each other again. Both Alice and Bob find
out that they have a common RS1. This RS1 is, for example,
equal to
de4ad027fea9fb0a41d2bdd32daac61a67df0f224c3fac3a7a66b728531fd10a (a real
experimental value).

In the method ZRtp::PrepareDHPart1(), Alice loads ZidRecord-s from
ZidFile, and chooses the one which matches with the ZID of the
peer. Alice then enters the method ZRtp::computeSharedSecretSet, where
she computes rs1IDi, rs1IDr, rs2IDi, rs2IDr from the loaded
shared secret record (ZidRecord).

Nevertheless, under special circumstances, the secret cache seems to
contain two different value sets for the same ZID. We were able to create
this situation if our remote client crashed during the Confirm phase. It
seemed as if the ZRTP cache got corrupted in that situation, containing
both the "old" and the "new" RS1 value under the same ZID; the "new"
value actually unconfirmed!

We were actually able to locate both values within the binary ZID file
with a Hex editor.

In any subsequent call, ZRTP would use the "new" RS1 for computation of
outgoing rs1IDi, rs1IDr, but the "old" RS1 for computation
of the incoming rs1IDi, rs1IDr. This had disastrous result for S0
computation; Alice would therefore use the "old" RS1 to compute the S0,
while Bob would find out that incoming IDs of the "new" RS1 do not match
and wouldn't use RS1 to compute S0; so, Confirm phase of
the ZRTP handshake would fail.

Unfortunately, when we deleted the failing ZID file, we were no longer
able to reproduce the error (in the meantime, I corrected Bob's client
so that it wouldn't crash anymore).

But this is definitely something to meditate upon:

- what is the mechanism within ZRTP4J to prevent double entries for the
same ZIDs?
- when exactly does the ZRTP4J save the new value of RS1? Our client
updates the cache on arrival of the first Confirm2 (when local party is
Responder)
or Conf2ACK (when local party is initiator) message, or when the first
SRTP packet is decrypted successfully (any ZRTP party), whatever happens
sooner.

Best regards

Marian Kechlibar

···

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@sip-communicator.dev.java.net
For additional commands, e-mail: dev-help@sip-communicator.dev.java.net


#5

Hi Werner,

yes, we already did something similar and thus proceeded further. It is
possible that similar race condition
happens in handling of other initiator messages, but we did not provoke
another instance of it.

We have also tested how does the thing sound with RTP packet traffic
turned off during ZRTP handshake
phase, and it sounds OK to the clients. The patch is not very nice, but
working. Its logic is as follows:

- there is a new static boolean variable in
RTPConnectorOutputStream.java, its name is zrtpRunning. This
boolean is protected against concurrent modification, and if true, it
means that only ZRTP packets are allowed
to be sent (everything else is discarded),
- this variable is set to true in ZRTPTransformEngine::sendDataZRTP()
and to false in all callback methods.

This does not exactly correspond to the requirements in ZRTP Draft, but
was working for us. We will nevertheless
improve it before proposing the patch.

Best regards

Marian

Dne 26.10.2010 18:27, Werner Dittmann napsal(a):

···

Hi Marian,

just a quick fix to enable further testing on your side:

you may just delete the offending line marked wit <<<<!!!! 1.

The following code in the if - part sets the sentPacket variable
to a correct value, thus it's save to delete the line.

In case a timeout happens (timer restarted at <<<<!!!! 2) then
the Timer handling will resend the Commit packet if necessary.

Nevertheless I'll check other parts of the code as well :slight_smile: .

Best regards,
Werner

Am 26.10.2010 15:47, schrieb Marian Kechlibar:

Now we actually understand what happens. This is one helluva race condition in the ZRTP4J lib.
You couldn't have seen it anywhere on faster connections.

I have attached the offending source to this message.

The initial condition is as follows:

Alice, on SIP Communicator, sends a Commit.
Later, another Commit arrives from Bob.

Now, the method evCommitSent() is invoked and the execution goes to the part denoted by "We have a Commit clash".
Note the line denoted by <<<<!!!! 1. On that line, the variable sentPacket is set to NULL.
Now, if Alice won the Commit tie and became an Initiator, the execution goes to the line denoted by <<<<<!!!! 2.
On this line, the Commit resend timer is restarted (the comment says "restart the Commit timer, gives peer more time to react").

BUT

the sentPacket variable is already NULL, so when the timer goes off on the next occassion, it will not know what to send, and it will invoke
sendFailed() method. BOOM goes the entire handshake process.

Why does not this happen more often? Because on faster networks than 2G, and with faster peers, DHPart1 arrives quite soon, most often before
the restarted timer goes off. Such early arrival of DHPart1 cancels the timer definitely.

On slow mobile network, with a sluggish peer, DHPart1 will arrive quite late, surely more than half a second after the preceding Commit.
In the meantime, the Commit resend timer has expired, found that it has nothing to send, and killed the ZRTP handshake.

This pattern repeats itself multiple times in the class. Therefore, it will need some repairs.

Best regards

Marian Kechlibar

protected void evCommitSent() {
        char first, last;
        byte[] pkt;
        ZrtpCodes.ZrtpErrorCodes[] errorCode = new ZrtpCodes.ZrtpErrorCodes[1];

        /*
         * First check the general event type, then discrimnate the real event.
         */
        switch (event.type) {
        
        case ZrtpPacket:
            pkt = event.packet;

            first = (char) pkt[MESSAGE_OFFSET];
            first = Character.toLowerCase(first);
            last = (char) pkt[MESSAGE_OFFSET + 7];
            last = Character.toLowerCase(last);

            /*
             * HelloAck or Hello:
             * - delayed "HelloAck" or "Hello", maybe due to network latency, just
             * ignore it
             * - no switch in state, leave timer as it is
             */
            if (first == 'h' && (last =='k' || last == ' ')) {
                return;
            }

            /*
             * Commit:
             * We have a "Commit" clash. Resolve it.
             *
             * - switch off resending Commit
             * - compare my hvi with peer's hvi
             * - if my hvi is greater
             * - we are Initiator, stay in state, wait for peer's DHPart1 packet
             * - else
             * - we are Responder, stop timer
             * - prepare and send DH1Packt,
             * - switch to state WaitDHPart2, implies Responder path
             */
            if (first == 'c' && last == ' ') {
                ZrtpPacketCommit zpCo = new ZrtpPacketCommit(pkt);

                if (!parent.verifyH2(zpCo)) {
                    return;
                }
                sentPacket = null; <<<<!!!! 1
                cancelTimer(); // this cancels the Commit timer T2

                // if our hvi is less than peer's hvi: switch to Responder mode and
                // send DHPart1 packet. Peer (as Initiator) will retrigger if
                // necessary
                //
                if (parent.compareCommit(zpCo) < 0) {
                    if (!multiStream) {
                        ZrtpPacketDHPart dhPart1 = parent.prepareDHPart1(zpCo,
                                errorCode);

                        // Something went wrong during processing of the Commit
                        // packet
                        if (dhPart1 == null) {
                            if (errorCode[0] != ZrtpCodes.ZrtpErrorCodes.IgnorePacket) {
                                sendErrorPacket(errorCode[0]);
                            }
                            return;
                        }
                        inState = ZrtpStates.WaitDHPart2;
                        sentPacket = dhPart1;
                    } else {
                        ZrtpPacketConfirm confirm = parent.prepareConfirm1MultiStream(zpCo, errorCode);

                        // Something went wrong during processing of the Commit packet
                        if (confirm == null) {
                            if (errorCode[0] != ZrtpCodes.ZrtpErrorCodes.IgnorePacket) {
                                sendErrorPacket(errorCode[0]);
                            }
                            return;
                        }
                        sentPacket = confirm;
                        inState = ZrtpStates.WaitConfirm2;

                    }
                    if (!parent.sendPacketZRTP(sentPacket)) {
                        sendFailed(); // returns to state Initial
                    }
                }
                // Stay in state, we are Initiator, wait for DHPart1 packet from peer.
                // Resend Commit after timeout until we get a DHPart1
                else {
                       <<<<<!!!! 2
                    if (startTimer(t2) <= 0) { // restart the Commit timer, gives peer more time to react
                        timerFailed(ZrtpCodes.SevereCodes.SevereNoTimer);
                    }
                }
                return;
            }

            /*
             * DHPart1:
             * - switch off resending Commit
             * - Prepare and send DHPart2
             * - switch to WaitConfirm1
             * - start timer to resend DHPart2 if necessary, we are Initiator
             */
            if (first == 'd') {
                
                ZrtpPacketDHPart dpkt = new ZrtpPacketDHPart(pkt);
                ZrtpPacketDHPart dhPart2 = parent.prepareDHPart2(dpkt, errorCode);

                // Something went wrong during processing of the DHPart1 packet
                if (dhPart2 == null) {
                    if (errorCode[0] != ZrtpCodes.ZrtpErrorCodes.IgnorePacket) {
                        sendErrorPacket(errorCode[0]);
                    }
                    return;
                }
                cancelTimer();
                sentPacket = dhPart2;
                inState = ZrtpStates.WaitConfirm1;

                if (!parent.sendPacketZRTP(sentPacket)) {
                    sendFailed(); // returns to state Initial
                    return;
                }
                if (startTimer(t2) <= 0) {
                    timerFailed(ZrtpCodes.SevereCodes.SevereNoTimer); // returns to state Initial
                }
            }
            
            if (multiStream && (first == 'c' && last == '1')) {
                
                cancelTimer();
                ZrtpPacketConfirm cpkt = new ZrtpPacketConfirm(pkt);
                sentPacket = null;

                ZrtpPacketConfirm confirm = parent.prepareConfirm2MultiStream(cpkt, errorCode);

                // Something went wrong during processing of the Confirm1 packet
                if (confirm == null) {
                    sendErrorPacket(errorCode[0]);
                    return;
                }
                inState = ZrtpStates.WaitConfAck;

                sentPacket = confirm;
                if (!parent.sendPacketZRTP(sentPacket)) {
                    sendFailed(); // returns to state Initial
                    return;
                }
                if (startTimer(t2) <= 0) {
                    timerFailed(ZrtpCodes.SevereCodes.SevereNoTimer); // returns to state Initial
                }
                if (!parent.srtpSecretsReady(ZrtpCallback.EnableSecurity.ForReceiver)) {
                    parent.sendInfo(ZrtpCodes.MessageSeverity.Severe, EnumSet
                            .of(ZrtpCodes.SevereCodes.SevereSecurityException));
                    sendErrorPacket(ZrtpCodes.ZrtpErrorCodes.CriticalSWError);
                    return;
                }
            }
            break;
            
        // Timer event triggered, resend the Commit packet
        case Timer:
            if (!parent.sendPacketZRTP(sentPacket)) {
                    sendFailed(); // returns to state Initial
                    return;
            }
            if (nextTimer(t2) <= 0) {
                timerFailed(ZrtpCodes.SevereCodes.SevereTooMuchRetries); // returns to state Initial
            }
            break;

        default: // unknown Event type for this state (covers Error and ZrtpClose)
            if (event.type != EventDataType.ZrtpClose) {
                parent.zrtpNegotiationFailed(ZrtpCodes.MessageSeverity.Severe,
                        EnumSet.of(ZrtpCodes.SevereCodes.SevereProtocolError));
            }
            sentPacket = null;
            inState = ZrtpStates.Initial;
        }
    }

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@sip-communicator.dev.java.net
For additional commands, e-mail: dev-help@sip-communicator.dev.java.net

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@sip-communicator.dev.java.net
For additional commands, e-mail: dev-help@sip-communicator.dev.java.net

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@sip-communicator.dev.java.net
For additional commands, e-mail: dev-help@sip-communicator.dev.java.net


#6

Hi Marion,

a few questions - please see inline.

Regards,
Werner

Dear Werner et al.,

we were again able to detect a problem in ZRTP4J, this time probably
quite a serious one.

Let us say that Alice is SIP Communicator and Bob is our client. Let us
presume that Alice and Bob have had a single successful
call before, and therefore a single retained secret RS1 is present in
their respective ZRTP caches.

Now Alice and Bob are calling each other again. Both Alice and Bob find
out that they have a common RS1. This RS1 is, for example,
equal to
de4ad027fea9fb0a41d2bdd32daac61a67df0f224c3fac3a7a66b728531fd10a (a real
experimental value).

In the method ZRtp::PrepareDHPart1(), Alice loads ZidRecord-s from
ZidFile, and chooses the one which matches with the ZID of the
peer. Alice then enters the method ZRtp::computeSharedSecretSet, where
she computes rs1IDi, rs1IDr, rs2IDi, rs2IDr from the loaded
shared secret record (ZidRecord).

Nevertheless, under special circumstances, the secret cache seems to
contain two different value sets for the same ZID. We were able to create
this situation if our remote client crashed during the Confirm phase.

With remote client: this is Bob's (your) client? If yes then this happens
somewhere during the Confirm and probably a following step that my be
triggered by a timer. I'll have a look.

···

Am 26.10.2010 18:32, schrieb Marian Kechlibar:

It seemed as if the ZRTP cache got corrupted in that situation, containing
both the "old" and the "new" RS1 value under the same ZID; the "new"
value actually unconfirmed!

We were actually able to locate both values within the binary ZID file
with a Hex editor.

In any subsequent call, ZRTP would use the "new" RS1 for computation of
outgoing rs1IDi, rs1IDr, but the "old" RS1 for computation
of the incoming rs1IDi, rs1IDr. This had disastrous result for S0
computation; Alice would therefore use the "old" RS1 to compute the S0,
while Bob would find out that incoming IDs of the "new" RS1 do not match
and wouldn't use RS1 to compute S0; so, Confirm phase of
the ZRTP handshake would fail.

Unfortunately, when we deleted the failing ZID file, we were no longer
able to reproduce the error (in the meantime, I corrected Bob's client
so that it wouldn't crash anymore).

But this is definitely something to meditate upon:

- what is the mechanism within ZRTP4J to prevent double entries for the
same ZIDs?
- when exactly does the ZRTP4J save the new value of RS1? Our client
updates the cache on arrival of the first Confirm2 (when local party is
Responder)
or Conf2ACK (when local party is initiator) message, or when the first
SRTP packet is decrypted successfully (any ZRTP party), whatever happens
sooner.

Best regards

Marian Kechlibar

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@sip-communicator.dev.java.net
For additional commands, e-mail: dev-help@sip-communicator.dev.java.net

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@sip-communicator.dev.java.net
For additional commands, e-mail: dev-help@sip-communicator.dev.java.net


#7

Hi,

I have a contributed version with support for Jingle Nodes, and I would like
to know how to generate the binaries for Mac/Windows/Linux.
Also, how is the simplest way to add an icon to the call window to indicate
the usage of Jingle Nodes?

Thiago Rocha Camargo
barata7@gmail.com
http://jinglenodes.org
My profiles: [image: Twitter] <http://twitter.com/xmppjingle>
Contact me: [image: Google Talk/] barata7@gmail.com


#8

Dear Werner,

the only inline question I found was whether Bob was my client ... yes,
it was.

Best regards

Marian

Dne 26.10.2010 20:19, Werner Dittmann napsal(a):

···

Hi Marion,

a few questions - please see inline.

Regards,
Werner

Am 26.10.2010 18:32, schrieb Marian Kechlibar:

Dear Werner et al.,

we were again able to detect a problem in ZRTP4J, this time probably
quite a serious one.

Let us say that Alice is SIP Communicator and Bob is our client. Let us
presume that Alice and Bob have had a single successful
call before, and therefore a single retained secret RS1 is present in
their respective ZRTP caches.

Now Alice and Bob are calling each other again. Both Alice and Bob find
out that they have a common RS1. This RS1 is, for example,
equal to
de4ad027fea9fb0a41d2bdd32daac61a67df0f224c3fac3a7a66b728531fd10a (a real
experimental value).

In the method ZRtp::PrepareDHPart1(), Alice loads ZidRecord-s from
ZidFile, and chooses the one which matches with the ZID of the
peer. Alice then enters the method ZRtp::computeSharedSecretSet, where
she computes rs1IDi, rs1IDr, rs2IDi, rs2IDr from the loaded
shared secret record (ZidRecord).

Nevertheless, under special circumstances, the secret cache seems to
contain two different value sets for the same ZID. We were able to create
this situation if our remote client crashed during the Confirm phase.

With remote client: this is Bob's (your) client? If yes then this happens
somewhere during the Confirm and probably a following step that my be
triggered by a timer. I'll have a look.

It seemed as if the ZRTP cache got corrupted in that situation, containing
both the "old" and the "new" RS1 value under the same ZID; the "new"
value actually unconfirmed!

We were actually able to locate both values within the binary ZID file
with a Hex editor.

In any subsequent call, ZRTP would use the "new" RS1 for computation of
outgoing rs1IDi, rs1IDr, but the "old" RS1 for computation
of the incoming rs1IDi, rs1IDr. This had disastrous result for S0
computation; Alice would therefore use the "old" RS1 to compute the S0,
while Bob would find out that incoming IDs of the "new" RS1 do not match
and wouldn't use RS1 to compute S0; so, Confirm phase of
the ZRTP handshake would fail.

Unfortunately, when we deleted the failing ZID file, we were no longer
able to reproduce the error (in the meantime, I corrected Bob's client
so that it wouldn't crash anymore).

But this is definitely something to meditate upon:

- what is the mechanism within ZRTP4J to prevent double entries for the
same ZIDs?
- when exactly does the ZRTP4J save the new value of RS1? Our client
updates the cache on arrival of the first Confirm2 (when local party is
Responder)
or Conf2ACK (when local party is initiator) message, or when the first
SRTP packet is decrypted successfully (any ZRTP party), whatever happens
sooner.

Best regards

Marian Kechlibar

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@sip-communicator.dev.java.net
For additional commands, e-mail: dev-help@sip-communicator.dev.java.net

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@sip-communicator.dev.java.net
For additional commands, e-mail: dev-help@sip-communicator.dev.java.net

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@sip-communicator.dev.java.net
For additional commands, e-mail: dev-help@sip-communicator.dev.java.net