[jitsi-dev] ICE race condition with SIP


#1

I've observed an occasional race condition running ICE between my two
user agents:

Good:

device 1: ICE completion
device 2: ICE completion
device 1: send 200 OK
device 2: receive 200 OK

Bad:

device 1: ICE completion
device 1: send 200 OK
device 2: receive 200 OK
device 2: ICE completion

Notice that in the 'bad' case, device 2 receives the 200 OK from the peer
before its local ICE agent has completed.

I don't believe it is necessarily a fault with ICE or the ice4j
implementation, but it is another gotcha for implementors to be aware
of, especially when linking ice4j to existing code - I am looking into
it more closely. I believe the application code needs to operate
something like the 'barrier' pattern, where it does not proceed to set up
RTP until both (ICE completion && SIP 200) have occurred.
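
A minimal sketch of that barrier idea in plain Java, not tied to ice4j or to any SIP stack. The class CallSetupBarrier and its methods onIceCompleted() and onSip200Received() are hypothetical names; the application would call them from its own ICE state callback and SIP response handler, and the Runnable stands in for whatever code starts RTP.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper: runs startRtp exactly once, after BOTH events have been seen. */
final class CallSetupBarrier {
    private static final int ICE_DONE = 1;
    private static final int SIP_200 = 2;
    private static final int BOTH = ICE_DONE | SIP_200;

    private final AtomicInteger seen = new AtomicInteger();
    private final Runnable startRtp;

    CallSetupBarrier(Runnable startRtp) {
        this.startRtp = startRtp;
    }

    /** Call from the application's ICE completion callback. */
    void onIceCompleted() {
        arrive(ICE_DONE);
    }

    /** Call from the application's SIP handler when the 200 OK arrives. */
    void onSip200Received() {
        arrive(SIP_200);
    }

    private void arrive(int flag) {
        int before = seen.getAndUpdate(v -> v | flag);
        // Fire only on the transition to "both seen", so repeated events are harmless.
        if (before != BOTH && (before | flag) == BOTH) {
            startRtp.run();
        }
    }

    public static void main(String[] args) {
        CallSetupBarrier barrier =
                new CallSetupBarrier(() -> System.out.println("starting RTP"));
        barrier.onSip200Received(); // the 'bad' ordering: 200 OK arrives first
        barrier.onIceCompleted();   // RTP is started only at this point
    }
}
```

Because the two events are recorded atomically, it does not matter which thread delivers which event or in what order they arrive.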


#2

Hi Daniel,

Yes, I think you are right. The second peer should wait for its own ICE processing to complete before processing the "200 OK" (and send its ACK right after).

We do something similar in our XMPP implementation: we wait for ICE to finish before processing the "session-accept" message.

Regards,

--
Seb

On 07/02/12 12:56, Daniel Pocock wrote:

> I've observed an occasional race condition running ICE between my two
> user agents:
>
> Good:
>
> device 1: ICE completion
> device 2: ICE completion
> device 1: send 200 OK
> device 2: receive 200 OK
>
> Bad:
>
> device 1: ICE completion
> device 1: send 200 OK
> device 2: receive 200 OK
> device 2: ICE completion
>
> Notice in the 'bad' case, device 2 is receiving the 200 OK from the peer
> before the local ICE agent has completed
>
> I don't believe it is necessarily a fault with ICE or the ice4j
> implementation, but it is another gotcha for implementors to be aware
> of, especially when linking ice4j to existing code - I am looking into
> it more closely. I believe the application code needs to operate
> something like the 'barrier' pattern, where it does not proceed to setup
> RTP until (ICE completion && SIP 200) have both occurred.


#3

On 07/02/12 13:45, Sebastien Vincent wrote:

> Hi Daniel,
>
> Yes I think you are right. Second peer should wait its ICE completion
> before processing the "200 OK" (and send its ACK right after).

It seems a bit odd though, because the callee should not be alerted
(ringing alert, or sending 180 back to the caller) until both sides are
completely certain that media can flow. If the ICE agent on one side is
still active (it continues logging its activities after the 200 OK came
in), does that mean the ICE agent on the callee side has finished
too soon?


#4

On 07/02/12 16:08, Daniel Pocock wrote:

>> Hi Daniel,
>>
>> Yes I think you are right. Second peer should wait its ICE completion
>> before processing the "200 OK" (and send its ACK right after).
>
> It seems a bit odd though - because callee user should not be alerted
> (ringing alert, or sending 180 back to caller) until both sides are
> completely certain that media can flow. If the ICE agent on one side is
> still active (it continues logging its activities after the 200 OK came
> in), then does that mean the ICE agent on the callee side has finished
> too soon?

Further observations:

- After more delay, I noticed that logcat on the caller side shows an
  out of memory error.

- I was testing a configuration with multiple TURN servers, one of them
  unreachable (i.e. I make two calls to
  ICEAgent.addCandidateHarvester(new TurnCandidateHarvester(.....))).
  If I go back to using only a single TURN server, ICE setup completes
  OK, with no out of memory error.

Should it be valid to set up multiple TURN servers for a single ICEAgent
instance?

On a side note, I have also implemented DNS SRV for STUN, so I look up
_stun._udp.example.org and for each result found, I just call
ICEAgent.addCandidateHarvester(). Maybe this logic should be
implemented in the ICE stack though, or alternatively, maybe
addCandidateHarvester() needs extra arguments to pass in the priority
and weight values, so that it can take those into account when
nominating the best candidate?
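
A minimal sketch of how the SRV priority and weight values could be turned into a try-order on the application side, assuming the records have already been resolved. SrvOrdering and SrvRecord are hypothetical names, not ice4j classes, and the selection follows one reading of RFC 2782: lowest priority value first, then a weighted random choice within each priority.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;
import java.util.TreeMap;

final class SrvOrdering {
    /** One resolved SRV answer; a hypothetical holder type. */
    static final class SrvRecord {
        final String target;
        final int port;
        final int priority;
        final int weight;

        SrvRecord(String target, int port, int priority, int weight) {
            this.target = target;
            this.port = port;
            this.priority = priority;
            this.weight = weight;
        }
    }

    /** Returns the records in the order in which a client should try them. */
    static List<SrvRecord> order(List<SrvRecord> records, Random rng) {
        // Group by priority; TreeMap iterates the lowest priority value first.
        TreeMap<Integer, List<SrvRecord>> byPriority = new TreeMap<>();
        for (SrvRecord r : records) {
            byPriority.computeIfAbsent(r.priority, k -> new ArrayList<>()).add(r);
        }
        List<SrvRecord> ordered = new ArrayList<>();
        for (List<SrvRecord> group : byPriority.values()) {
            List<SrvRecord> remaining = new ArrayList<>(group);
            // Zero-weight records are placed first in the working list.
            remaining.sort(Comparator.comparingInt((SrvRecord r) -> r.weight));
            while (!remaining.isEmpty()) {
                int total = remaining.stream().mapToInt(r -> r.weight).sum();
                int pick = rng.nextInt(total + 1); // uniform in [0, total]
                int running = 0;
                for (int i = 0; i < remaining.size(); i++) {
                    running += remaining.get(i).weight;
                    if (running >= pick) {
                        ordered.add(remaining.remove(i));
                        break;
                    }
                }
            }
        }
        return ordered;
    }
}
```

The application could then add one harvester per record in the returned order, which keeps the DNS logic outside the ICE library.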


#5

On 07/02/12 16:08, Daniel Pocock wrote:

>> Hi Daniel,
>>
>> Yes I think you are right. Second peer should wait its ICE completion
>> before processing the "200 OK" (and send its ACK right after).
>
> It seems a bit odd though - because callee user should not be alerted
> (ringing alert, or sending 180 back to caller) until both sides are
> completely certain that media can flow. If the ICE agent on one side is
> still active (it continues logging its activities after the 200 OK came
> in), then does that mean the ICE agent on the callee side has finished
> too soon?

Do the SIP or ICE RFCs say anything about making sure that media can flow before alerting the remote user?

Regards,

--
Seb


#6

Hi,

On 07/02/12 16:31, Daniel Pocock wrote:

>>> Hi Daniel,
>>>
>>> Yes I think you are right. Second peer should wait its ICE completion
>>> before processing the "200 OK" (and send its ACK right after).
>>
>> It seems a bit odd though - because callee user should not be alerted
>> (ringing alert, or sending 180 back to caller) until both sides are
>> completely certain that media can flow. If the ICE agent on one side is
>> still active (it continues logging its activities after the 200 OK came
>> in), then does that mean the ICE agent on the callee side has finished
>> too soon?
>
> Further observations:
>
> - after more delay, I noticed on the caller side the logcat shows an out
> of memory error

Do you have the stack trace of the out of memory error?

We recently fixed an issue where some ICE threads were still active even after the call was hung up. Maybe you should update to the latest version of ice4j and try again.

> - I was testing a configuration with multiple TURN servers, one
> unreachable (e.g. I make two calls to
> ICEAgent.addCandidateHarvester(
>                      new TurnCandidateHarvester(.....))
> and if I go back to using only a single TURN server, ICE setup completes
> OK, no out of memory error
>
> Should it be valid to setup multiple TURN servers for a single ICEAgent
> instance?

Yes, you can have as many STUN or TURN servers as you want.

> On a side note, I have also implemented DNS SRV for STUN, so I search
> _stun._udp.example.org and for each result found, I just call
> ICEAgent.addCandidateHarvester() - maybe this logic should be
> implemented in the ICE stack though, or alternatively, maybe
> addCandidateHarvester() needs extra arguments to pass in the priority
> and weight values, and it can then take those into account when
> nominating the best candidate?

I think that this kind of DNS logic belongs to the application and should not be part of the ICE implementation library. ICE candidate priorities are calculated automatically, so why do you want to modify them? If you want to change the way ICE nominates its candidates, I recommend looking at the NominationStrategy and DefaultNominator classes.

Regards,

--
Seb


#7

On 08.02.12 15:40, Sebastien Vincent wrote:

>>> Hi Daniel,
>>>
>>> Yes I think you are right. Second peer should wait its ICE completion
>>> before processing the "200 OK" (and send its ACK right after).
>>
>> It seems a bit odd though - because callee user should not be alerted
>> (ringing alert, or sending 180 back to caller) until both sides are
>> completely certain that media can flow. If the ICE agent on one side is
>> still active (it continues logging its activities after the 200 OK came
>> in), then does that mean the ICE agent on the callee side has finished
>> too soon?
>
> Does the SIP or ICE RFCs says anything about to be sure that media can
> flow before alerting remote user ?

It does actually but it's not a MUST and it also implies security
issues: I am not sure that I'd like anyone on the internet to know
everything about my IP addresses without me having a say in this. XMPP
recommends that behaviour only for people in one's contact list.

Cheers,
Emil

--
Emil Ivov, Ph.D.                67000 Strasbourg,
Project Lead                    France
Jitsi
emcho@jitsi.org                 PHONE: +33.1.77.62.43.30
http://jitsi.org                FAX:   +33.1.77.62.47.31


#8

I believe that it is also a regulatory issue: think about all those
nasty boiler rooms robodialing and making people's phone ring even when
there are not enough salespeople ready to speak to every person that
answers. People are answering their phone and they hear nothing.
Regulators have picked up on this now in various countries. I think it
is a good idea for ICE to be 100% certain about a media path before
making someone's phone ring.

The RFC is quite strong on this point too:

http://tools.ietf.org/html/rfc5245#section-12.1.1 (bottom of page 69)

"For this reason, implementations SHOULD delay alerting
   the called party until candidates for each component of each media
   stream have entered the valid list. In the case of a PSTN gateway,
   this would mean that the setup message into the PSTN is delayed until
   this point. Doing this increases the post-dial delay, but has the
   effect of eliminating 'ghost rings'. Ghost rings are cases where the
   called party hears the phone ring, picks up, but hears nothing and
   cannot be heard. "
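
A minimal sketch of the check that paragraph implies, assuming the application can observe which components of each media stream currently have a pair in the valid list. AlertingGate and mayAlert() are hypothetical names, and how the two maps get populated from an ICE stack is left open; component IDs 1 and 2 are the usual RTP and RTCP components.

```java
import java.util.Map;
import java.util.Set;

final class AlertingGate {
    /**
     * @param requiredComponents stream name -> component IDs the stream uses (1 = RTP, 2 = RTCP)
     * @param validComponents    stream name -> component IDs that already have a valid pair
     * @return true only when every component of every stream is covered
     */
    static boolean mayAlert(Map<String, Set<Integer>> requiredComponents,
                            Map<String, Set<Integer>> validComponents) {
        for (Map.Entry<String, Set<Integer>> stream : requiredComponents.entrySet()) {
            Set<Integer> valid = validComponents.get(stream.getKey());
            if (valid == null || !valid.containsAll(stream.getValue())) {
                return false; // at least one component has no valid pair yet
            }
        }
        return true;
    }
}
```

An application following the SHOULD would ring the user (or send the 180) only once mayAlert() returns true.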


#9

>> On a side note, I have also implemented DNS SRV for STUN, so I search
>> _stun._udp.example.org and for each result found, I just call
>> ICEAgent.addCandidateHarvester() - maybe this logic should be
>> implemented in the ICE stack though, or alternatively, maybe
>> addCandidateHarvester() needs extra arguments to pass in the priority
>> and weight values, and it can then take those into account when
>> nominating the best candidate?
>
> I think that this kind of DNS stuff is related to application and should
> not been part of the ICE implementation library. ICE candidate priority
> are automatically calculated, why do you want to modify this priority ?
> If you want to modify the way ICE nominate its candidate I recommend you
> to see NominationStrategy and DefaultNominator classes.

http://tools.ietf.org/html/rfc5245#section-4.1.1.2

http://tools.ietf.org/html/rfc5389#section-9

Actually, it is in the RFCs, but it is optional.

When multiple records are found, only one should be used - however, the
agent should try them all until one is found to be responsive:

"When following these procedures, if the STUN transaction times out
   without receipt of a response, the client SHOULD retry the request to
   the next server in the ordered defined by RFC 2782. Such a retry is
   only possible for request/response transmissions, since indication
   transactions generate no response or timeout."

This type of logic would appear to be necessary within the ice4j stack -
it can't be done by the application, as it is based on the actual
results of the attempts to contact the server.
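
A minimal sketch of that retry behaviour, assuming the server list has already been put into RFC 2782 order. StunFailover and firstResponsive() are hypothetical names, and the probe is a stand-in: a real one would send a STUN Binding request and report false when the transaction times out.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

final class StunFailover {
    /**
     * Tries each server in order and returns the first one for which the probe
     * reports a response, or empty if every attempt times out.
     */
    static Optional<String> firstResponsive(List<String> orderedServers,
                                            Predicate<String> probe) {
        for (String server : orderedServers) {
            if (probe.test(server)) {
                return Optional.of(server);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> servers =
                Arrays.asList("stun1.example.org:3478", "stun2.example.org:3478");
        // Stand-in probe: pretend that only the second server answers.
        Optional<String> chosen = firstResponsive(servers, s -> s.startsWith("stun2"));
        System.out.println(chosen.orElse("none responsive"));
    }
}
```

Only the component that sends the STUN requests knows whether a transaction timed out, which is why the loop has to sit next to the probing code.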


#10

Yup, I am aware of the rationale and I can completely see the point.
However, this still means that, if you simply implement things this way,
then in quite a number of cases, anyone in the world would be able to
obtain your IP address(es), as long as they know your SIP URI and there's
nothing you could do to stop this.

--sent from my mobile

On Feb 8, 2012 7:33 PM, "Daniel Pocock" <daniel@pocock.com.au> wrote:

On 08/02/12 15:47, Emil Ivov wrote:
> On 08.02.12 15:40, Sebastien Vincent wrote:
>> Le 07/02/12 16:08, Daniel Pocock a écrit :
>>>
>>> On 07/02/12 13:45, Sebastien Vincent wrote:
>>>> Hi Daniel,
>>>>
>>>> Yes I think you are right. Second peer should wait its ICE completion
>>>> before processing the "200 OK" (and send its ACK right after).
>>> It seems a bit odd though - because callee user should not be alerted
>>> (ringing alert, or sending 180 back to caller) until both sides are
>>> completely certain that media can flow. If the ICE agent on one side is
>>> still active (it continues logging its activities after the 200 OK came
>>> in), then does that mean the ICE agent on the callee side has finished
>>> too soon?
>>
>> Does the SIP or ICE RFCs says anything about to be sure that media can
>> flow before alerting remote user ?
>
> It does actually but it's not a MUST and it also implies security
> issues: I am not sure that I'd like anyone on the internet to know
> everything about my IP addresses without me having a say in this. XMPP
> recommends that behaviour only for people in one's contact list.
>


#11

I'm not sure if that is directly related to the issue of alerting the
callee though - the caller discovers the callee's IP addresses very quickly,
before the callee has time to decide whether to answer or reject the call.

For the privacy issue, maybe we could introduce a hack to ice4j:
a relay-only mode? Not great for those people providing relay capacity
though.
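
A rough sketch of what a relay-only filter could look like, assuming the application can inspect the gathered candidates before they are put into the offer or answer. RelayOnlyFilter, Candidate and CandidateType are hypothetical stand-ins, not ice4j types; whether ice4j exposes a hook for this is not established in this thread.

```java
import java.util.List;
import java.util.stream.Collectors;

final class RelayOnlyFilter {
    /** The four ICE candidate types. */
    enum CandidateType { HOST, SERVER_REFLEXIVE, PEER_REFLEXIVE, RELAYED }

    /** Hypothetical candidate holder: transport address plus type. */
    static final class Candidate {
        final String address;
        final CandidateType type;

        Candidate(String address, CandidateType type) {
            this.address = address;
            this.type = type;
        }
    }

    /** Keeps only relayed candidates, so no local or reflexive address is advertised. */
    static List<Candidate> relayOnly(List<Candidate> gathered) {
        return gathered.stream()
                .filter(c -> c.type == CandidateType.RELAYED)
                .collect(Collectors.toList());
    }
}
```

The cost is the one already noted: every call set up this way consumes relay capacity.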


#12

On 08.02.12 20:35, Daniel Pocock wrote:

> On 08/02/12 20:29, Emil Ivov wrote:
>
>> Yup, I am aware of the rationale and I can completely see the point.
>> However, this still means that, if you simply implement things this way,
>> then in quite a number of cases, anyone in the world would be able to
>> obtain your IP address(es), as long as they know your SIP URI and
>> there's nothing you could do to stop this.
>
> I'm not sure if that is directly related to the alerting callee issue
> though - the caller discovers the callee's IP addresses very quickly,
> before the caller has time to decide if they answer or reject the call.

Unless the full address list gets sent only after the callee picks up.
Even without that though, users would at least know someone

> For the privacy issue, maybe we could introduce a hack to ice4j:
> relay-only mode? Not great for those people providing relay capacity
> though.

Yup that's also what I was thinking. Start with a TURN/JingleNodes
address only and then restart ICE once the call is established.
Something along those lines.

Emil