[jitsi-dev] ice4j DelegatingDatagramSocket receive blocking


#1

Sometimes I notice that Lumicall fails to receive any RTP

Using tcpdump I can see RTP streams being sent to the device

Looking at logcat I find:

I/elegatingDatagramSocket( 2622): RTP lost > 5%: 0.3333333333333333
I/elegatingDatagramSocket( 2622): RTP lost > 5%: 0.1601423487544484
I/elegatingDatagramSocket( 2622): RTP lost > 5%: 0.14795008912655971

and when the call is hung up and the RtpStreamReceiver destroyed, it
gives a stack showing that it was stuck in
org.ice4j.socket.MultiplexingDatagramSocket.receive(), here is an example:

W/System.err( 2622): java.net.SocketException:
RelayedCandidateDatagramSocket has been closed.
W/System.err( 2622): at
org.ice4j.socket.RelayedCandidateDatagramSocket.receive(RelayedCandidateDatagramSocket.java:707)
W/System.err( 2622): at
org.ice4j.socket.DelegatingDatagramSocket.receive(DelegatingDatagramSocket.java:648)
W/System.err( 2622): at
org.ice4j.socket.SafeCloseDatagramSocket.receive(SafeCloseDatagramSocket.java:184)
W/System.err( 2622): at
org.ice4j.socket.MultiplexingDatagramSocket.receive(MultiplexingDatagramSocket.java:635)
W/System.err( 2622): at
org.ice4j.socket.MultiplexingDatagramSocket.receive(MultiplexingDatagramSocket.java:508)
W/System.err( 2622): at
org.sipdroid.net.RtpSocket.receive(RtpSocket.java:80)
W/System.err( 2622): at
org.sipdroid.media.RtpStreamReceiver.receive_from_socket(RtpStreamReceiver.java:451)
W/System.err( 2622): at
org.sipdroid.media.RtpStreamReceiver.run(RtpStreamReceiver.java:660)
W/RtpStreamReceiver( 2622): Exception in receive/play block:
java.net.SocketException, RelayedCandidateDatagramSocket has been closed.

Sometimes it does work, sometimes it doesn't. Has anybody else seen
issues like this using the sockets from ice4j? Can anybody suggest
anything I should try?


#2

Well... when an application reads from a socket, the very reading is
usually instantaneous and most of the time the application waits for a
packet to arrive. In other words, the expected behaviour for
XXXDatagramSocket is to spend most of its time blocked in receive.

···

2014-12-14 23:30 GMT+02:00 Daniel Pocock <daniel@pocock.pro>:

and when the call is hung up and the RtpStreamReceiver destroyed, it
gives a stack showing that it was stuck in
org.ice4j.socket.MultiplexingDatagramSocket.receive(), here is an example:


#3

Agreed, that is how the API should work. However, I see packets going
through the wifi router to the Android device where ice4j is running so
each time it receives a packet I would expect the receive() call to
return. When this problem occurs, it seems to stay in receive()
forever, as if 0 packets are received.

I am going to add some more logging to the code to try and understand
why it is behaving like this and whether it is really receiving packets
from the network.

One thing I notice is that my code tries to use the sockets as soon as
the COMPLETED event is sent by ice4j. Should I be waiting for the
TERMINATED event instead?
https://github.com/opentelecoms-org/lumicall/blob/master/src/org/sipdroid/sipua/UserAgent.java#L1283

I also notice in the ice4j javadoc it suggests that the TERMINATED event
is related to the garbage collection/clean-up of the agent. That part
of the documentation is a bit confusing. Does it mean the IceAgent
should be garbage collected but the sockets should remain valid and can
be used after free()?
http://bluejimp.com/jitsi/ice4j/javadoc/org/ice4j/ice/Agent.html#free()

···

On 15/12/14 08:17, Lyubomir Marinov wrote:

2014-12-14 23:30 GMT+02:00 Daniel Pocock <daniel@pocock.pro>:

and when the call is hung up and the RtpStreamReceiver destroyed, it
gives a stack showing that it was stuck in
org.ice4j.socket.MultiplexingDatagramSocket.receive(), here is an example:

Well... when an application reads from a socket, the very reading is
usually instantaneous and most of the time the application waits for a
packet to arrive. In other words, the expected behaviour for
XXXDatagramSocket is to spend most of its time blocked in receive.


#4

I think there is some confusion about the socket type

The stack shows that it was blocked in
RelayedCandidateDatagramSocket.receive()

In the MultiplexingDatagramSocket, the packets are received from the
wire but none of them match the filter for
RelayedCandidateDatagramSocket and so they can't be returned by the
receive() method of that socket.

Looking at them with Wireshark, they are just raw RTP packets from the
TURN server - this is what I expect, as it is the other peer that is
behind a bad NAT.

It seems that either

a) my user agent has somehow received the wrong socket object from the
ice4j ICE agent, or

b) the ice4j socket is in a bad state and
MultiplexingDatagramSocket.receive() shouldn't be trying to use
DelayedCandidateDatagramSocket.receive()

I was trying to get the sockets immediately after the COMPLETED event,
I've tweaked the code to wait for the TERMINATED event but the problem
is still there.

···

On 15/12/14 10:02, Daniel Pocock wrote:

On 15/12/14 08:17, Lyubomir Marinov wrote:

2014-12-14 23:30 GMT+02:00 Daniel Pocock <daniel@pocock.pro>:

and when the call is hung up and the RtpStreamReceiver destroyed, it
gives a stack showing that it was stuck in
org.ice4j.socket.MultiplexingDatagramSocket.receive(), here is an example:

Well... when an application reads from a socket, the very reading is
usually instantaneous and most of the time the application waits for a
packet to arrive. In other words, the expected behaviour for
XXXDatagramSocket is to spend most of its time blocked in receive.

Agreed, that is how the API should work. However, I see packets going
through the wifi router to the Android device where ice4j is running so
each time it receives a packet I would expect the receive() call to
return. When this problem occurs, it seems to stay in receive()
forever, as if 0 packets are received.

I am going to add some more logging to the code to try and understand
why it is behaving like this and whether it is really receiving packets
from the network.

One thing I notice is that my code tries to use the sockets as soon as
the COMPLETED event is sent by ice4j. Should I be waiting for the
TERMINATED event instead?
https://github.com/opentelecoms-org/lumicall/blob/master/src/org/sipdroid/sipua/UserAgent.java#L1283

I also notice in the ice4j javadoc it suggests that the TERMINATED event
is related to the garbage collection/clean-up of the agent. That part
of the documentation is a bit confusing. Does it mean the IceAgent
should be garbage collected but the sockets should remain valid and can
be used after free()?
http://bluejimp.com/jitsi/ice4j/javadoc/org/ice4j/ice/Agent.html#free()


#5

I've looked more closely at this and I think it is incorrectly
configuring itself asymmetrically

Both endpoints have obtained a TURN allocation on a TURN server, they
both access the same TURN server

Each endpoint log shows that it thinks it is sending to the peer through
its TURN allocation, e.g. caller logs

    Selected pair for stream audio.RTP: 195.8.117.59:49398/udp/relay ->
12.34.56.78:33642/udp/prflx (audio.RTP)

and callee logs:

    Selected pair for stream audio.RTP: 195.8.117.59:49400/udp/relay ->
99.88.77.66:21000/udp/srflx (audio.RTP)

Focusing on the caller:

- all the packets it sends out are sent to port 3478 on the TURN server
- all the packets it receives are from port 49400 on the TURN server
(the port allocated to the peer/callee) and so the
RelayedCandidateDatagramSocket doesn't accept them, it expects them to
come back from source port 3478 and be encapsulated

The callee sees things much the same way - it is also sending to port
3478 and receiving a flow from port 49398 and ignoring it.

I came across this mailing list post about such problems:
    http://lists.freedesktop.org/archives/nice/2013-March/000758.html

Has anybody else seen ice4j behave like this? Is there something that
should be done to avoid this when configuring the agent? Is there any
way it could detect this situation and log a warning when a flow starts
arriving from the wrong source port?

···

On 16/12/14 07:40, Daniel Pocock wrote:

On 15/12/14 10:02, Daniel Pocock wrote:

On 15/12/14 08:17, Lyubomir Marinov wrote:

2014-12-14 23:30 GMT+02:00 Daniel Pocock <daniel@pocock.pro>:

and when the call is hung up and the RtpStreamReceiver destroyed, it
gives a stack showing that it was stuck in
org.ice4j.socket.MultiplexingDatagramSocket.receive(), here is an example:

Well... when an application reads from a socket, the very reading is
usually instantaneous and most of the time the application waits for a
packet to arrive. In other words, the expected behaviour for
XXXDatagramSocket is to spend most of its time blocked in receive.

Agreed, that is how the API should work. However, I see packets going
through the wifi router to the Android device where ice4j is running so
each time it receives a packet I would expect the receive() call to
return. When this problem occurs, it seems to stay in receive()
forever, as if 0 packets are received.

I am going to add some more logging to the code to try and understand
why it is behaving like this and whether it is really receiving packets
from the network.

One thing I notice is that my code tries to use the sockets as soon as
the COMPLETED event is sent by ice4j. Should I be waiting for the
TERMINATED event instead?
https://github.com/opentelecoms-org/lumicall/blob/master/src/org/sipdroid/sipua/UserAgent.java#L1283

I also notice in the ice4j javadoc it suggests that the TERMINATED event
is related to the garbage collection/clean-up of the agent. That part
of the documentation is a bit confusing. Does it mean the IceAgent
should be garbage collected but the sockets should remain valid and can
be used after free()?
http://bluejimp.com/jitsi/ice4j/javadoc/org/ice4j/ice/Agent.html#free()

I think there is some confusion about the socket type

The stack shows that it was blocked in
RelayedCandidateDatagramSocket.receive()

In the MultiplexingDatagramSocket, the packets are received from the
wire but none of them match the filter for
RelayedCandidateDatagramSocket and so they can't be returned by the
receive() method of that socket.

Looking at them with Wireshark, they are just raw RTP packets from the
TURN server - this is what I expect, as it is the other peer that is
behind a bad NAT.

It seems that either

a) my user agent has somehow received the wrong socket object from the
ice4j ICE agent, or

b) the ice4j socket is in a bad state and
MultiplexingDatagramSocket.receive() shouldn't be trying to use
DelayedCandidateDatagramSocket.receive()

I was trying to get the sockets immediately after the COMPLETED event,
I've tweaked the code to wait for the TERMINATED event but the problem
is still there.


#6

Could this be a consequence of not doing the recommended SIP re-INVITE?
Lumicall is a bit lazy about that and doesn't do it at present. Do I
need to implement the re-INVITE and when the UA receives a re-INVITE
with updated offer/answer, how should it pass the remote-candidates
attributes to the ice4j API?

I've implemented a workaround for now, if caller has a TURN allocation,
callee will not send a relay candidate[1]. While this is a crude
solution, it seems to help Lumicall users reach each other.

Regards,

Daniel

1.
https://github.com/opentelecoms-org/lumicall/commit/100009b77305841042f82d896bd6db806fd2cb9c

···

On 16/12/14 15:35, Daniel Pocock wrote:

On 16/12/14 07:40, Daniel Pocock wrote:

On 15/12/14 10:02, Daniel Pocock wrote:

On 15/12/14 08:17, Lyubomir Marinov wrote:

2014-12-14 23:30 GMT+02:00 Daniel Pocock <daniel@pocock.pro>:

and when the call is hung up and the RtpStreamReceiver destroyed, it
gives a stack showing that it was stuck in
org.ice4j.socket.MultiplexingDatagramSocket.receive(), here is an example:

Well... when an application reads from a socket, the very reading is
usually instantaneous and most of the time the application waits for a
packet to arrive. In other words, the expected behaviour for
XXXDatagramSocket is to spend most of its time blocked in receive.

Agreed, that is how the API should work. However, I see packets going
through the wifi router to the Android device where ice4j is running so
each time it receives a packet I would expect the receive() call to
return. When this problem occurs, it seems to stay in receive()
forever, as if 0 packets are received.

I am going to add some more logging to the code to try and understand
why it is behaving like this and whether it is really receiving packets
from the network.

One thing I notice is that my code tries to use the sockets as soon as
the COMPLETED event is sent by ice4j. Should I be waiting for the
TERMINATED event instead?
https://github.com/opentelecoms-org/lumicall/blob/master/src/org/sipdroid/sipua/UserAgent.java#L1283

I also notice in the ice4j javadoc it suggests that the TERMINATED event
is related to the garbage collection/clean-up of the agent. That part
of the documentation is a bit confusing. Does it mean the IceAgent
should be garbage collected but the sockets should remain valid and can
be used after free()?
http://bluejimp.com/jitsi/ice4j/javadoc/org/ice4j/ice/Agent.html#free()

I think there is some confusion about the socket type

The stack shows that it was blocked in
RelayedCandidateDatagramSocket.receive()

In the MultiplexingDatagramSocket, the packets are received from the
wire but none of them match the filter for
RelayedCandidateDatagramSocket and so they can't be returned by the
receive() method of that socket.

Looking at them with Wireshark, they are just raw RTP packets from the
TURN server - this is what I expect, as it is the other peer that is
behind a bad NAT.

It seems that either

a) my user agent has somehow received the wrong socket object from the
ice4j ICE agent, or

b) the ice4j socket is in a bad state and
MultiplexingDatagramSocket.receive() shouldn't be trying to use
DelayedCandidateDatagramSocket.receive()

I was trying to get the sockets immediately after the COMPLETED event,
I've tweaked the code to wait for the TERMINATED event but the problem
is still there.

I've looked more closely at this and I think it is incorrectly
configuring itself asymmetrically

Both endpoints have obtained a TURN allocation on a TURN server, they
both access the same TURN server

Each endpoint log shows that it thinks it is sending to the peer through
its TURN allocation, e.g. caller logs

    Selected pair for stream audio.RTP: 195.8.117.59:49398/udp/relay ->
12.34.56.78:33642/udp/prflx (audio.RTP)

and callee logs:

    Selected pair for stream audio.RTP: 195.8.117.59:49400/udp/relay ->
99.88.77.66:21000/udp/srflx (audio.RTP)

Focusing on the caller:

- all the packets it sends out are sent to port 3478 on the TURN server
- all the packets it receives are from port 49400 on the TURN server
(the port allocated to the peer/callee) and so the
RelayedCandidateDatagramSocket doesn't accept them, it expects them to
come back from source port 3478 and be encapsulated

The callee sees things much the same way - it is also sending to port
3478 and receiving a flow from port 49398 and ignoring it.

I came across this mailing list post about such problems:
    http://lists.freedesktop.org/archives/nice/2013-March/000758.html

Has anybody else seen ice4j behave like this? Is there something that
should be done to avoid this when configuring the agent? Is there any
way it could detect this situation and log a warning when a flow starts
arriving from the wrong source port?