[jitsi-dev] Freeing unselected candidates


#1

Hi,
We have sometimes experienced problems with Ice4j when we run multiple concurrent connections. The problem that occurs is that Java throws an OutOfMemoryErrors saying that it can't create any more native threads. While running a profiler on our code I noticed that there are quite a few Connector instances running for each connection (with connection I mean a PseudoTCP connection). I figured that most of these are not really needed once a pair has been selected and can therefore be freed. NOTE! This assumption is based on my very limited knowledge of the inner workings of Ice4j. The tests we have done seem to be working fine but please let me know if I'm wrong!

I have attached a patch that includes an new method in IceAgent that runs free() on all LocalCandidates that are not related to the selected pair (with directly or indirectly via the related address).

Feel free to use this however you like. Any initial thoughts? Are there edge cases where this code does more harm than good or do you think it can be useful?

Best Regards
Carl Hasselskog

Agent.java.patch (1.11 KB)


#2

Hey Carl,

Hi,

We have sometimes experienced problems with Ice4j when we run multiple
concurrent connections. The problem that occurs is that Java throws an
OutOfMemoryErrors saying that it can’t create any more native threads.

Argh ... not cool.

While running a profiler on our code I noticed that there are quite a
few Connector instances running for each connection (with connection I
mean a PseudoTCP connection).

How many are we talking about? Could they have been the reason for your exception?

I figured that most of these are not
really needed once a pair has been selected and can therefore be freed.
NOTE! This assumption is based on my very limited knowledge of the inner
workings of Ice4j. The tests we have done seem to be working fine but
please let me know if I’m wrong!

I have attached a patch that includes an new method in IceAgent that
runs free() on all LocalCandidates that are not related to the selected
pair (with directly or indirectly via the related address).

Well ... the free() method in the Agent already does this. Well it removes streams, that free components, that free candidates.

If you think any of these things are not happening in a normal free() then we should probably investigate that.

Am I making sense?
Emil

···

On 27.11.13, 18:01, Carl Hasselskog wrote:

Feel free to use this however you like. Any initial thoughts? Are there
edge cases where this code does more harm than good or do you think it
can be useful?

Best Regards

Carl Hasselskog

_______________________________________________
dev mailing list
dev@jitsi.org
Unsubscribe instructions and other list options:
http://lists.jitsi.org/mailman/listinfo/dev

--
https://jitsi.org


#3

I am running approx. 10 concurrent connections. I don't know the exact number because it differs from client to client. I know about free(). However the way I understand it I can only call free() after I don't need the PseudoTCP-connection anymore. The idea behind this method is to close all Connector-threads for candidates that are not needed once the PseudoTCP-connection has been established with the selected candidate. Can I call free and the PseudoTCP-connection will still work? I was under the impression that calling free() closed all candidates, which would also close the PseudoTCP-connection. Am I misunderstanding how things work?

Regards
Carl

···

-----Original Message-----
From: Emil Ivov [mailto:emcho@jitsi.org]
Sent: den 28 november 2013 12:05
To: Jitsi Developers
Cc: Carl Hasselskog
Subject: Re: [jitsi-dev] Freeing unselected candidates

Hey Carl,

On 27.11.13, 18:01, Carl Hasselskog wrote:

Hi,

We have sometimes experienced problems with Ice4j when we run multiple
concurrent connections. The problem that occurs is that Java throws an
OutOfMemoryErrors saying that it can't create any more native threads.

Argh ... not cool.

While running a profiler on our code I noticed that there are quite a
few Connector instances running for each connection (with connection I
mean a PseudoTCP connection).

How many are we talking about? Could they have been the reason for your exception?

I figured that most of these are not
really needed once a pair has been selected and can therefore be freed.
NOTE! This assumption is based on my very limited knowledge of the
inner workings of Ice4j. The tests we have done seem to be working
fine but please let me know if I'm wrong!

I have attached a patch that includes an new method in IceAgent that
runs free() on all LocalCandidates that are not related to the
selected pair (with directly or indirectly via the related address).

Well ... the free() method in the Agent already does this. Well it removes streams, that free components, that free candidates.

If you think any of these things are not happening in a normal free() then we should probably investigate that.

Am I making sense?
Emil

Feel free to use this however you like. Any initial thoughts? Are
there edge cases where this code does more harm than good or do you
think it can be useful?

Best Regards

Carl Hasselskog

_______________________________________________
dev mailing list
dev@jitsi.org
Unsubscribe instructions and other list options:
http://lists.jitsi.org/mailman/listinfo/dev

--
https://jitsi.org


#4

I am running approx. 10 concurrent connections.

Doesn't sound like something that should cause an OutOfMem exception.

I don't know the
exact number because it differs from client to client. I know about
free(). However the way I understand it I can only call free() after
I don't need the PseudoTCP-connection anymore. The idea behind this
method is to close all Connector-threads for candidates that are not
needed once the PseudoTCP-connection has been established with the
selected candidate.

Right. I neglected that.

Can I call free and the PseudoTCP-connection will
still work?

No.

I was under the impression that calling free() closed all
candidates, which would also close the PseudoTCP-connection. Am I
misunderstanding how things work?

No, I wasn't careful enough it seems. I am surprised we are not doing that but I can see how it could have slipped.

OK then. Regarding the patch:

1. It would be good to know how having 10 open sockets could cause an out of mem exception. Even if the new method seems to remove this, it could very well be that it's simply masking it.

2. Could you please make sure it complies with:

https://jitsi.org/codeconvention

Currently lines go beyond 80 and there are no javadocs.

3. It would probably be better to use the same model here as for the regular free so that things would be consistent. That is: have streams and components do the freeing themselves. You could probably reuse the same methods but with an extra param.

4. Why is it that this method only works if there is a selected candidate?

Emil

···

On 29.11.13, 18:29, Carl Hasselskog wrote:

Regards Carl

-----Original Message----- From: Emil Ivov [mailto:emcho@jitsi.org]
Sent: den 28 november 2013 12:05 To: Jitsi Developers Cc: Carl
Hasselskog Subject: Re: [jitsi-dev] Freeing unselected candidates

Hey Carl,

On 27.11.13, 18:01, Carl Hasselskog wrote:

Hi,

We have sometimes experienced problems with Ice4j when we run
multiple concurrent connections. The problem that occurs is that
Java throws an OutOfMemoryErrors saying that it can't create any
more native threads.

Argh ... not cool.

While running a profiler on our code I noticed that there are quite
a few Connector instances running for each connection (with
connection I mean a PseudoTCP connection).

How many are we talking about? Could they have been the reason for
your exception?

I figured that most of these are not really needed once a pair has
been selected and can therefore be freed. NOTE! This assumption is
based on my very limited knowledge of the inner workings of Ice4j.
The tests we have done seem to be working fine but please let me
know if I'm wrong!

I have attached a patch that includes an new method in IceAgent
that runs free() on all LocalCandidates that are not related to
the selected pair (with directly or indirectly via the related
address).

Well ... the free() method in the Agent already does this. Well it
removes streams, that free components, that free candidates.

If you think any of these things are not happening in a normal free()
then we should probably investigate that.

Am I making sense? Emil

Feel free to use this however you like. Any initial thoughts? Are
there edge cases where this code does more harm than good or do
you think it can be useful?

Best Regards

Carl Hasselskog

_______________________________________________ dev mailing list
dev@jitsi.org Unsubscribe instructions and other list options:
http://lists.jitsi.org/mailman/listinfo/dev

-- https://jitsi.org

--
https://jitsi.org


#5

I am running approx. 10 concurrent connections.

Doesn't sound like something that should cause an OutOfMem exception.

Note that the OutOfMemoryError is caused by failure to create more native threads. It's not running out of Java heap space. Creating a native thread is a pretty heavy operation. The default in Java is to create a thread stack of somewhere between 320kb - 1024kb (depending on JVM and OS). This is allocated outside the heap so it doesn't benefit from the GC's compaction operation after each GC. Combining lots of created threads and some memory fragmentation and you can easily run out free blocks that are big enough to hold a thread stack.

I don't know the exact details of the clients that have run into this issue. It could be that they are running >10 concurrent connections and if they also have more local candidates than average you could easily see that >100 threads are running simultaneously and the client could have been running for several days. 100 concurrent threads + lots of fragmentation can cause OutOfMemoryError.

Another way to reduce this risk even further (and increase Ice4j's overall performance) would be to make Ice4j use a thread-pool instead of creating new threads all the time. Even if the thread-pool is really big (to reduce the risk of starvation) it could reduce the fragmentation issue, since reusing the threads could reduce the fragmentation issue (since fewer thread stacks are allocated).

I don't know the
exact number because it differs from client to client. I know about
free(). However the way I understand it I can only call free() after I
don't need the PseudoTCP-connection anymore. The idea behind this
method is to close all Connector-threads for candidates that are not
needed once the PseudoTCP-connection has been established with the
selected candidate.

Right. I neglected that.

Can I call free and the PseudoTCP-connection will still work?

No.

I was under the impression that calling free() closed all candidates,
which would also close the PseudoTCP-connection. Am I misunderstanding
how things work?

No, I wasn't careful enough it seems. I am surprised we are not doing that but I can see how it could have slipped.

OK then. Regarding the patch:

1. It would be good to know how having 10 open sockets could cause an out of mem exception. Even if the new method seems to remove this, it could very well be that it's simply masking it.

2. Could you please make sure it complies with:

https://jitsi.org/codeconvention

Currently lines go beyond 80 and there are no javadocs.

3. It would probably be better to use the same model here as for the regular free so that things would be consistent. That is: have streams and components do the freeing themselves. You could probably reuse the same methods but with an extra param.

Sure I can change that. Just to make sure that I understand you correctly. You want me to merge the newly created method with free() and instead only have a boolean parameter that says whether the selected candidate should be free'd or not?

I personally don't see the point of restricting line width. Any decent IDE should handle word-wrapping according to the developers preference anyway. However, I do see the point of being consistent across the library so I'll change that as well.

4. Why is it that this method only works if there is a selected candidate?

I don't think I understand this question. What do you mean?

···

On 29.11.13, 18:29, Carl Hasselskog wrote:

Emil

Regards Carl

-----Original Message----- From: Emil Ivov [mailto:emcho@jitsi.org]
Sent: den 28 november 2013 12:05 To: Jitsi Developers Cc: Carl
Hasselskog Subject: Re: [jitsi-dev] Freeing unselected candidates

Hey Carl,

On 27.11.13, 18:01, Carl Hasselskog wrote:

Hi,

We have sometimes experienced problems with Ice4j when we run
multiple concurrent connections. The problem that occurs is that Java
throws an OutOfMemoryErrors saying that it can't create any more
native threads.

Argh ... not cool.

While running a profiler on our code I noticed that there are quite a
few Connector instances running for each connection (with connection
I mean a PseudoTCP connection).

How many are we talking about? Could they have been the reason for
your exception?

I figured that most of these are not really needed once a pair has
been selected and can therefore be freed. NOTE! This assumption is
based on my very limited knowledge of the inner workings of Ice4j.
The tests we have done seem to be working fine but please let me know
if I'm wrong!

I have attached a patch that includes an new method in IceAgent that
runs free() on all LocalCandidates that are not related to the
selected pair (with directly or indirectly via the related address).

Well ... the free() method in the Agent already does this. Well it
removes streams, that free components, that free candidates.

If you think any of these things are not happening in a normal free()
then we should probably investigate that.

Am I making sense? Emil

Feel free to use this however you like. Any initial thoughts? Are
there edge cases where this code does more harm than good or do you
think it can be useful?

Best Regards

Carl Hasselskog

_______________________________________________ dev mailing list
dev@jitsi.org Unsubscribe instructions and other list options:
http://lists.jitsi.org/mailman/listinfo/dev

-- https://jitsi.org

--
https://jitsi.org


#6

I've seen the same issue with the videobridge, quite reproducible. Unfortunately, the system (a virtual machine) broke down completly so I could not get a process dump :-/

···

Am 30.11.2013 16:14, schrieb Carl Hasselskog:

On 29.11.13, 18:29, Carl Hasselskog wrote:

I am running approx. 10 concurrent connections.

Doesn't sound like something that should cause an OutOfMem exception.

Note that the OutOfMemoryError is caused by failure to create more native threads. It's not running out of Java heap space. Creating a native thread is a pretty heavy operation. The default in Java is to create a thread stack of somewhere between 320kb - 1024kb (depending on JVM and OS). This is allocated outside the heap so it doesn't benefit from the GC's compaction operation after each GC. Combining lots of created threads and some memory fragmentation and you can easily run out free blocks that are big enough to hold a thread stack.

I don't know the exact details of the clients that have run into this issue. It could be that they are running >10 concurrent connections and if they also have more local candidates than average you could easily see that >100 threads are running simultaneously and the client could have been running for several days. 100 concurrent threads + lots of fragmentation can cause OutOfMemoryError.


#7

I am running approx. 10 concurrent connections.

Doesn't sound like something that should cause an OutOfMem
exception.

Note that the OutOfMemoryError is caused by failure to create more
native threads. It's not running out of Java heap space. Creating a
native thread is a pretty heavy operation. The default in Java is to
create a thread stack of somewhere between 320kb - 1024kb (depending
on JVM and OS). This is allocated outside the heap so it doesn't
benefit from the GC's compaction operation after each GC. Combining
lots of created threads and some memory fragmentation and you can
easily run out free blocks that are big enough to hold a thread
stack.

Certainly. However "10" is definitely not a number where you would expect such things to happen.

I don't know the exact details of the clients that have run into this
issue. It could be that they are running >10 concurrent connections
and if they also have more local candidates than average you could
easily see that >100 threads are running simultaneously and the
client could have been running for several days. 100 concurrent
threads + lots of fragmentation can cause OutOfMemoryError.

A couple of 100 is still something that I would expect to pass but this is not really the point. The reason I am insisting on this is because the number of threads that would cause an OutOfMemoryError (which in my estimations should be in the thousands at least, but again that's not really the point) is something that I can't possibly imagine ever happening with normal client side operation of ice4j. I've regularly tested ice4j with 5 real and virtual interfaces, both IPv4 and IPv6 and four STUN/TURN servers and I have never seen this (although STUN and TURN shouldn't have an impact on threads).

This makes me think that something in your case must be going wrong in some way and someone must be creating threads out of control.

Another way to reduce this risk even further (and increase Ice4j's
overall performance) would be to make Ice4j use a thread-pool instead
of creating new threads all the time. Even if the thread-pool is
really big (to reduce the risk of starvation) it could reduce the
fragmentation issue, since reusing the threads could reduce the
fragmentation issue (since fewer thread stacks are allocated).

You mean, something like this: http://goo.gl/m6RxzQ ?

Or this: http://goo.gl/4xmaea ?

Note that what you patch does is this: once ice4j is done harvesting candidates and done checking them, which are the most intensive parts of the operation, it simply closes the sockets that are not in use any more.

At the time when you are doing this ice4j should already be in cruising altitude. There is no reason to be adding new threads at that time. Yet, that's when you see your exception (or so I gather) which means that something is still working heavily.

Simply closing unused candidate sockets at that point is more likely just addressing the symptoms rather than the actual cause.

I assume a thread dump would help us proceed less blindly. Do you think you could get one?

Emil

···

On 30.11.13, 16:14, Carl Hasselskog wrote:

On 29.11.13, 18:29, Carl Hasselskog wrote:

I don't know the exact number because it differs from client to
client. I know about free(). However the way I understand it I
can only call free() after I don't need the PseudoTCP-connection
anymore. The idea behind this method is to close all
Connector-threads for candidates that are not needed once the
PseudoTCP-connection has been established with the selected
candidate.

Right. I neglected that.

Can I call free and the PseudoTCP-connection will still work?

No.

I was under the impression that calling free() closed all
candidates, which would also close the PseudoTCP-connection. Am I
misunderstanding how things work?

No, I wasn't careful enough it seems. I am surprised we are not
doing that but I can see how it could have slipped.

OK then. Regarding the patch:

1. It would be good to know how having 10 open sockets could cause
an out of mem exception. Even if the new method seems to remove
this, it could very well be that it's simply masking it.

2. Could you please make sure it complies with:

https://jitsi.org/codeconvention

Currently lines go beyond 80 and there are no javadocs.

3. It would probably be better to use the same model here as for
the regular free so that things would be consistent. That is: have
streams and components do the freeing themselves. You could
probably reuse the same methods but with an extra param.

Sure I can change that. Just to make sure that I understand you
correctly. You want me to merge the newly created method with free()
and instead only have a boolean parameter that says whether the
selected candidate should be free'd or not?

I personally don't see the point of restricting line width. Any
decent IDE should handle word-wrapping according to the developers
preference anyway. However, I do see the point of being consistent
across the library so I'll change that as well.

4. Why is it that this method only works if there is a selected
candidate?

I don't think I understand this question. What do you mean?

Emil

Regards Carl

-----Original Message----- From: Emil Ivov
[mailto:emcho@jitsi.org] Sent: den 28 november 2013 12:05 To:
Jitsi Developers Cc: Carl Hasselskog Subject: Re: [jitsi-dev]
Freeing unselected candidates

Hey Carl,

On 27.11.13, 18:01, Carl Hasselskog wrote:

Hi,

We have sometimes experienced problems with Ice4j when we run
multiple concurrent connections. The problem that occurs is
that Java throws an OutOfMemoryErrors saying that it can't
create any more native threads.

Argh ... not cool.

While running a profiler on our code I noticed that there are
quite a few Connector instances running for each connection
(with connection I mean a PseudoTCP connection).

How many are we talking about? Could they have been the reason
for your exception?

I figured that most of these are not really needed once a pair
has been selected and can therefore be freed. NOTE! This
assumption is based on my very limited knowledge of the inner
workings of Ice4j. The tests we have done seem to be working
fine but please let me know if I'm wrong!

I have attached a patch that includes an new method in IceAgent
that runs free() on all LocalCandidates that are not related to
the selected pair (with directly or indirectly via the related
address).

Well ... the free() method in the Agent already does this. Well
it removes streams, that free components, that free candidates.

If you think any of these things are not happening in a normal
free() then we should probably investigate that.

Am I making sense? Emil

Feel free to use this however you like. Any initial thoughts?
Are there edge cases where this code does more harm than good
or do you think it can be useful?

Best Regards

Carl Hasselskog

_______________________________________________ dev mailing
list dev@jitsi.org Unsubscribe instructions and other list
options: http://lists.jitsi.org/mailman/listinfo/dev

-- https://jitsi.org

-- https://jitsi.org

_______________________________________________ dev mailing list
dev@jitsi.org Unsubscribe instructions and other list options:
http://lists.jitsi.org/mailman/listinfo/dev

--
https://jitsi.org


#8

In case it wasn't clear: I am not arguing that there is no issue. I am not even arguing that what Carl's patch does is unnecessary: closing unused candidate sockets should have already been in there. I am happy to integrate this patch (as long as we address the cosmetics I raised previously)

What I am arguing is that this patch is not the proper resolution for this issue.

Emil

···

On 30.11.13, 16:41, Philipp Hancke wrote:

Am 30.11.2013 16:14, schrieb Carl Hasselskog:

On 29.11.13, 18:29, Carl Hasselskog wrote:

I am running approx. 10 concurrent connections.

Doesn't sound like something that should cause an OutOfMem exception.

Note that the OutOfMemoryError is caused by failure to create more native threads. It's not running out of Java heap space. Creating a native thread is a pretty heavy operation. The default in Java is to create a thread stack of somewhere between 320kb - 1024kb (depending on JVM and OS). This is allocated outside the heap so it doesn't benefit from the GC's compaction operation after each GC. Combining lots of created threads and some memory fragmentation and you can easily run out free blocks that are big enough to hold a thread stack.

I don't know the exact details of the clients that have run into this issue. It could be that they are running >10 concurrent connections and if they also have more local candidates than average you could easily see that >100 threads are running simultaneously and the client could have been running for several days. 100 concurrent threads + lots of fragmentation can cause OutOfMemoryError.

I've seen the same issue with the videobridge, quite reproducible.
Unfortunately, the system (a virtual machine) broke down completly so I
could not get a process dump :-/

--
https://jitsi.org


#9

Yes, I agree with you that there definitely could be something else that's wrong and that there's a risk that the patch only treats the symptoms. But don't you agree that stopping unused threads and shutting down unused sockets is desirable either way? Our client doesn't create all ~10 connections at the same time. So even if it manages to create a connection without throwing an exception there's still a risk that the unused threads become a problem when the next connection is created.

One interesting thing is that every time the error has been logged the stack trace has been the same. That's certainly an indication that there's something else going on. If the client had just been randomly running out of native memory then I guess the exception would also be thrown at one of the other places were threads are created.
Exception: java.lang.OutOfMemoryError unable to create new native thread
  at java.lang.Thread.start0(Native Method)
  at java.lang.Thread.start(Thread.java: 693)
  at org.ice4j.stack.MessageProcessor.start(S: 164)
  at org.ice4j.stack.NetAccessManager.fillUpThreadPool(S: 305)
  at org.ice4j.stack.NetAccessManager.initThreadPool(S: 289)
  at org.ice4j.stack.NetAccessManager.<init>(S: 101)
  at org.ice4j.stack.StunStack.<init>(S: 295)
  at org.ice4j.ice.Agent.getStunStack(S: 1065)
  at org.ice4j.ice.ConnectivityCheckServer.<init>(S: 65)
  at org.ice4j.ice.Agent.<init>(S: 259)
  at org.ice4j.ice.Agent.<init>(S: 245)

I don't have a full thread dump but I've added that functionality to our latest version so hopefully we will get that logged this week.

I know about http://goo.gl/4xmaea. I'm the one who changed it so that the thread-pool gets reused between different instances of CandidateHarvesterSet. :slight_smile:

Maybe I'm misunderstanding how http://goo.gl/m6RxzQ works but I don't see how threads in there are reused in it. To me it looks like each thread is used only once by a MessageProcessor instance.

Regards
Carl

···

-----Original Message-----
From: Emil Ivov [mailto:emcho@jitsi.org]
Sent: den 30 november 2013 16:45
To: Jitsi Developers
Cc: Carl Hasselskog
Subject: Re: [jitsi-dev] Freeing unselected candidates

On 30.11.13, 16:14, Carl Hasselskog wrote:

On 29.11.13, 18:29, Carl Hasselskog wrote:

I am running approx. 10 concurrent connections.

Doesn't sound like something that should cause an OutOfMem exception.

Note that the OutOfMemoryError is caused by failure to create more
native threads. It's not running out of Java heap space. Creating a
native thread is a pretty heavy operation. The default in Java is to
create a thread stack of somewhere between 320kb - 1024kb (depending
on JVM and OS). This is allocated outside the heap so it doesn't
benefit from the GC's compaction operation after each GC. Combining
lots of created threads and some memory fragmentation and you can
easily run out free blocks that are big enough to hold a thread stack.

Certainly. However "10" is definitely not a number where you would expect such things to happen.

I don't know the exact details of the clients that have run into this
issue. It could be that they are running >10 concurrent connections
and if they also have more local candidates than average you could
easily see that >100 threads are running simultaneously and the client
could have been running for several days. 100 concurrent threads +
lots of fragmentation can cause OutOfMemoryError.

A couple of 100 is still something that I would expect to pass but this is not really the point. The reason I am insisting on this is because the number of threads that would cause an OutOfMemoryError (which in my estimations should be in the thousands at least, but again that's not really the point) is something that I can't possibly imagine ever happening with normal client side operation of ice4j. I've regularly tested ice4j with 5 real and virtual interfaces, both IPv4 and IPv6 and four STUN/TURN servers and I have never seen this (although STUN and TURN shouldn't have an impact on threads).

This makes me think that something in your case must be going wrong in some way and someone must be creating threads out of control.

Another way to reduce this risk even further (and increase Ice4j's
overall performance) would be to make Ice4j use a thread-pool instead
of creating new threads all the time. Even if the thread-pool is
really big (to reduce the risk of starvation) it could reduce the
fragmentation issue, since reusing the threads could reduce the
fragmentation issue (since fewer thread stacks are allocated).

You mean, something like this: http://goo.gl/m6RxzQ ?

Or this: http://goo.gl/4xmaea ?

Note that what you patch does is this: once ice4j is done harvesting candidates and done checking them, which are the most intensive parts of the operation, it simply closes the sockets that are not in use any more.

At the time when you are doing this ice4j should already be in cruising altitude. There is no reason to be adding new threads at that time. Yet, that's when you see your exception (or so I gather) which means that something is still working heavily.

Simply closing unused candidate sockets at that point is more likely just addressing the symptoms rather than the actual cause.

I assume a thread dump would help us proceed less blindly. Do you think you could get one?

Emil

I don't know the exact number because it differs from client to
client. I know about free(). However the way I understand it I can
only call free() after I don't need the PseudoTCP-connection
anymore. The idea behind this method is to close all
Connector-threads for candidates that are not needed once the
PseudoTCP-connection has been established with the selected
candidate.

Right. I neglected that.

Can I call free and the PseudoTCP-connection will still work?

No.

I was under the impression that calling free() closed all
candidates, which would also close the PseudoTCP-connection. Am I
misunderstanding how things work?

No, I wasn't careful enough it seems. I am surprised we are not doing
that but I can see how it could have slipped.

OK then. Regarding the patch:

1. It would be good to know how having 10 open sockets could cause an
out of mem exception. Even if the new method seems to remove this, it
could very well be that it's simply masking it.

2. Could you please make sure it complies with:

https://jitsi.org/codeconvention

Currently lines go beyond 80 and there are no javadocs.

3. It would probably be better to use the same model here as for the
regular free so that things would be consistent. That is: have
streams and components do the freeing themselves. You could probably
reuse the same methods but with an extra param.

Sure I can change that. Just to make sure that I understand you
correctly. You want me to merge the newly created method with free()
and instead only have a boolean parameter that says whether the
selected candidate should be free'd or not?

I personally don't see the point of restricting line width. Any decent
IDE should handle word-wrapping according to the developers preference
anyway. However, I do see the point of being consistent across the
library so I'll change that as well.

4. Why is it that this method only works if there is a selected
candidate?

I don't think I understand this question. What do you mean?

Emil

Regards Carl

-----Original Message----- From: Emil Ivov [mailto:emcho@jitsi.org]
Sent: den 28 november 2013 12:05 To:
Jitsi Developers Cc: Carl Hasselskog Subject: Re: [jitsi-dev]
Freeing unselected candidates

Hey Carl,

On 27.11.13, 18:01, Carl Hasselskog wrote:

Hi,

We have sometimes experienced problems with Ice4j when we run
multiple concurrent connections. The problem that occurs is that
Java throws an OutOfMemoryErrors saying that it can't create any
more native threads.

Argh ... not cool.

While running a profiler on our code I noticed that there are quite
a few Connector instances running for each connection (with
connection I mean a PseudoTCP connection).

How many are we talking about? Could they have been the reason for
your exception?

I figured that most of these are not really needed once a pair has
been selected and can therefore be freed. NOTE! This assumption is
based on my very limited knowledge of the inner workings of Ice4j.
The tests we have done seem to be working fine but please let me
know if I'm wrong!

I have attached a patch that includes an new method in IceAgent
that runs free() on all LocalCandidates that are not related to the
selected pair (with directly or indirectly via the related
address).

Well ... the free() method in the Agent already does this. Well it
removes streams, that free components, that free candidates.

If you think any of these things are not happening in a normal
free() then we should probably investigate that.

Am I making sense? Emil

Feel free to use this however you like. Any initial thoughts?
Are there edge cases where this code does more harm than good or do
you think it can be useful?

Best Regards

Carl Hasselskog

_______________________________________________ dev mailing list
dev@jitsi.org Unsubscribe instructions and other list
options: http://lists.jitsi.org/mailman/listinfo/dev

-- https://jitsi.org

-- https://jitsi.org

_______________________________________________ dev mailing list
dev@jitsi.org Unsubscribe instructions and other list options:
http://lists.jitsi.org/mailman/listinfo/dev

--
https://jitsi.org


#10

I just realized one that perhaps could be the cause of the problem. We are creating a new Agent instance for each new connection. I assumed that was how it was supposed to be done but maybe I have misunderstood it. Should we reuse the same Agent instance across all connections?

Regards
Carl

···

-----Original Message-----
From: Emil Ivov [mailto:emcho@jitsi.org]
Sent: den 30 november 2013 16:45
To: Jitsi Developers
Cc: Carl Hasselskog
Subject: Re: [jitsi-dev] Freeing unselected candidates

On 30.11.13, 16:14, Carl Hasselskog wrote:

On 29.11.13, 18:29, Carl Hasselskog wrote:

I am running approx. 10 concurrent connections.

Doesn't sound like something that should cause an OutOfMem exception.

Note that the OutOfMemoryError is caused by failure to create more
native threads. It's not running out of Java heap space. Creating a
native thread is a pretty heavy operation. The default in Java is to
create a thread stack of somewhere between 320kb - 1024kb (depending
on JVM and OS). This is allocated outside the heap so it doesn't
benefit from the GC's compaction operation after each GC. Combining
lots of created threads and some memory fragmentation and you can
easily run out free blocks that are big enough to hold a thread stack.

Certainly. However "10" is definitely not a number where you would expect such things to happen.

I don't know the exact details of the clients that have run into this
issue. It could be that they are running >10 concurrent connections
and if they also have more local candidates than average you could
easily see that >100 threads are running simultaneously and the client
could have been running for several days. 100 concurrent threads +
lots of fragmentation can cause OutOfMemoryError.

A couple of 100 is still something that I would expect to pass but this is not really the point. The reason I am insisting on this is because the number of threads that would cause an OutOfMemoryError (which in my estimations should be in the thousands at least, but again that's not really the point) is something that I can't possibly imagine ever happening with normal client side operation of ice4j. I've regularly tested ice4j with 5 real and virtual interfaces, both IPv4 and IPv6 and four STUN/TURN servers and I have never seen this (although STUN and TURN shouldn't have an impact on threads).

This makes me think that something in your case must be going wrong in some way and someone must be creating threads out of control.

Another way to reduce this risk even further (and increase Ice4j's
overall performance) would be to make Ice4j use a thread-pool instead
of creating new threads all the time. Even if the thread-pool is
really big (to reduce the risk of starvation) it could reduce the
fragmentation issue, since reusing the threads could reduce the
fragmentation issue (since fewer thread stacks are allocated).

You mean, something like this: http://goo.gl/m6RxzQ ?

Or this: http://goo.gl/4xmaea ?

Note that what you patch does is this: once ice4j is done harvesting candidates and done checking them, which are the most intensive parts of the operation, it simply closes the sockets that are not in use any more.

At the time when you are doing this ice4j should already be in cruising altitude. There is no reason to be adding new threads at that time. Yet, that's when you see your exception (or so I gather) which means that something is still working heavily.

Simply closing unused candidate sockets at that point is more likely just addressing the symptoms rather than the actual cause.

I assume a thread dump would help us proceed less blindly. Do you think you could get one?

Emil

I don't know the exact number because it differs from client to
client. I know about free(). However the way I understand it I can
only call free() after I don't need the PseudoTCP-connection
anymore. The idea behind this method is to close all
Connector-threads for candidates that are not needed once the
PseudoTCP-connection has been established with the selected
candidate.

Right. I neglected that.

Can I call free and the PseudoTCP-connection will still work?

No.

I was under the impression that calling free() closed all
candidates, which would also close the PseudoTCP-connection. Am I
misunderstanding how things work?

No, I wasn't careful enough it seems. I am surprised we are not doing
that but I can see how it could have slipped.

OK then. Regarding the patch:

1. It would be good to know how having 10 open sockets could cause an
out of mem exception. Even if the new method seems to remove this, it
could very well be that it's simply masking it.

2. Could you please make sure it complies with:

https://jitsi.org/codeconvention

Currently lines go beyond 80 and there are no javadocs.

3. It would probably be better to use the same model here as for the
regular free so that things would be consistent. That is: have
streams and components do the freeing themselves. You could probably
reuse the same methods but with an extra param.

Sure I can change that. Just to make sure that I understand you
correctly. You want me to merge the newly created method with free()
and instead only have a boolean parameter that says whether the
selected candidate should be free'd or not?

I personally don't see the point of restricting line width. Any decent
IDE should handle word-wrapping according to the developers preference
anyway. However, I do see the point of being consistent across the
library so I'll change that as well.

4. Why is it that this method only works if there is a selected
candidate?

I don't think I understand this question. What do you mean?

Emil

Regards Carl

-----Original Message----- From: Emil Ivov [mailto:emcho@jitsi.org]
Sent: den 28 november 2013 12:05 To:
Jitsi Developers Cc: Carl Hasselskog Subject: Re: [jitsi-dev]
Freeing unselected candidates

Hey Carl,

On 27.11.13, 18:01, Carl Hasselskog wrote:

Hi,

We have sometimes experienced problems with Ice4j when we run
multiple concurrent connections. The problem that occurs is that
Java throws an OutOfMemoryErrors saying that it can't create any
more native threads.

Argh ... not cool.

While running a profiler on our code I noticed that there are quite
a few Connector instances running for each connection (with
connection I mean a PseudoTCP connection).

How many are we talking about? Could they have been the reason for
your exception?

I figured that most of these are not really needed once a pair has
been selected and can therefore be freed. NOTE! This assumption is
based on my very limited knowledge of the inner workings of Ice4j.
The tests we have done seem to be working fine but please let me
know if I'm wrong!

I have attached a patch that includes an new method in IceAgent
that runs free() on all LocalCandidates that are not related to the
selected pair (with directly or indirectly via the related
address).

Well ... the free() method in the Agent already does this. Well it
removes streams, that free components, that free candidates.

If you think any of these things are not happening in a normal
free() then we should probably investigate that.

Am I making sense? Emil

Feel free to use this however you like. Any initial thoughts?
Are there edge cases where this code does more harm than good or do
you think it can be useful?

Best Regards

Carl Hasselskog

_______________________________________________ dev mailing list
dev@jitsi.org Unsubscribe instructions and other list
options: http://lists.jitsi.org/mailman/listinfo/dev

-- https://jitsi.org

-- https://jitsi.org

_______________________________________________ dev mailing list
dev@jitsi.org Unsubscribe instructions and other list options:
http://lists.jitsi.org/mailman/listinfo/dev

--
https://jitsi.org


#11

Here's a new version of the patch. This time it's not a separate method but just a Boolean parameter in free.

Regards
Carl

FreeUnselected.patch (6.01 KB)

···

-----Original Message-----
From: Carl Hasselskog
Sent: den 1 december 2013 12:02
To: Jitsi Developers
Subject: RE: [jitsi-dev] Freeing unselected candidates

I just realized one that perhaps could be the cause of the problem. We are creating a new Agent instance for each new connection. I assumed that was how it was supposed to be done but maybe I have misunderstood it. Should we reuse the same Agent instance across all connections?

Regards
Carl

-----Original Message-----
From: Emil Ivov [mailto:emcho@jitsi.org]
Sent: den 30 november 2013 16:45
To: Jitsi Developers
Cc: Carl Hasselskog
Subject: Re: [jitsi-dev] Freeing unselected candidates

On 30.11.13, 16:14, Carl Hasselskog wrote:

On 29.11.13, 18:29, Carl Hasselskog wrote:

I am running approx. 10 concurrent connections.

Doesn't sound like something that should cause an OutOfMem exception.

Note that the OutOfMemoryError is caused by failure to create more
native threads. It's not running out of Java heap space. Creating a
native thread is a pretty heavy operation. The default in Java is to
create a thread stack of somewhere between 320kb - 1024kb (depending
on JVM and OS). This is allocated outside the heap so it doesn't
benefit from the GC's compaction operation after each GC. Combining
lots of created threads and some memory fragmentation and you can
easily run out free blocks that are big enough to hold a thread stack.

Certainly. However "10" is definitely not a number where you would expect such things to happen.

I don't know the exact details of the clients that have run into this
issue. It could be that they are running >10 concurrent connections
and if they also have more local candidates than average you could
easily see that >100 threads are running simultaneously and the client
could have been running for several days. 100 concurrent threads +
lots of fragmentation can cause OutOfMemoryError.

A couple of 100 is still something that I would expect to pass but this is not really the point. The reason I am insisting on this is because the number of threads that would cause an OutOfMemoryError (which in my estimations should be in the thousands at least, but again that's not really the point) is something that I can't possibly imagine ever happening with normal client side operation of ice4j. I've regularly tested ice4j with 5 real and virtual interfaces, both IPv4 and IPv6 and four STUN/TURN servers and I have never seen this (although STUN and TURN shouldn't have an impact on threads).

This makes me think that something in your case must be going wrong in some way and someone must be creating threads out of control.

Another way to reduce this risk even further (and increase Ice4j's
overall performance) would be to make Ice4j use a thread-pool instead
of creating new threads all the time. Even if the thread-pool is
really big (to reduce the risk of starvation) it could reduce the
fragmentation issue, since reusing the threads could reduce the
fragmentation issue (since fewer thread stacks are allocated).

You mean, something like this: http://goo.gl/m6RxzQ ?

Or this: http://goo.gl/4xmaea ?

Note that what you patch does is this: once ice4j is done harvesting candidates and done checking them, which are the most intensive parts of the operation, it simply closes the sockets that are not in use any more.

At the time when you are doing this ice4j should already be in cruising altitude. There is no reason to be adding new threads at that time. Yet, that's when you see your exception (or so I gather) which means that something is still working heavily.

Simply closing unused candidate sockets at that point is more likely just addressing the symptoms rather than the actual cause.

I assume a thread dump would help us proceed less blindly. Do you think you could get one?

Emil

I don't know the exact number because it differs from client to
client. I know about free(). However the way I understand it I can
only call free() after I don't need the PseudoTCP-connection
anymore. The idea behind this method is to close all
Connector-threads for candidates that are not needed once the
PseudoTCP-connection has been established with the selected
candidate.

Right. I neglected that.

Can I call free and the PseudoTCP-connection will still work?

No.

I was under the impression that calling free() closed all
candidates, which would also close the PseudoTCP-connection. Am I
misunderstanding how things work?

No, I wasn't careful enough it seems. I am surprised we are not doing
that but I can see how it could have slipped.

OK then. Regarding the patch:

1. It would be good to know how having 10 open sockets could cause an
out of mem exception. Even if the new method seems to remove this, it
could very well be that it's simply masking it.

2. Could you please make sure it complies with:

https://jitsi.org/codeconvention

Currently lines go beyond 80 and there are no javadocs.

3. It would probably be better to use the same model here as for the
regular free so that things would be consistent. That is: have
streams and components do the freeing themselves. You could probably
reuse the same methods but with an extra param.

Sure I can change that. Just to make sure that I understand you
correctly. You want me to merge the newly created method with free()
and instead only have a boolean parameter that says whether the
selected candidate should be free'd or not?

I personally don't see the point of restricting line width. Any decent
IDE should handle word-wrapping according to the developers preference
anyway. However, I do see the point of being consistent across the
library so I'll change that as well.

4. Why is it that this method only works if there is a selected
candidate?

I don't think I understand this question. What do you mean?

Emil

Regards Carl

-----Original Message----- From: Emil Ivov [mailto:emcho@jitsi.org]
Sent: den 28 november 2013 12:05 To:
Jitsi Developers Cc: Carl Hasselskog Subject: Re: [jitsi-dev]
Freeing unselected candidates

Hey Carl,

On 27.11.13, 18:01, Carl Hasselskog wrote:

Hi,

We have sometimes experienced problems with Ice4j when we run
multiple concurrent connections. The problem that occurs is that
Java throws an OutOfMemoryErrors saying that it can't create any
more native threads.

Argh ... not cool.

While running a profiler on our code I noticed that there are quite
a few Connector instances running for each connection (with
connection I mean a PseudoTCP connection).

How many are we talking about? Could they have been the reason for
your exception?

I figured that most of these are not really needed once a pair has
been selected and can therefore be freed. NOTE! This assumption is
based on my very limited knowledge of the inner workings of Ice4j.
The tests we have done seem to be working fine but please let me
know if I'm wrong!

I have attached a patch that includes an new method in IceAgent
that runs free() on all LocalCandidates that are not related to the
selected pair (with directly or indirectly via the related
address).

Well ... the free() method in the Agent already does this. Well it
removes streams, that free components, that free candidates.

If you think any of these things are not happening in a normal
free() then we should probably investigate that.

Am I making sense? Emil

Feel free to use this however you like. Any initial thoughts?
Are there edge cases where this code does more harm than good or do
you think it can be useful?

Best Regards

Carl Hasselskog

_______________________________________________ dev mailing list
dev@jitsi.org Unsubscribe instructions and other list
options: http://lists.jitsi.org/mailman/listinfo/dev

-- https://jitsi.org

-- https://jitsi.org

_______________________________________________ dev mailing list
dev@jitsi.org Unsubscribe instructions and other list options:
http://lists.jitsi.org/mailman/listinfo/dev

--
https://jitsi.org


#12

Sorry for all the e-mails today but I think I've found the root cause now!

I was looking through the logs of one of the users that experienced the bug. I found that this exception was being thrown when the connection was being initialized:

javax.sdp.SdpException The parameter is null
at gov.nist.javax.sdp.SessionDescriptionImpl.setOrigin(S:480)
at javax.sdp.SdpFactory.createSessionDescription(S:79)
at om.degoo.backend.ice4j.SdpUtils.createSDPDescription(S:42)

This was thrown just after the Agent instance was created but before the connection had been added to our list of connections (this list is regularly being purged, to avoid running out of resources). Because of this the Agent instance that was created was never free'd and therefore the client ran out of threads.

However, this doesn't explain way setOrigin is called with a null parameter. Any idea on why that might happen?

Regards
Carl

···

-----Original Message-----
From: Emil Ivov [mailto:emcho@jitsi.org]
Sent: den 30 november 2013 16:45
To: Jitsi Developers
Cc: Carl Hasselskog
Subject: Re: [jitsi-dev] Freeing unselected candidates

On 30.11.13, 16:14, Carl Hasselskog wrote:

On 29.11.13, 18:29, Carl Hasselskog wrote:

I am running approx. 10 concurrent connections.

Doesn't sound like something that should cause an OutOfMem exception.

Note that the OutOfMemoryError is caused by failure to create more
native threads. It's not running out of Java heap space. Creating a
native thread is a pretty heavy operation. The default in Java is to
create a thread stack of somewhere between 320kb - 1024kb (depending
on JVM and OS). This is allocated outside the heap so it doesn't
benefit from the GC's compaction operation after each GC. Combining
lots of created threads and some memory fragmentation and you can
easily run out free blocks that are big enough to hold a thread stack.

Certainly. However "10" is definitely not a number where you would expect such things to happen.

I don't know the exact details of the clients that have run into this
issue. It could be that they are running >10 concurrent connections
and if they also have more local candidates than average you could
easily see that >100 threads are running simultaneously and the client
could have been running for several days. 100 concurrent threads +
lots of fragmentation can cause OutOfMemoryError.

A couple of 100 is still something that I would expect to pass but this is not really the point. The reason I am insisting on this is because the number of threads that would cause an OutOfMemoryError (which in my estimations should be in the thousands at least, but again that's not really the point) is something that I can't possibly imagine ever happening with normal client side operation of ice4j. I've regularly tested ice4j with 5 real and virtual interfaces, both IPv4 and IPv6 and four STUN/TURN servers and I have never seen this (although STUN and TURN shouldn't have an impact on threads).

This makes me think that something in your case must be going wrong in some way and someone must be creating threads out of control.

Another way to reduce this risk even further (and increase Ice4j's
overall performance) would be to make Ice4j use a thread-pool instead
of creating new threads all the time. Even if the thread-pool is
really big (to reduce the risk of starvation) it could reduce the
fragmentation issue, since reusing the threads could reduce the
fragmentation issue (since fewer thread stacks are allocated).

You mean, something like this: http://goo.gl/m6RxzQ ?

Or this: http://goo.gl/4xmaea ?

Note that what you patch does is this: once ice4j is done harvesting candidates and done checking them, which are the most intensive parts of the operation, it simply closes the sockets that are not in use any more.

At the time when you are doing this ice4j should already be in cruising altitude. There is no reason to be adding new threads at that time. Yet, that's when you see your exception (or so I gather) which means that something is still working heavily.

Simply closing unused candidate sockets at that point is more likely just addressing the symptoms rather than the actual cause.

I assume a thread dump would help us proceed less blindly. Do you think you could get one?

Emil

I don't know the exact number because it differs from client to
client. I know about free(). However the way I understand it I can
only call free() after I don't need the PseudoTCP-connection
anymore. The idea behind this method is to close all
Connector-threads for candidates that are not needed once the
PseudoTCP-connection has been established with the selected
candidate.

Right. I neglected that.

Can I call free and the PseudoTCP-connection will still work?

No.

I was under the impression that calling free() closed all
candidates, which would also close the PseudoTCP-connection. Am I
misunderstanding how things work?

No, I wasn't careful enough it seems. I am surprised we are not doing
that but I can see how it could have slipped.

OK then. Regarding the patch:

1. It would be good to know how having 10 open sockets could cause an
out of mem exception. Even if the new method seems to remove this, it
could very well be that it's simply masking it.

2. Could you please make sure it complies with:

https://jitsi.org/codeconvention

Currently lines go beyond 80 and there are no javadocs.

3. It would probably be better to use the same model here as for the
regular free so that things would be consistent. That is: have
streams and components do the freeing themselves. You could probably
reuse the same methods but with an extra param.

Sure I can change that. Just to make sure that I understand you
correctly. You want me to merge the newly created method with free()
and instead only have a boolean parameter that says whether the
selected candidate should be free'd or not?

I personally don't see the point of restricting line width. Any decent
IDE should handle word-wrapping according to the developers preference
anyway. However, I do see the point of being consistent across the
library so I'll change that as well.

4. Why is it that this method only works if there is a selected
candidate?

I don't think I understand this question. What do you mean?

Emil

Regards Carl

-----Original Message----- From: Emil Ivov [mailto:emcho@jitsi.org]
Sent: den 28 november 2013 12:05 To:
Jitsi Developers Cc: Carl Hasselskog Subject: Re: [jitsi-dev]
Freeing unselected candidates

Hey Carl,

On 27.11.13, 18:01, Carl Hasselskog wrote:

Hi,

We have sometimes experienced problems with Ice4j when we run
multiple concurrent connections. The problem that occurs is that
Java throws an OutOfMemoryErrors saying that it can't create any
more native threads.

Argh ... not cool.

While running a profiler on our code I noticed that there are quite
a few Connector instances running for each connection (with
connection I mean a PseudoTCP connection).

How many are we talking about? Could they have been the reason for
your exception?

I figured that most of these are not really needed once a pair has
been selected and can therefore be freed. NOTE! This assumption is
based on my very limited knowledge of the inner workings of Ice4j.
The tests we have done seem to be working fine but please let me
know if I'm wrong!

I have attached a patch that includes an new method in IceAgent
that runs free() on all LocalCandidates that are not related to the
selected pair (with directly or indirectly via the related
address).

Well ... the free() method in the Agent already does this. Well it
removes streams, that free components, that free candidates.

If you think any of these things are not happening in a normal
free() then we should probably investigate that.

Am I making sense? Emil

Feel free to use this however you like. Any initial thoughts?
Are there edge cases where this code does more harm than good or do
you think it can be useful?

Best Regards

Carl Hasselskog

_______________________________________________ dev mailing list
dev@jitsi.org Unsubscribe instructions and other list
options: http://lists.jitsi.org/mailman/listinfo/dev

-- https://jitsi.org

-- https://jitsi.org

_______________________________________________ dev mailing list
dev@jitsi.org Unsubscribe instructions and other list options:
http://lists.jitsi.org/mailman/listinfo/dev

--
https://jitsi.org


#13

I just realized one that perhaps could be the cause of the problem.
We are creating a new Agent instance for each new connection. I
assumed that was how it was supposed to be done but maybe I have
misunderstood it. Should we reuse the same Agent instance across all
connections?

No, that's necessary. Agents should be properly freed after each call.

Emil

···

On 01.12.13, 12:01, Carl Hasselskog wrote:

Regards Carl

-----Original Message----- From: Emil Ivov [mailto:emcho@jitsi.org]
Sent: den 30 november 2013 16:45 To: Jitsi Developers Cc: Carl
Hasselskog Subject: Re: [jitsi-dev] Freeing unselected candidates

On 30.11.13, 16:14, Carl Hasselskog wrote:

On 29.11.13, 18:29, Carl Hasselskog wrote:

I am running approx. 10 concurrent connections.

Doesn't sound like something that should cause an OutOfMem
exception.

Note that the OutOfMemoryError is caused by failure to create more
native threads. It's not running out of Java heap space. Creating
a native thread is a pretty heavy operation. The default in Java is
to create a thread stack of somewhere between 320kb - 1024kb
(depending on JVM and OS). This is allocated outside the heap so it
doesn't benefit from the GC's compaction operation after each GC.
Combining lots of created threads and some memory fragmentation and
you can easily run out free blocks that are big enough to hold a
thread stack.

Certainly. However "10" is definitely not a number where you would
expect such things to happen.

I don't know the exact details of the clients that have run into
this issue. It could be that they are running >10 concurrent
connections and if they also have more local candidates than
average you could easily see that >100 threads are running
simultaneously and the client could have been running for several
days. 100 concurrent threads + lots of fragmentation can cause
OutOfMemoryError.

A couple of 100 is still something that I would expect to pass but
this is not really the point. The reason I am insisting on this is
because the number of threads that would cause an OutOfMemoryError
(which in my estimations should be in the thousands at least, but
again that's not really the point) is something that I can't possibly
imagine ever happening with normal client side operation of ice4j.
I've regularly tested ice4j with 5 real and virtual interfaces, both
IPv4 and IPv6 and four STUN/TURN servers and I have never seen this
(although STUN and TURN shouldn't have an impact on threads).

This makes me think that something in your case must be going wrong
in some way and someone must be creating threads out of control.

Another way to reduce this risk even further (and increase Ice4j's
overall performance) would be to make Ice4j use a thread-pool
instead of creating new threads all the time. Even if the
thread-pool is really big (to reduce the risk of starvation) it
could reduce the fragmentation issue, since reusing the threads
could reduce the fragmentation issue (since fewer thread stacks are
allocated).

You mean, something like this: http://goo.gl/m6RxzQ ?

Or this: http://goo.gl/4xmaea ?

Note that what you patch does is this: once ice4j is done harvesting
candidates and done checking them, which are the most intensive parts
of the operation, it simply closes the sockets that are not in use
any more.

At the time when you are doing this ice4j should already be in
cruising altitude. There is no reason to be adding new threads at
that time. Yet, that's when you see your exception (or so I gather)
which means that something is still working heavily.

Simply closing unused candidate sockets at that point is more likely
just addressing the symptoms rather than the actual cause.

I assume a thread dump would help us proceed less blindly. Do you
think you could get one?

Emil

I don't know the exact number because it differs from client
to client. I know about free(). However the way I understand it
I can only call free() after I don't need the
PseudoTCP-connection anymore. The idea behind this method is to
close all Connector-threads for candidates that are not needed
once the PseudoTCP-connection has been established with the
selected candidate.

Right. I neglected that.

Can I call free and the PseudoTCP-connection will still work?

No.

I was under the impression that calling free() closed all
candidates, which would also close the PseudoTCP-connection. Am
I misunderstanding how things work?

No, I wasn't careful enough it seems. I am surprised we are not
doing that but I can see how it could have slipped.

OK then. Regarding the patch:

1. It would be good to know how having 10 open sockets could
cause an out of mem exception. Even if the new method seems to
remove this, it could very well be that it's simply masking it.

2. Could you please make sure it complies with:

https://jitsi.org/codeconvention

Currently lines go beyond 80 and there are no javadocs.

3. It would probably be better to use the same model here as for
the regular free so that things would be consistent. That is:
have streams and components do the freeing themselves. You could
probably reuse the same methods but with an extra param.

Sure I can change that. Just to make sure that I understand you
correctly. You want me to merge the newly created method with
free() and instead only have a boolean parameter that says whether
the selected candidate should be free'd or not?

I personally don't see the point of restricting line width. Any
decent IDE should handle word-wrapping according to the developers
preference anyway. However, I do see the point of being consistent
across the library so I'll change that as well.

4. Why is it that this method only works if there is a selected
candidate?

I don't think I understand this question. What do you mean?

Emil

Regards Carl

-----Original Message----- From: Emil Ivov
[mailto:emcho@jitsi.org] Sent: den 28 november 2013 12:05 To:
Jitsi Developers Cc: Carl Hasselskog Subject: Re: [jitsi-dev]
Freeing unselected candidates

Hey Carl,

On 27.11.13, 18:01, Carl Hasselskog wrote:

Hi,

We have sometimes experienced problems with Ice4j when we
run multiple concurrent connections. The problem that occurs
is that Java throws an OutOfMemoryErrors saying that it can't
create any more native threads.

Argh ... not cool.

While running a profiler on our code I noticed that there are
quite a few Connector instances running for each connection
(with connection I mean a PseudoTCP connection).

How many are we talking about? Could they have been the reason
for your exception?

I figured that most of these are not really needed once a
pair has been selected and can therefore be freed. NOTE! This
assumption is based on my very limited knowledge of the inner
workings of Ice4j. The tests we have done seem to be working
fine but please let me know if I'm wrong!

I have attached a patch that includes an new method in
IceAgent that runs free() on all LocalCandidates that are not
related to the selected pair (with directly or indirectly via
the related address).

Well ... the free() method in the Agent already does this. Well
it removes streams, that free components, that free
candidates.

If you think any of these things are not happening in a normal
free() then we should probably investigate that.

Am I making sense? Emil

Feel free to use this however you like. Any initial
thoughts? Are there edge cases where this code does more harm
than good or do you think it can be useful?

Best Regards

Carl Hasselskog

_______________________________________________ dev mailing
list dev@jitsi.org Unsubscribe instructions and other list
options: http://lists.jitsi.org/mailman/listinfo/dev

-- https://jitsi.org

-- https://jitsi.org

_______________________________________________ dev mailing list
dev@jitsi.org Unsubscribe instructions and other list options:
http://lists.jitsi.org/mailman/listinfo/dev

-- https://jitsi.org

_______________________________________________ dev mailing list
dev@jitsi.org Unsubscribe instructions and other list options:
http://lists.jitsi.org/mailman/listinfo/dev

--
https://jitsi.org


#14

Yes, I agree with you that there definitely could be something else
that's wrong and that there's a risk that the patch only treats the
symptoms. But don't you agree that stopping unused threads and
shutting down unused sockets is desirable either way?

Yes that's exactly what I meant in my earlier mail today.

Our client
doesn't create all ~10 connections at the same time. So even if it
manages to create a connection without throwing an exception there's
still a risk that the unused threads become a problem when the next
connection is created.

Threads do get released at the end of a call so I think the problem wouldn't have occurred.

One interesting thing is that every time the error has been logged
the stack trace has been the same. That's certainly an indication
that there's something else going on. If the client had just been
randomly running out of native memory then I guess the exception
would also be thrown at one of the other places were threads are
created. Exception: java.lang.OutOfMemoryError unable to create new
native thread at java.lang.Thread.start0(Native Method) at
java.lang.Thread.start(Thread.java: 693) at
org.ice4j.stack.MessageProcessor.start(S: 164) at
org.ice4j.stack.NetAccessManager.fillUpThreadPool(S: 305) at
org.ice4j.stack.NetAccessManager.initThreadPool(S: 289) at
org.ice4j.stack.NetAccessManager.<init>(S: 101) at
org.ice4j.stack.StunStack.<init>(S: 295) at
org.ice4j.ice.Agent.getStunStack(S: 1065) at
org.ice4j.ice.ConnectivityCheckServer.<init>(S: 65) at
org.ice4j.ice.Agent.<init>(S: 259) at org.ice4j.ice.Agent.<init>(S:
245)

I wasn't saying the problem is not coming from ice4j.

Emil

···

On 01.12.13, 11:08, Carl Hasselskog wrote:

I don't have a full thread dump but I've added that functionality to
our latest version so hopefully we will get that logged this week.

I know about http://goo.gl/4xmaea. I'm the one who changed it so that
the thread-pool gets reused between different instances of
CandidateHarvesterSet. :slight_smile:

Maybe I'm misunderstanding how http://goo.gl/m6RxzQ works but I don't
see how threads in there are reused in it. To me it looks like each
thread is used only once by a MessageProcessor instance.

Regards Carl

-----Original Message----- From: Emil Ivov [mailto:emcho@jitsi.org]
Sent: den 30 november 2013 16:45 To: Jitsi Developers Cc: Carl
Hasselskog Subject: Re: [jitsi-dev] Freeing unselected candidates

On 30.11.13, 16:14, Carl Hasselskog wrote:

On 29.11.13, 18:29, Carl Hasselskog wrote:

I am running approx. 10 concurrent connections.

Doesn't sound like something that should cause an OutOfMem
exception.

Note that the OutOfMemoryError is caused by failure to create more
native threads. It's not running out of Java heap space. Creating
a native thread is a pretty heavy operation. The default in Java is
to create a thread stack of somewhere between 320kb - 1024kb
(depending on JVM and OS). This is allocated outside the heap so it
doesn't benefit from the GC's compaction operation after each GC.
Combining lots of created threads and some memory fragmentation and
you can easily run out free blocks that are big enough to hold a
thread stack.

Certainly. However "10" is definitely not a number where you would
expect such things to happen.

I don't know the exact details of the clients that have run into
this issue. It could be that they are running >10 concurrent
connections and if they also have more local candidates than
average you could easily see that >100 threads are running
simultaneously and the client could have been running for several
days. 100 concurrent threads + lots of fragmentation can cause
OutOfMemoryError.

A couple of 100 is still something that I would expect to pass but
this is not really the point. The reason I am insisting on this is
because the number of threads that would cause an OutOfMemoryError
(which in my estimations should be in the thousands at least, but
again that's not really the point) is something that I can't possibly
imagine ever happening with normal client side operation of ice4j.
I've regularly tested ice4j with 5 real and virtual interfaces, both
IPv4 and IPv6 and four STUN/TURN servers and I have never seen this
(although STUN and TURN shouldn't have an impact on threads).

This makes me think that something in your case must be going wrong
in some way and someone must be creating threads out of control.

Another way to reduce this risk even further (and increase Ice4j's
overall performance) would be to make Ice4j use a thread-pool
instead of creating new threads all the time. Even if the
thread-pool is really big (to reduce the risk of starvation) it
could reduce the fragmentation issue, since reusing the threads
could reduce the fragmentation issue (since fewer thread stacks are
allocated).

You mean, something like this: http://goo.gl/m6RxzQ ?

Or this: http://goo.gl/4xmaea ?

Note that what you patch does is this: once ice4j is done harvesting
candidates and done checking them, which are the most intensive parts
of the operation, it simply closes the sockets that are not in use
any more.

At the time when you are doing this ice4j should already be in
cruising altitude. There is no reason to be adding new threads at
that time. Yet, that's when you see your exception (or so I gather)
which means that something is still working heavily.

Simply closing unused candidate sockets at that point is more likely
just addressing the symptoms rather than the actual cause.

I assume a thread dump would help us proceed less blindly. Do you
think you could get one?

Emil

I don't know the exact number because it differs from client
to client. I know about free(). However the way I understand it
I can only call free() after I don't need the
PseudoTCP-connection anymore. The idea behind this method is to
close all Connector-threads for candidates that are not needed
once the PseudoTCP-connection has been established with the
selected candidate.

Right. I neglected that.

Can I call free and the PseudoTCP-connection will still work?

No.

I was under the impression that calling free() closed all
candidates, which would also close the PseudoTCP-connection. Am
I misunderstanding how things work?

No, I wasn't careful enough it seems. I am surprised we are not
doing that but I can see how it could have slipped.

OK then. Regarding the patch:

1. It would be good to know how having 10 open sockets could
cause an out of mem exception. Even if the new method seems to
remove this, it could very well be that it's simply masking it.

2. Could you please make sure it complies with:

https://jitsi.org/codeconvention

Currently lines go beyond 80 and there are no javadocs.

3. It would probably be better to use the same model here as for
the regular free so that things would be consistent. That is:
have streams and components do the freeing themselves. You could
probably reuse the same methods but with an extra param.

Sure I can change that. Just to make sure that I understand you
correctly. You want me to merge the newly created method with
free() and instead only have a boolean parameter that says whether
the selected candidate should be free'd or not?

I personally don't see the point of restricting line width. Any
decent IDE should handle word-wrapping according to the developers
preference anyway. However, I do see the point of being consistent
across the library so I'll change that as well.

4. Why is it that this method only works if there is a selected
candidate?

I don't think I understand this question. What do you mean?

Emil

Regards Carl

-----Original Message----- From: Emil Ivov
[mailto:emcho@jitsi.org] Sent: den 28 november 2013 12:05 To:
Jitsi Developers Cc: Carl Hasselskog Subject: Re: [jitsi-dev]
Freeing unselected candidates

Hey Carl,

On 27.11.13, 18:01, Carl Hasselskog wrote:

Hi,

We have sometimes experienced problems with Ice4j when we
run multiple concurrent connections. The problem that occurs
is that Java throws an OutOfMemoryErrors saying that it can't
create any more native threads.

Argh ... not cool.

While running a profiler on our code I noticed that there are
quite a few Connector instances running for each connection
(with connection I mean a PseudoTCP connection).

How many are we talking about? Could they have been the reason
for your exception?

I figured that most of these are not really needed once a
pair has been selected and can therefore be freed. NOTE! This
assumption is based on my very limited knowledge of the inner
workings of Ice4j. The tests we have done seem to be working
fine but please let me know if I'm wrong!

I have attached a patch that includes an new method in
IceAgent that runs free() on all LocalCandidates that are not
related to the selected pair (with directly or indirectly via
the related address).

Well ... the free() method in the Agent already does this. Well
it removes streams, that free components, that free
candidates.

If you think any of these things are not happening in a normal
free() then we should probably investigate that.

Am I making sense? Emil

Feel free to use this however you like. Any initial
thoughts? Are there edge cases where this code does more harm
than good or do you think it can be useful?

Best Regards

Carl Hasselskog

_______________________________________________ dev mailing
list dev@jitsi.org Unsubscribe instructions and other list
options: http://lists.jitsi.org/mailman/listinfo/dev

-- https://jitsi.org

-- https://jitsi.org

_______________________________________________ dev mailing list
dev@jitsi.org Unsubscribe instructions and other list options:
http://lists.jitsi.org/mailman/listinfo/dev

-- https://jitsi.org

_______________________________________________ dev mailing list
dev@jitsi.org Unsubscribe instructions and other list options:
http://lists.jitsi.org/mailman/listinfo/dev

--
https://jitsi.org


#15

Thanks!

Could you also have a look at my other comments on this patch? (coding, convention, javadocs, non-operation in the absence of selected candidates).

Emil

···

On 01.12.13, 14:03, Carl Hasselskog wrote:

Here's a new version of the patch. This time it's not a separate method but just a Boolean parameter in free.

Regards
Carl

-----Original Message-----
From: Carl Hasselskog
Sent: den 1 december 2013 12:02
To: Jitsi Developers
Subject: RE: [jitsi-dev] Freeing unselected candidates

I just realized one that perhaps could be the cause of the problem. We are creating a new Agent instance for each new connection. I assumed that was how it was supposed to be done but maybe I have misunderstood it. Should we reuse the same Agent instance across all connections?

Regards
Carl

-----Original Message-----
From: Emil Ivov [mailto:emcho@jitsi.org]
Sent: den 30 november 2013 16:45
To: Jitsi Developers
Cc: Carl Hasselskog
Subject: Re: [jitsi-dev] Freeing unselected candidates

On 30.11.13, 16:14, Carl Hasselskog wrote:

On 29.11.13, 18:29, Carl Hasselskog wrote:

I am running approx. 10 concurrent connections.

Doesn't sound like something that should cause an OutOfMem exception.

Note that the OutOfMemoryError is caused by failure to create more
native threads. It's not running out of Java heap space. Creating a
native thread is a pretty heavy operation. The default in Java is to
create a thread stack of somewhere between 320kb - 1024kb (depending
on JVM and OS). This is allocated outside the heap so it doesn't
benefit from the GC's compaction operation after each GC. Combining
lots of created threads and some memory fragmentation and you can
easily run out free blocks that are big enough to hold a thread stack.

Certainly. However "10" is definitely not a number where you would expect such things to happen.

I don't know the exact details of the clients that have run into this
issue. It could be that they are running >10 concurrent connections
and if they also have more local candidates than average you could
easily see that >100 threads are running simultaneously and the client
could have been running for several days. 100 concurrent threads +
lots of fragmentation can cause OutOfMemoryError.

A couple of 100 is still something that I would expect to pass but this is not really the point. The reason I am insisting on this is because the number of threads that would cause an OutOfMemoryError (which in my estimations should be in the thousands at least, but again that's not really the point) is something that I can't possibly imagine ever happening with normal client side operation of ice4j. I've regularly tested ice4j with 5 real and virtual interfaces, both IPv4 and IPv6 and four STUN/TURN servers and I have never seen this (although STUN and TURN shouldn't have an impact on threads).

This makes me think that something in your case must be going wrong in some way and someone must be creating threads out of control.

Another way to reduce this risk even further (and increase Ice4j's
overall performance) would be to make Ice4j use a thread-pool instead
of creating new threads all the time. Even if the thread-pool is
really big (to reduce the risk of starvation) it could reduce the
fragmentation issue, since reusing the threads could reduce the
fragmentation issue (since fewer thread stacks are allocated).

You mean, something like this: http://goo.gl/m6RxzQ ?

Or this: http://goo.gl/4xmaea ?

Note that what you patch does is this: once ice4j is done harvesting candidates and done checking them, which are the most intensive parts of the operation, it simply closes the sockets that are not in use any more.

At the time when you are doing this ice4j should already be in cruising altitude. There is no reason to be adding new threads at that time. Yet, that's when you see your exception (or so I gather) which means that something is still working heavily.

Simply closing unused candidate sockets at that point is more likely just addressing the symptoms rather than the actual cause.

I assume a thread dump would help us proceed less blindly. Do you think you could get one?

Emil

I don't know the exact number because it differs from client to
client. I know about free(). However the way I understand it I can
only call free() after I don't need the PseudoTCP-connection
anymore. The idea behind this method is to close all
Connector-threads for candidates that are not needed once the
PseudoTCP-connection has been established with the selected
candidate.

Right. I neglected that.

Can I call free and the PseudoTCP-connection will still work?

No.

I was under the impression that calling free() closed all
candidates, which would also close the PseudoTCP-connection. Am I
misunderstanding how things work?

No, I wasn't careful enough it seems. I am surprised we are not doing
that but I can see how it could have slipped.

OK then. Regarding the patch:

1. It would be good to know how having 10 open sockets could cause an
out of mem exception. Even if the new method seems to remove this, it
could very well be that it's simply masking it.

2. Could you please make sure it complies with:

https://jitsi.org/codeconvention

Currently lines go beyond 80 and there are no javadocs.

3. It would probably be better to use the same model here as for the
regular free so that things would be consistent. That is: have
streams and components do the freeing themselves. You could probably
reuse the same methods but with an extra param.

Sure I can change that. Just to make sure that I understand you
correctly. You want me to merge the newly created method with free()
and instead only have a boolean parameter that says whether the
selected candidate should be free'd or not?

I personally don't see the point of restricting line width. Any decent
IDE should handle word-wrapping according to the developers preference
anyway. However, I do see the point of being consistent across the
library so I'll change that as well.

4. Why is it that this method only works if there is a selected
candidate?

I don't think I understand this question. What do you mean?

Emil

Regards Carl

-----Original Message----- From: Emil Ivov [mailto:emcho@jitsi.org]
Sent: den 28 november 2013 12:05 To:
Jitsi Developers Cc: Carl Hasselskog Subject: Re: [jitsi-dev]
Freeing unselected candidates

Hey Carl,

On 27.11.13, 18:01, Carl Hasselskog wrote:

Hi,

We have sometimes experienced problems with Ice4j when we run
multiple concurrent connections. The problem that occurs is that
Java throws an OutOfMemoryErrors saying that it can't create any
more native threads.

Argh ... not cool.

While running a profiler on our code I noticed that there are quite
a few Connector instances running for each connection (with
connection I mean a PseudoTCP connection).

How many are we talking about? Could they have been the reason for
your exception?

I figured that most of these are not really needed once a pair has
been selected and can therefore be freed. NOTE! This assumption is
based on my very limited knowledge of the inner workings of Ice4j.
The tests we have done seem to be working fine but please let me
know if I'm wrong!

I have attached a patch that includes an new method in IceAgent
that runs free() on all LocalCandidates that are not related to the
selected pair (with directly or indirectly via the related
address).

Well ... the free() method in the Agent already does this. Well it
removes streams, that free components, that free candidates.

If you think any of these things are not happening in a normal
free() then we should probably investigate that.

Am I making sense? Emil

Feel free to use this however you like. Any initial thoughts?
Are there edge cases where this code does more harm than good or do
you think it can be useful?

Best Regards

Carl Hasselskog

_______________________________________________ dev mailing list
dev@jitsi.org Unsubscribe instructions and other list
options: http://lists.jitsi.org/mailman/listinfo/dev

-- https://jitsi.org

-- https://jitsi.org

_______________________________________________ dev mailing list
dev@jitsi.org Unsubscribe instructions and other list options:
http://lists.jitsi.org/mailman/listinfo/dev

--
https://jitsi.org

_______________________________________________
dev mailing list
dev@jitsi.org
Unsubscribe instructions and other list options:
http://lists.jitsi.org/mailman/listinfo/dev

--
https://jitsi.org


#16

Sorry for all the e-mails today but I think I've found the root cause now!

No worries, that's what this list is for :slight_smile:

I was looking through the logs of one of the users that experienced the bug. I found that this exception was being thrown when the connection was being initialized:

javax.sdp.SdpException The parameter is null
at gov.nist.javax.sdp.SessionDescriptionImpl.setOrigin(S:480)
at javax.sdp.SdpFactory.createSessionDescription(S:79)
at om.degoo.backend.ice4j.SdpUtils.createSDPDescription(S:42)

This was thrown just after the Agent instance was created but before the connection had been added to our list of connections (this list is regularly being purged, to avoid running out of resources). Because of this the Agent instance that was created was never free'd and therefore the client ran out of threads.

However, this doesn't explain way setOrigin is called with a null parameter. Any idea on why that might happen?

This would depend on exactly what happens in om.degoo.backend.ice4j.SdpUtils.createSDPDescription(S:42)

What does the SDP look like? What version of Java are you using and on what OS?

Emil

···

On 01.12.13, 18:50, Carl Hasselskog wrote:

Regards
Carl

-----Original Message-----
From: Emil Ivov [mailto:emcho@jitsi.org]
Sent: den 30 november 2013 16:45
To: Jitsi Developers
Cc: Carl Hasselskog
Subject: Re: [jitsi-dev] Freeing unselected candidates

On 30.11.13, 16:14, Carl Hasselskog wrote:

On 29.11.13, 18:29, Carl Hasselskog wrote:

I am running approx. 10 concurrent connections.

Doesn't sound like something that should cause an OutOfMem exception.

Note that the OutOfMemoryError is caused by failure to create more
native threads. It's not running out of Java heap space. Creating a
native thread is a pretty heavy operation. The default in Java is to
create a thread stack of somewhere between 320kb - 1024kb (depending
on JVM and OS). This is allocated outside the heap so it doesn't
benefit from the GC's compaction operation after each GC. Combining
lots of created threads and some memory fragmentation and you can
easily run out free blocks that are big enough to hold a thread stack.

Certainly. However "10" is definitely not a number where you would expect such things to happen.

I don't know the exact details of the clients that have run into this
issue. It could be that they are running >10 concurrent connections
and if they also have more local candidates than average you could
easily see that >100 threads are running simultaneously and the client
could have been running for several days. 100 concurrent threads +
lots of fragmentation can cause OutOfMemoryError.

A couple of 100 is still something that I would expect to pass but this is not really the point. The reason I am insisting on this is because the number of threads that would cause an OutOfMemoryError (which in my estimations should be in the thousands at least, but again that's not really the point) is something that I can't possibly imagine ever happening with normal client side operation of ice4j. I've regularly tested ice4j with 5 real and virtual interfaces, both IPv4 and IPv6 and four STUN/TURN servers and I have never seen this (although STUN and TURN shouldn't have an impact on threads).

This makes me think that something in your case must be going wrong in some way and someone must be creating threads out of control.

Another way to reduce this risk even further (and increase Ice4j's
overall performance) would be to make Ice4j use a thread-pool instead
of creating new threads all the time. Even if the thread-pool is
really big (to reduce the risk of starvation) it could reduce the
fragmentation issue, since reusing the threads could reduce the
fragmentation issue (since fewer thread stacks are allocated).

You mean, something like this: http://goo.gl/m6RxzQ ?

Or this: http://goo.gl/4xmaea ?

Note that what you patch does is this: once ice4j is done harvesting candidates and done checking them, which are the most intensive parts of the operation, it simply closes the sockets that are not in use any more.

At the time when you are doing this ice4j should already be in cruising altitude. There is no reason to be adding new threads at that time. Yet, that's when you see your exception (or so I gather) which means that something is still working heavily.

Simply closing unused candidate sockets at that point is more likely just addressing the symptoms rather than the actual cause.

I assume a thread dump would help us proceed less blindly. Do you think you could get one?

Emil

I don't know the exact number because it differs from client to
client. I know about free(). However the way I understand it I can
only call free() after I don't need the PseudoTCP-connection
anymore. The idea behind this method is to close all
Connector-threads for candidates that are not needed once the
PseudoTCP-connection has been established with the selected
candidate.

Right. I neglected that.

Can I call free and the PseudoTCP-connection will still work?

No.

I was under the impression that calling free() closed all
candidates, which would also close the PseudoTCP-connection. Am I
misunderstanding how things work?

No, I wasn't careful enough it seems. I am surprised we are not doing
that but I can see how it could have slipped.

OK then. Regarding the patch:

1. It would be good to know how having 10 open sockets could cause an
out of mem exception. Even if the new method seems to remove this, it
could very well be that it's simply masking it.

2. Could you please make sure it complies with:

https://jitsi.org/codeconvention

Currently lines go beyond 80 and there are no javadocs.

3. It would probably be better to use the same model here as for the
regular free so that things would be consistent. That is: have
streams and components do the freeing themselves. You could probably
reuse the same methods but with an extra param.

Sure I can change that. Just to make sure that I understand you
correctly. You want me to merge the newly created method with free()
and instead only have a boolean parameter that says whether the
selected candidate should be free'd or not?

I personally don't see the point of restricting line width. Any decent
IDE should handle word-wrapping according to the developers preference
anyway. However, I do see the point of being consistent across the
library so I'll change that as well.

4. Why is it that this method only works if there is a selected
candidate?

I don't think I understand this question. What do you mean?

Emil

Regards Carl

-----Original Message----- From: Emil Ivov [mailto:emcho@jitsi.org]
Sent: den 28 november 2013 12:05 To:
Jitsi Developers Cc: Carl Hasselskog Subject: Re: [jitsi-dev]
Freeing unselected candidates

Hey Carl,

On 27.11.13, 18:01, Carl Hasselskog wrote:

Hi,

We have sometimes experienced problems with Ice4j when we run
multiple concurrent connections. The problem that occurs is that
Java throws an OutOfMemoryErrors saying that it can't create any
more native threads.

Argh ... not cool.

While running a profiler on our code I noticed that there are quite
a few Connector instances running for each connection (with
connection I mean a PseudoTCP connection).

How many are we talking about? Could they have been the reason for
your exception?

I figured that most of these are not really needed once a pair has
been selected and can therefore be freed. NOTE! This assumption is
based on my very limited knowledge of the inner workings of Ice4j.
The tests we have done seem to be working fine but please let me
know if I'm wrong!

I have attached a patch that includes an new method in IceAgent
that runs free() on all LocalCandidates that are not related to the
selected pair (with directly or indirectly via the related
address).

Well ... the free() method in the Agent already does this. Well it
removes streams, that free components, that free candidates.

If you think any of these things are not happening in a normal
free() then we should probably investigate that.

Am I making sense? Emil

Feel free to use this however you like. Any initial thoughts?
Are there edge cases where this code does more harm than good or do
you think it can be useful?

Best Regards

Carl Hasselskog

_______________________________________________ dev mailing list
dev@jitsi.org Unsubscribe instructions and other list
options: http://lists.jitsi.org/mailman/listinfo/dev

-- https://jitsi.org

-- https://jitsi.org

_______________________________________________ dev mailing list
dev@jitsi.org Unsubscribe instructions and other list options:
http://lists.jitsi.org/mailman/listinfo/dev

--
https://jitsi.org

_______________________________________________
dev mailing list
dev@jitsi.org
Unsubscribe instructions and other list options:
http://lists.jitsi.org/mailman/listinfo/dev

--
https://jitsi.org


#17

I think I've solved the exception in the SDP code now. I solved it by downloading the latest source code of jain-sip (replacing the Maven dependency) and then modifying javax.sdp.SdpFactory. The code that caused the problem was the following:
try {
   OriginField originImpl = (OriginField) this.createOrigin("user", InetAddress
                    .getLocalHost().getHostAddress());

      sessionDescriptionImpl.setOrigin(originImpl);
        } catch (UnknownHostException e) {
            e.printStackTrace();
        }

I noticed that SdpUtils.createSDPDescription(Agent agent) was overwriting setOrigin after calling SdpFactory.createSessionDescription()so the code above didn't do anything useful. I therefore simply removed it. My guess is that the code was running into this problem: http://stackoverflow.com/questions/1881546/inetaddress-getlocalhost-throws-unknownhostexception.

Also, note that my latest patch (the one that was freeing unselected candidates based on a boolean parameter in free()) doesn't work properly. After only freeing unselected candidates all the streams are removed so if it's called later on in an attempt to free the remaining selected candidates it won't work (since it won't find any streams). I will therefore stick to my original patch, were freeing unselected candidates is being done in a separate method and then one can call the normal free once the PseudoTCP-connection has been closed.

Regards
Carl

···

-----Original Message-----
From: Emil Ivov [mailto:emcho@jitsi.org]
Sent: den 1 december 2013 22:02
To: Jitsi Developers
Cc: Carl Hasselskog
Subject: Re: [jitsi-dev] Freeing unselected candidates

On 01.12.13, 18:50, Carl Hasselskog wrote:

Sorry for all the e-mails today but I think I've found the root cause now!

No worries, that's what this list is for :slight_smile:

I was looking through the logs of one of the users that experienced the bug. I found that this exception was being thrown when the connection was being initialized:

javax.sdp.SdpException The parameter is null at
gov.nist.javax.sdp.SessionDescriptionImpl.setOrigin(S:480)
at javax.sdp.SdpFactory.createSessionDescription(S:79)
at om.degoo.backend.ice4j.SdpUtils.createSDPDescription(S:42)

This was thrown just after the Agent instance was created but before the connection had been added to our list of connections (this list is regularly being purged, to avoid running out of resources). Because of this the Agent instance that was created was never free'd and therefore the client ran out of threads.

However, this doesn't explain way setOrigin is called with a null parameter. Any idea on why that might happen?

This would depend on exactly what happens in
om.degoo.backend.ice4j.SdpUtils.createSDPDescription(S:42)

What does the SDP look like? What version of Java are you using and on what OS?

Emil

Regards
Carl

-----Original Message-----
From: Emil Ivov [mailto:emcho@jitsi.org]
Sent: den 30 november 2013 16:45
To: Jitsi Developers
Cc: Carl Hasselskog
Subject: Re: [jitsi-dev] Freeing unselected candidates

On 30.11.13, 16:14, Carl Hasselskog wrote:

On 29.11.13, 18:29, Carl Hasselskog wrote:

I am running approx. 10 concurrent connections.

Doesn't sound like something that should cause an OutOfMem exception.

Note that the OutOfMemoryError is caused by failure to create more
native threads. It's not running out of Java heap space. Creating a
native thread is a pretty heavy operation. The default in Java is to
create a thread stack of somewhere between 320kb - 1024kb (depending
on JVM and OS). This is allocated outside the heap so it doesn't
benefit from the GC's compaction operation after each GC. Combining
lots of created threads and some memory fragmentation and you can
easily run out free blocks that are big enough to hold a thread stack.

Certainly. However "10" is definitely not a number where you would expect such things to happen.

I don't know the exact details of the clients that have run into this
issue. It could be that they are running >10 concurrent connections
and if they also have more local candidates than average you could
easily see that >100 threads are running simultaneously and the
client could have been running for several days. 100 concurrent
threads + lots of fragmentation can cause OutOfMemoryError.

A couple of 100 is still something that I would expect to pass but this is not really the point. The reason I am insisting on this is because the number of threads that would cause an OutOfMemoryError (which in my estimations should be in the thousands at least, but again that's not really the point) is something that I can't possibly imagine ever happening with normal client side operation of ice4j. I've regularly tested ice4j with 5 real and virtual interfaces, both IPv4 and IPv6 and four STUN/TURN servers and I have never seen this (although STUN and TURN shouldn't have an impact on threads).

This makes me think that something in your case must be going wrong in some way and someone must be creating threads out of control.

Another way to reduce this risk even further (and increase Ice4j's
overall performance) would be to make Ice4j use a thread-pool instead
of creating new threads all the time. Even if the thread-pool is
really big (to reduce the risk of starvation) it could reduce the
fragmentation issue, since reusing the threads could reduce the
fragmentation issue (since fewer thread stacks are allocated).

You mean, something like this: http://goo.gl/m6RxzQ ?

Or this: http://goo.gl/4xmaea ?

Note that what you patch does is this: once ice4j is done harvesting candidates and done checking them, which are the most intensive parts of the operation, it simply closes the sockets that are not in use any more.

At the time when you are doing this ice4j should already be in cruising altitude. There is no reason to be adding new threads at that time. Yet, that's when you see your exception (or so I gather) which means that something is still working heavily.

Simply closing unused candidate sockets at that point is more likely just addressing the symptoms rather than the actual cause.

I assume a thread dump would help us proceed less blindly. Do you think you could get one?

Emil

I don't know the exact number because it differs from client to
client. I know about free(). However the way I understand it I can
only call free() after I don't need the PseudoTCP-connection
anymore. The idea behind this method is to close all
Connector-threads for candidates that are not needed once the
PseudoTCP-connection has been established with the selected
candidate.

Right. I neglected that.

Can I call free and the PseudoTCP-connection will still work?

No.

I was under the impression that calling free() closed all
candidates, which would also close the PseudoTCP-connection. Am I
misunderstanding how things work?

No, I wasn't careful enough it seems. I am surprised we are not
doing that but I can see how it could have slipped.

OK then. Regarding the patch:

1. It would be good to know how having 10 open sockets could cause
an out of mem exception. Even if the new method seems to remove
this, it could very well be that it's simply masking it.

2. Could you please make sure it complies with:

https://jitsi.org/codeconvention

Currently lines go beyond 80 and there are no javadocs.

3. It would probably be better to use the same model here as for the
regular free so that things would be consistent. That is: have
streams and components do the freeing themselves. You could probably
reuse the same methods but with an extra param.

Sure I can change that. Just to make sure that I understand you
correctly. You want me to merge the newly created method with free()
and instead only have a boolean parameter that says whether the
selected candidate should be free'd or not?

I personally don't see the point of restricting line width. Any
decent IDE should handle word-wrapping according to the developers
preference anyway. However, I do see the point of being consistent
across the library so I'll change that as well.

4. Why is it that this method only works if there is a selected
candidate?

I don't think I understand this question. What do you mean?

Emil

Regards Carl

-----Original Message----- From: Emil Ivov [mailto:emcho@jitsi.org]
Sent: den 28 november 2013 12:05 To:
Jitsi Developers Cc: Carl Hasselskog Subject: Re: [jitsi-dev]
Freeing unselected candidates

Hey Carl,

On 27.11.13, 18:01, Carl Hasselskog wrote:

Hi,

We have sometimes experienced problems with Ice4j when we run
multiple concurrent connections. The problem that occurs is that
Java throws an OutOfMemoryErrors saying that it can't create any
more native threads.

Argh ... not cool.

While running a profiler on our code I noticed that there are
quite a few Connector instances running for each connection (with
connection I mean a PseudoTCP connection).

How many are we talking about? Could they have been the reason for
your exception?

I figured that most of these are not really needed once a pair has
been selected and can therefore be freed. NOTE! This assumption is
based on my very limited knowledge of the inner workings of Ice4j.
The tests we have done seem to be working fine but please let me
know if I'm wrong!

I have attached a patch that includes an new method in IceAgent
that runs free() on all LocalCandidates that are not related to
the selected pair (with directly or indirectly via the related
address).

Well ... the free() method in the Agent already does this. Well it
removes streams, that free components, that free candidates.

If you think any of these things are not happening in a normal
free() then we should probably investigate that.

Am I making sense? Emil

Feel free to use this however you like. Any initial thoughts?
Are there edge cases where this code does more harm than good or
do you think it can be useful?

Best Regards

Carl Hasselskog

_______________________________________________ dev mailing list
dev@jitsi.org Unsubscribe instructions and other list
options: http://lists.jitsi.org/mailman/listinfo/dev

-- https://jitsi.org

-- https://jitsi.org

_______________________________________________ dev mailing list
dev@jitsi.org Unsubscribe instructions and other list options:
http://lists.jitsi.org/mailman/listinfo/dev

--
https://jitsi.org

_______________________________________________
dev mailing list
dev@jitsi.org
Unsubscribe instructions and other list options:
http://lists.jitsi.org/mailman/listinfo/dev

--
https://jitsi.org