I've continued debugging the adapted no-registrar patch.
The problem I'm focused on is allowing both no-registrar and registrar
mode accounts to be created and loaded at the same time. For now the patch doesn't always work when both types of accounts are loaded simultaneously.
Briefly, the idea in Michael Koch's older patch was to
"split" the ProtocolProviderServiceSipImpl class into a registrar mode one
and a no-registrar mode one, both derived from a class holding the common
functionality, AbstractProtocolProviderServiceSipImpl. The decision on which one
to instantiate for a specific account depends on a property of the
account. This is checked at account loading in
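The split described above can be sketched roughly as follows. This is only a minimal, self-contained illustration and not the patch itself: the class names are shortened stand-ins for the real ones, and the property key NO_REGISTRAR and the createProvider method are hypothetical names invented for the example.

```java
import java.util.Collections;
import java.util.Map;

public class ProviderSplitSketch {
    /** Stand-in for AbstractProtocolProviderServiceSipImpl: the shared behaviour. */
    static abstract class AbstractSipProvider {
        private final String accountId;

        AbstractSipProvider(String accountId) {
            this.accountId = accountId;
        }

        String getAccountId() {
            return accountId;
        }

        /** Each mode answers this differently. */
        abstract boolean usesRegistrar();
    }

    /** Stand-in for the registrar mode provider. */
    static final class RegistrarSipProvider extends AbstractSipProvider {
        RegistrarSipProvider(String accountId) {
            super(accountId);
        }

        @Override
        boolean usesRegistrar() {
            return true;
        }
    }

    /** Stand-in for the no-registrar mode provider. */
    static final class NoRegistrarSipProvider extends AbstractSipProvider {
        NoRegistrarSipProvider(String accountId) {
            super(accountId);
        }

        @Override
        boolean usesRegistrar() {
            return false;
        }
    }

    // Hypothetical property key; the real account property name is not shown here.
    static final String NO_REGISTRAR_PROPERTY = "NO_REGISTRAR";

    /** Picks the provider type from an account property, as the factory would at load time. */
    static AbstractSipProvider createProvider(String accountId, Map<String, String> props) {
        // parseBoolean(null) is false, so a missing property means registrar mode.
        boolean noRegistrar = Boolean.parseBoolean(props.get(NO_REGISTRAR_PROPERTY));
        return noRegistrar
            ? new NoRegistrarSipProvider(accountId)
            : new RegistrarSipProvider(accountId);
    }

    public static void main(String[] args) {
        AbstractSipProvider p = createProvider(
            "acc1216123156015",
            Collections.singletonMap(NO_REGISTRAR_PROPERTY, "true"));
        System.out.println(p.getAccountId() + " usesRegistrar=" + p.usesRegistrar());
    }
}
```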
The problem now is that, when both a no-registrar account and a registrar
one are loaded and a call addressed to the no-registrar account is
received, the provider processing the request happens to be the one for the
registrar mode account. This doesn't happen in every case (see below) but
it does happen even if the only logged in account is the no-registrar one. This is actually the use case I'm testing now: one registrar mode account and one no-registrar mode account, with only the no-registrar one logged in, and the same configuration for both peers in the test.
I've been debugging the issue since the weekend, repeatedly tracing back
function calls to see where the problem appears. What I can tell is that the two accounts seem to be loaded correctly and the bundle registration of the specific ProtocolProviderService is done right.
As I said, when the wrong provider processes the request, it is selected as if the call had been addressed to the registrar mode account, not because the wrong type of service provider was registered for the no-registrar account. The OperationSetBasicTelephonySipImpl from which processRequest is eventually called belongs to the registrar account (as does the protocol provider service). However, the INVITE received appears to carry the correct account id (the no-registrar one).
In essence, I haven't found anything going wrong before the actual call. According to the call stack, when an INVITE is received, the first function called from the SC sources is processRequest in what should be the account's corresponding protocol provider service. I'm not exactly sure how the specific provider is selected to process the request, and this was my initial question. Anyway, I decided to dig deeper before asking, and I followed the function calls into the JAIN SIP stack (I'm not sure I have the right version of the sources, but binding the ones I have to the debugger seems to work, at least for this part). According to the call stack, the last function before processRequest in the protocol provider service is deliverEvent in the EventScanner (at this level I also checked the INVITE). What I can see here is that the SIP stack used is again the wrong one, so the cause of the faulty provider selection seems to lie even deeper. I'm not sure this debugging direction is correct, so I stopped here (also because, at least for this specific thread, the call stack doesn't go much deeper: it starts with Thread.run(), EventScanner.run(), so tracing further back would probably mean searching the JAIN SIP stack sources for the place where the EventScanner is instantiated).
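One way to make this symptom easier to pin down in the logs would be to compare the account a request is addressed to with the account of the provider that actually received it. The sketch below is only an illustration under assumptions: it works on plain strings instead of JAIN SIP objects, it assumes the user part of the request URI is enough to identify the account, and all class and method names in it are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class DispatchCheckSketch {
    /**
     * Extracts "user" from a URI of the form "sip:user@host:port".
     * Returns null when there is no user part. A password component
     * ("user:pw@host") is not handled separately in this sketch.
     */
    static String userPart(String sipUri) {
        String s = sipUri.trim();
        int scheme = s.indexOf(':');
        if (scheme < 0) {
            return null;
        }
        String rest = s.substring(scheme + 1);
        int at = rest.indexOf('@');
        return at < 0 ? null : rest.substring(0, at);
    }

    /** Returns the loaded account user the request is addressed to, or null if none matches. */
    static String findTargetAccount(String requestUri, List<String> accountUsers) {
        String user = userPart(requestUri);
        if (user == null) {
            return null;
        }
        for (String candidate : accountUsers) {
            if (candidate.equals(user)) {
                return candidate;
            }
        }
        return null;
    }

    /** True when the provider that received the request owns the addressed account. */
    static boolean deliveredToRightProvider(String requestUri, String receivingAccountUser) {
        return receivingAccountUser.equals(userPart(requestUri));
    }

    public static void main(String[] args) {
        // Hypothetical account user names standing in for the two test accounts.
        List<String> accounts = Arrays.asList("registrarUser", "noRegistrarUser");
        String invite = "sip:noRegistrarUser@192.168.0.5:5060";
        System.out.println("addressed to: " + findTargetAccount(invite, accounts));
        System.out.println("delivered correctly: "
            + deliveredToRightProvider(invite, "registrarUser"));
    }
}
```

Logging such a check at the top of processRequest in each provider would show at a glance whether a mismatch only occurs for one account loading order.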
The only thing that's clear for now (if I may say so...) is that the wrong protocol provider selection for handling a call isn't a permanent behavior. I repeatedly deleted and created test accounts for the specified use case configuration until all went fine. So there are accounts that seem "to support" the patch, and accounts which don't. I didn't observe any difference while stepping through the code (at least at first), so I finally decided to take the sip-communicator.xml of an "incorrectly behaving" case and change it by copy-pasting the info from a "correctly behaving" one until I could find the difference that causes the problem. As weird or stupid as it might sound, the difference is the registrar mode account's (randomly generated, I think) tag value. I reproduce two cases next:
- correct behaviour (right protocol provider service selection - modified xml):
<acc1215995650593 value="acc1215995650593"> -> for registrar mode account
<acc1216123156015 value="acc1216123156015"> -> for no-registrar mode account
- incorrect behaviour (wrong protocol provider service selection - original xml):
<acc1216123126656 value="acc1216123126656"> -> for registrar mode account
<acc1216123156015 value="acc1216123156015"> -> for no-registrar mode account (same as above)
(In the modified xml for the correct behaviour case, the registrar tag was taken from another correctly behaving configuration, which looked like this:
<acc1215995650593 value="acc1215995650593"> -> registrar
<acc1216029733750 value="acc1216029733750"> -> no-registrar )
Well, after this "discovery" I traced the code again, paying more attention to account loading. There is a difference between the "correct" and the "incorrect" case: in the correct case the no-registrar account is loaded first, in the incorrect case the registrar account is loaded first. I can't find any explanation directly related to the values above, because in both cases the registrar value is the lower one, but this may be irrelevant if the order changes after some hashing (not sure though). Anyway, as I said, this account loading order is the only difference I found, and it doesn't seem to cause any other problem. Both types of service providers are registered. I even tried to "reverse" the use case with the correct behaviour xml version, in which the no-registrar account is loaded first. I tested with the registrar account logged on and the no-registrar one logged off, to see whether the selection would go wrong in the opposite way this time, always picking the no-registrar provider. The call went fine. Of course these are not truly opposite conditions, because this time the registrar SIP server was started and used.
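The hashing guess can be illustrated with a small experiment. This is a sketch under the assumption (not verified against the SC sources) that the account tags pass through a hash-based collection before the accounts are loaded. In that case the iteration order follows the keys' hash codes rather than their numeric or insertion order, while a LinkedHashMap or a TreeMap would give a predictable order. The class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LoadOrderSketch {
    /** Puts the given account tags into the target map, in the order given. */
    static Map<String, String> fill(Map<String, String> target, String... tags) {
        for (String tag : tags) {
            target.put(tag, tag);
        }
        return target;
    }

    /** Returns the tags in the order the map would hand them to a loader. */
    static List<String> iterationOrder(Map<String, String> accounts) {
        return new ArrayList<>(accounts.keySet());
    }

    public static void main(String[] args) {
        // Tags from the "incorrect" and "correct" configurations quoted above.
        String[][] configs = {
            {"acc1216123126656", "acc1216123156015"},
            {"acc1215995650593", "acc1216123156015"}
        };
        for (String[] tags : configs) {
            // HashMap: order depends on hash codes, so it is not asserted here.
            System.out.println("HashMap:       "
                + iterationOrder(fill(new HashMap<>(), tags)));
            // LinkedHashMap: always insertion order.
            System.out.println("LinkedHashMap: "
                + iterationOrder(fill(new LinkedHashMap<>(), tags)));
            // TreeMap: always sorted by key.
            System.out.println("TreeMap:       "
                + iterationOrder(fill(new TreeMap<>(), tags)));
        }
    }
}
```

If the HashMap line differs between the two configurations while the other two lines stay in the same relative order, that would fit the observation that only the registrar tag value changes which account is loaded first.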
These are, I think, the most important facts observed. In the end I must say that I'm pretty puzzled after a few days of debugging. My opinion is that it might after all be something related to the account loading order, or if not (and this is worse) something deeper inside the JAIN SIP stack. There is also a third possibility (which I probably should have considered from the start...): that some parts of the adapted code indirectly cause this, due to various other changes between the original sources that were patched and the current ones. I haven't yet reviewed all the parts I adapted one more time, so it's possible I missed something. I'll eventually compare an older trunk revision I've checked out, which was the subject of the original patch, with the current sources to see if I can get a solution. However, the "root" of the bug, if I may say so, is after all the way the wrong service provider gets selected instead of the right one. From the debugging described I can't figure out how, or where in the code, this happens. (I really doubt that it is in the rest of the patch, but as I said, any insufficiently reviewed sections or other differences between the older and newer sources might trigger it indirectly.) I'm really inexperienced with the way the JAIN SIP stack works (and also with many other parts discussed above), so any opinion about where or how this wrong protocol provider service and SIP stack selection happens, or about any of the facts described, is more than welcome.
PS: In the latest patch submitted, the instantiation of registrar mode providers was disabled inside the code. To get it running (if someone wants to test the faulty patch), just modify the following line inside ProtocolProviderFactorySipImpl:
"Life is full of unexpected but nothing happens without a reason..."