[sip-comm-dev] NPE in MclStorageManager


#1

Calling Emil (again)...

I've become a bit obsessed with your Gibberish slick which logs an NPE and yet completes successfully. It keeps catching me out because I watch the emerging log and think "oh no! I've broken something!", only to check the test reports and see that every test was successful.

Its a bit like the little boy crying wolf... if tests log failures and still work... and other tests are turned off because they don't work... then how can we trust our tests?

Please don't think I'm criticising. I know you agree with me but you can't afford to be too perfectionist at this stage of the project. However, if you can help me out with the high-level design, I'll try to take care of some of the details.

Your failing-but-successful test is exposing a bug in MclStorageManager.createProtoContactGroupNode(), but surprisingly this bug is not exposed in the ContactList slick.

protoGroup.getParentContactGroup() is legitimately returning null because the given group is the root element, but the caller assumes it will always receive an valid instance.

So.... let me have your first impressions and I'll try to sort it out myself quickly.

1. It seems reasonable to return a null parent when the group is the root, so I currently believe the method comment block. Do you agree this is topologically correct according to the original design?

2. If so, then the calling logic should test for a null parent before trying to invoke an instance method on null. That needs recoding defensively.

3. Does it seem reasonable that the ContactList slick should test this boundary case? If yes, then we need a new test here, which must also fail, before the bug is fixed.

4. The Gibberish test should not be returning success when it is NOT deliberately catching the NPE and claiming "expected behaviour". I need to make this test report a failure - then it will be successful after the bug is fixed.

Thanks for your time...

Brian

···

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@sip-communicator.dev.java.net
For additional commands, e-mail: dev-help@sip-communicator.dev.java.net


#2

Hi, Emil. The good news is that I have fixed this bug. A successful test run no longer logs a fat stack trace from the Gibberish slick.

Would you please try to find time soon to verify my fix before I commit it?

There are several reasons why I haven't risked making the change to the repository yet:-

1. I am 99% sure this will also fix the NPE's logged by the jabber, msn and yahoo slicks, but I do no have test accounts and so cannot verify they run clean from my system. I do not want cruise control to be my first proper test!

2. I tried, but failed to write a new test in the contactlist suite. This is the appropriate place, because these tests are run directly against MclStorageManager (rather than letting felix drive the logic asynchronously as it fires events). However, I could not devise a new test that would create the correct starting conditions and not cause collateral damage to the other tests in the suite.

3. I am not very happy with MclStorageManager.metaContactGroupModified(). Your block comment explains why the ADD and REMOVE cases are handled in exactly the same way, but it is the re-add of the removed ContactGroup that does the damage when doing a remove. I tried several changes at this logic level, but they all caused other tests to fail.

In the end, I resorted to the defensive logic you will find in my "successful" change. I am unhappy about providing a parent UID of "unknown", but surprisingly it does no harm. On the other hand(s), the original logic leads to an NPE (parent does not exist) - and not setting the string at all breaks other tests in weird ways.

As you can tell, I feel I have failed by not understanding the logic well enough to create a new test case and fix the bug "properly". However, I can't spare any more time AND it seems to fix the bug without breaking anything else.

I await your views with great interest!

Brian

briansChange (925 Bytes)


#3

Hey Brian,

I was thinking that we should rather try and actually remove the proto
group in

1187:
MetaContactListServiceImpl.removeContactGroupFromMetaContactGroup()

You can try this out if you want or otherwise I'll do it myself when I
get back home and let you know what happens.

Cheers
Emil

Brian Burch написа:

···

Hi, Emil. The good news is that I have fixed this bug. A successful test
           run no longer logs a fat stack trace from the Gibberish slick.

Would you please try to find time soon to verify my fix before I commit it?

There are several reasons why I haven't risked making the change to the
repository yet:-

1. I am 99% sure this will also fix the NPE's logged by the jabber, msn
and yahoo slicks, but I do no have test accounts and so cannot verify
they run clean from my system. I do not want cruise control to be my
first proper test!

2. I tried, but failed to write a new test in the contactlist suite.
This is the appropriate place, because these tests are run directly
against MclStorageManager (rather than letting felix drive the logic
asynchronously as it fires events). However, I could not devise a new
test that would create the correct starting conditions and not cause
collateral damage to the other tests in the suite.

3. I am not very happy with
MclStorageManager.metaContactGroupModified(). Your block comment
explains why the ADD and REMOVE cases are handled in exactly the same
way, but it is the re-add of the removed ContactGroup that does the
damage when doing a remove. I tried several changes at this logic level,
but they all caused other tests to fail.

In the end, I resorted to the defensive logic you will find in my
"successful" change. I am unhappy about providing a parent UID of
"unknown", but surprisingly it does no harm. On the other hand(s), the
original logic leads to an NPE (parent does not exist) - and not setting
the string at all breaks other tests in weird ways.

As you can tell, I feel I have failed by not understanding the logic
well enough to create a new test case and fix the bug "properly".
However, I can't spare any more time AND it seems to fix the bug without
breaking anything else.

I await your views with great interest!

Brian

------------------------------------------------------------------------

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@sip-communicator.dev.java.net
For additional commands, e-mail: dev-help@sip-communicator.dev.java.net

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@sip-communicator.dev.java.net
For additional commands, e-mail: dev-help@sip-communicator.dev.java.net


#4

Emil Ivov wrote:

I was thinking that we should rather try and actually remove the proto
group in

1187:
MetaContactListServiceImpl.removeContactGroupFromMetaContactGroup()

You can try this out if you want or otherwise I'll do it myself when I
get back home and let you know what happens.

Well, that is a direction I hadn't explored. Superficially, it looks as if it was written subsequently and you didn't go back to use it.

But, what about your comment in metaContactGroupModified() :

//the fact that a contact group was added or removed to a
//meta group may imply substantial changes in the child contacts
//and the layout of any possible subgroups, so
//to make things simple, we'll remove the existing meta contact
//group node and re-create it according to its current state.

I was treating this with the respect I normally reserve for signs that say things like "Danger! High Voltage!". Do you think that you didn't trust your own judgement when developing that method? Perhaps you had problems because other stuff wasn't working properly?

If I have time, I'll try it tomorrow and let you know. If not, please go ahead because you'll be quicker than me.

Can you give me any guidance about writing an extra unit test that covers this situation? What annoyed me most was my inability to crib an appropriate code section to set up and expose this bug. I've now coded a fix, but the ultimate test should NOT be the Gibberish slick - it should be in the contact list slick. In principle, we should have a test that fails NOW and works with my proposed fix... and would hopefully also work with the fix you've outlined above.

Thanks for responding (did I get your private email address right or did you pick this up from the mailing list?)

Brian

···

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@sip-communicator.dev.java.net
For additional commands, e-mail: dev-help@sip-communicator.dev.java.net


#5

Hey Brian,

Brian Burch написа:

Emil Ivov wrote:

I was thinking that we should rather try and actually remove the proto
group in

1187:
MetaContactListServiceImpl.removeContactGroupFromMetaContactGroup()

You can try this out if you want or otherwise I'll do it myself when I
get back home and let you know what happens.

Well, that is a direction I hadn't explored. Superficially, it looks as
if it was written subsequently and you didn't go back to use it.

But, what about your comment in metaContactGroupModified() :

//the fact that a contact group was added or removed to a
//meta group may imply substantial changes in the child contacts
//and the layout of any possible subgroups, so
//to make things simple, we'll remove the existing meta contact
//group node and re-create it according to its current state.

I was treating this with the respect I normally reserve for signs that
say things like "Danger! High Voltage!". Do you think that you didn't
trust your own judgement when developing that method? Perhaps you had
problems because other stuff wasn't working properly?

Not sure I understand the question. The comment means that when we see
that a contact group is removed from a meta contact group we simply
remove the whole meta contact group from the XML and then recreate it.
We do this because the alternative would imply going through all
subgroups and remove, one by one, all the children (contacts and groups)
of the group that has been removed.

Can you give me any guidance about writing an extra unit test that
covers this situation? What annoyed me most was my inability to crib an
appropriate code section to set up and expose this bug. I've now coded a
fix, but the ultimate test should NOT be the Gibberish slick - it should
be in the contact list slick. In principle, we should have a test that
fails NOW and works with my proposed fix... and would hopefully also
work with the fix you've outlined above.

Well first, let's try and clear out the scenario that is causing this.
It happens when you try to remove a provider for which there is at least
one group. The meta contact list sees that the provider is about to be
removed so it considers that all its contacts and groups should now be
removed.

This is happens in:

MetaContactListerviceImpl.handleProviderRemoved()

At the end of handleProviderRemoved() we try to remove the root group,
and after we do, we fire an event saying that it has just been removed.
The MclStorageManager sees this and thinks "Oh, we lost a contact group,
I will now recreate the XML for the encapsulating metagroup". This is
where the exception happens.

As of now I am not entirely entirely sure why this is happening. Feel
free to have a look if interested. Don't worry if you don't have the
time since this isn't all that urgent.

Thanks for responding (did I get your private email address right or did
you pick this up from the mailing list?)

Yes I did get your prompt mail but I would've replied to the list anyway :wink:

Cheers
Emil

···

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@sip-communicator.dev.java.net
For additional commands, e-mail: dev-help@sip-communicator.dev.java.net


#6

Emil Ivov wrote:
<snip>

As of now I am not entirely entirely sure why this is happening. Feel
free to have a look if interested. Don't worry if you don't have the
time since this isn't all that urgent.

<snip>

OK, Emil. Thanks for your explanation. I now realise I need to spend a lot of uninterrupted time to a) understand the MCL logic, and b) fix this bug properly.

What about this suggestion?

a) it is a bad idea to have a CC build that spits out 4 or 5 nasty stack traces AND THEN says "all tests are successful".

b) put in my fix as a temporary solution... as long as I am only adding a parent UID of "unknown" to new root entries, what is the harm? After all, the tests still run successfully and many (at least half and possibly all) cases will be where the new root entry is going to be removed soon after.

c) I know my solution is quick and dirty. I've tried hard and yet found it difficult to understand this area of code. I believe we need to fix this bug properly, so I would add a TODO to the change, as well as the code that calls it. You understand the circumstances better than me, so you could open a new issue so it doesn't get forgotten.

What do you think? I want to clean this up fast and move on, but I am worried about unforeseen side-effects. Do you think it is worth taking a punt and backing out the change if users complain?

Brian

···

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@sip-communicator.dev.java.net
For additional commands, e-mail: dev-help@sip-communicator.dev.java.net


#7

Hey Brian,

Brian Burch написа:

a) it is a bad idea to have a CC build that spits out 4 or 5 nasty stack
traces AND THEN says "all tests are successful".

It does look a bit annoying. However since it is not really causing any
undesired behaviour that users would be aware of then it's rather a good
thing that it is not breaking the build. Wouldn't you agree?

b) put in my fix as a temporary solution... as long as I am only adding
a parent UID of "unknown" to new root entries, what is the harm? After
all,

The only "harm" I see is that we would not be reminded about it so the
bug may well fall into oblivion. So unless there's some specific problem
that this is causing and that would require immediate intervention,
wouldn't it be better to keep the stack traces as a reminder?

the tests still run successfully and many (at least half and
possibly all) cases will be where the new root entry is going to be
removed soon after.

As a matter of fact (in case you decide to go back to debugging the list
at some point) I did some tests and the group is actually supposed to be
removed. The exception is probably caused by one of the remaining
protocol root groups. I saw that by default we are not writing them in
the contactlist.xml so it might be a better fix to simply ignore the
call if this is a root group, rather than creating an entry for it. Not
sure though as it's been a while since I last worked on the storage manager.

c) I know my solution is quick and dirty. I've tried hard and yet found
it difficult to understand this area of code. I believe we need to fix
this bug properly, so I would add a TODO to the change, as well as the
code that calls it. You understand the circumstances better than me, so
you could open a new issue so it doesn't get forgotten.

My problem with this fix is that it only hides a potential bug without
really resolving any problems. If you can't live with the stack trace in
the CC mails then please go ahead and commit your patch but you'd then
have to promise that you'll come back to the actual bug some time later.
;). As for opening an issue feel free to do so and simply enter the NPE
in the description as well as a pointer to the archives containing this
thread.

What do you say?

Cheers
Emil

···

What do you think? I want to clean this up fast and move on, but I am
worried about unforeseen side-effects. Do you think it is worth taking a
punt and backing out the change if users complain?

Brian

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@sip-communicator.dev.java.net
For additional commands, e-mail: dev-help@sip-communicator.dev.java.net

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@sip-communicator.dev.java.net
For additional commands, e-mail: dev-help@sip-communicator.dev.java.net


#8

Emil Ivov wrote:

Hey Brian,

Brian Burch написа:

a) it is a bad idea to have a CC build that spits out 4 or 5 nasty stack traces AND THEN says "all tests are successful".

It does look a bit annoying. However since it is not really causing any
undesired behaviour that users would be aware of then it's rather a good
thing that it is not breaking the build. Wouldn't you agree?

Weeeeell... not really. The stack traces show the tests are NOT successful - they are wrong. Nevertheless, wrong tests do not necessarily mean the release is bad.

b) put in my fix as a temporary solution... as long as I am only adding a parent UID of "unknown" to new root entries, what is the harm? After all,

The only "harm" I see is that we would not be reminded about it so the
bug may well fall into oblivion. So unless there's some specific problem
that this is causing and that would require immediate intervention,
wouldn't it be better to keep the stack traces as a reminder?

Reluctantly, I agree with your pragmatic opinion. However, I've noticed (without paying attention to) other contact list problems and I wonder whether our base is as solid as it appears to be on the surface.

the tests still run successfully and many (at least half and possibly all) cases will be where the new root entry is going to be removed soon after.

As a matter of fact (in case you decide to go back to debugging the list
at some point) I did some tests and the group is actually supposed to be
removed. The exception is probably caused by one of the remaining
protocol root groups. I saw that by default we are not writing them in
the contactlist.xml so it might be a better fix to simply ignore the
call if this is a root group, rather than creating an entry for it. Not
sure though as it's been a while since I last worked on the storage manager.

OK, I will not go ahead with my change.

Your observations provide me with useful new information. I was actually worried about a side effect to my change of NOT synchronising the external xml file, and you noticed it wasn't synchronised in this case anyway!

I'll try to find some time to investigate further. I think I will only feel happier if I write some new contactlist tests, so it might take me a while to get much further forward - I'll come at it sideways.

By the way, my ~/sandboxJavaNet/trunk/lib/testing.properties has the definition:
net.java.sip.communicator.CONTACTLIST_FILE_NAME=testing.contactlist.xml

but.... find . -name "*contactlist.xml" only detects:
~/.sip-communicator/contactlist.xml

Where is your testing.contactlist.xml being stored???

I hope you don't mind me asking further questions as I go along.

Have a good weekend!

Brian

···

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@sip-communicator.dev.java.net
For additional commands, e-mail: dev-help@sip-communicator.dev.java.net