[sip-comm-dev] GSoC Progress - Sound Level Indicator


#1

Hi all...

This is a brief report of the progress that I have made in my project which
is "Implementing a Sound Level Indicator for SC".

*I successfully retrieved the relevant data sources*
Some minor modifications were made to the existing code in order to push
audio data to my newly added audio processing classes.

*Processed the audio data in order to get the required sound level details*
A simple RMS value calculation was done to get the sound level details.
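Roughly, the calculation looks like this (a simplified stand-alone
sketch assuming signed 16-bit little-endian PCM samples; the class and
method names here are just for illustration, the actual code in my
branch differs):

```java
public class SoundLevelUtils {
    /**
     * Computes the RMS amplitude of a buffer of signed 16-bit
     * little-endian PCM samples, normalized to the 0..1 range.
     */
    public static double rms(byte[] pcm, int offset, int length) {
        long sumOfSquares = 0;
        int sampleCount = 0;
        for (int i = offset; i + 1 < offset + length; i += 2) {
            // assemble the 16-bit sample: low byte first (little-endian),
            // high byte shifted into bits 8..15 with sign extension
            int sample = (pcm[i + 1] << 8) | (pcm[i] & 0xFF);
            sumOfSquares += (long) sample * sample;
            sampleCount++;
        }
        if (sampleCount == 0)
            return 0.0;
        // normalize by the maximum 16-bit amplitude
        return Math.sqrt((double) sumOfSquares / sampleCount) / 32768.0;
    }

    public static void main(String[] args) {
        // two samples with the constant value 16384 (half of full scale)
        byte[] buf = new byte[] { 0x00, 0x40, 0x00, 0x40 };
        System.out.println(rms(buf, 0, buf.length)); // 0.5
    }
}
```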

*Showed the calculated values in a simple UI*
Added a small UI which continuously shows the sound levels of the
participants in a call. This UI still needs to be integrated into the
existing call dialog later.

I have already committed the modified code to
https://sip-communicator.dev.java.net/svn/sip-communicator/branches/gsoc09/sli

I need to improve the accuracy and the performance of the sound level
calculation methods so that the indicator fluctuates smoothly.
At the moment the implementation only supports a call between two
participants. I'm now going to modify the code so that it can easily
be extended to conference calls as well.
Writing unit test cases will also follow.

I'll keep updating you about the project as it goes.

Regards
Dilshan


#2

Hello Dilshan,

Congratulations on your progress! It felt nice to actually be able to
run the application and see the sound indicators in a real-world call.

Emil, Yana and I had a chance to test the latest version of the
sound-level indicators, go through your source code and talk about our
impressions and observations. I'll try to share them with you here,
but because we touched on multiple subjects and I'm not sure I can
describe them all in one go, I'll split the topics across multiple
e-mails.

I'd like to first start with my thoughts on the DataSources and
SourceStreams because these observations are mine and thus I can
explain them better.

I'm concerned about the PushBufferStream#read(Buffer) you're using on
the SendStreams and ReceiveStreams (by obtaining their associated
DataSources and then using the SourceStreams of #getStreams()). I had
a look at some of the existing PushBufferStream implementations, and
it's my understanding that reading from a SourceStream pops the read
data out of it, i.e. executing PushBufferStream#read(Buffer) twice in
a row will read different data. The problem I see with this is that
the SendStreams are actually read by the functionality which sends
them to the CallParticipant, and the ReceiveStreams are read by the
functionality which renders them. Doesn't that mean that, since you're
also reading from them, you'll be "stealing" data? In other words,
when you read from a SendStream for the purposes of the sound-level
indicators, the data you've read will no longer be available to the
functionality which sends data to the CallParticipant, and
consequently the CallParticipant will never receive the data used for
the sound-level indicators.

Another thing I wasn't able to see in the code is making sure that
only audio streams are being used for the sound-level indicators. Were
you able to test your modifications with calls employing both audio
and video?

And finally (in this mail), I guess the case of conferencing may
require changing the source of the data for the local sound-level
indicator. You're currently reading it from one of the SendStreams,
but in conferencing there will not only be a multitude of SendStreams;
the SendStream you've chosen for localDataSource may also not be
available throughout the whole conference session. For example, if you
choose the SendStream for a specific CallParticipant as the
localDataSource, that CallParticipant may later quit the conference
while the other CallParticipants are still present, and then their
SendStream will no longer provide data even though audio is still
being sent to the other CallParticipants.

I'll be glad to read your thoughts, answers and corrections on the
issues I've brought up above.

Best regards,
Lubomir

···


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@sip-communicator.dev.java.net
For additional commands, e-mail: dev-help@sip-communicator.dev.java.net


#3

Hi !

> Congratulations on your progress! It felt nice to actually be able to
> run the application and see the sound indicators in a real-world call.

Thank you so much for the very quick response. It is a pleasure to be
working with you all.

> I'm concerned about the PushBufferStream#read(Buffer) you're using on
> the SendStreams and ReceiveStreams (by obtaining their associated
> DataSources and then using the SourceStreams of #getStreams()). I had
> a look at some of the existing PushBufferStream implementations, and
> it's my understanding that reading from a SourceStream pops the read
> data out of it, i.e. executing PushBufferStream#read(Buffer) twice in
> a row will read different data. The problem I see with this is that
> the SendStreams are actually read by the functionality which sends
> them to the CallParticipant, and the ReceiveStreams are read by the
> functionality which renders them. Doesn't that mean that, since
> you're also reading from them, you'll be "stealing" data? In other
> words, when you read from a SendStream for the purposes of the
> sound-level indicators, the data you've read will no longer be
> available to the functionality which sends data to the
> CallParticipant, and consequently the CallParticipant will never
> receive the data used for the sound-level indicators.

Even though I have processed both the SendStream and the ReceiveStream,
my concern is only with the SendStream. I processed the ReceiveStream
just to test my work, and that part will be removed soon. As my Project
Requirements
<http://www.sip-communicator.org/index.php/GSOC2009/SoundLevelIndicator>
suggest, I will only have to process the SendStream and retrieve sound
level information from it. Then I will send the calculated sound level
information in RTP packets, and the receivers will grab it and show it
in their UI. While receiving the sound level information, the receivers
will also send their own sound level through RTP packets to the others.
So, eventually everyone will get the sound level information from all
the others in the conference.

I thought the CallSessionImpl class would be the best place to access
the SendStream and get data for processing to retrieve sound level
information. I really didn't know that I was stealing data. Can you
please suggest a way to retrieve and process audio data in the
SendStream without disturbing other functionality? It would be ideal
if I could access this data just before it is released to the network
through RTP. Then I would be able to set the sound level attribute in
each and every RTP packet that leaves.

> Another thing I wasn't able to see in the code is making sure that
> only audio streams are being used for the sound-level indicators. Were
> you able to test your modifications with calls employing both audio
> and video?

Nope. I will take the necessary actions to make sure I only process
audio streams.

> And finally (in this mail), I guess the case of conferencing may
> require changing the source of the data for the local sound-level
> indicator. You're currently reading it from one of the SendStreams,
> but in conferencing there will not only be a multitude of SendStreams;
> the SendStream you've chosen for localDataSource may also not be
> available throughout the whole conference session. For example, if you
> choose the SendStream for a specific CallParticipant as the
> localDataSource, that CallParticipant may later quit the conference
> while the other CallParticipants are still present, and then their
> SendStream will no longer provide data even though audio is still
> being sent to the other CallParticipants.

I think this problem will not occur if we do the implementation as I
have described above. Even when one participant leaves, the others will
be able to continue the conversation fine with that implementation.

I think I answered all your questions clearly. Please correct me if I
have got something wrong here. Hope to hear from you again soon. Thank
you so much.

Regards
Dilshan


#4

Hi again Dilshan,

As promised, I'm continuing the summary of our conversations with Emil
and Yana on the subject of the sound-level indicators. I'd like to
talk about separating the currently monolithic sound-level indicator
architecture into distinct roles, i.e. breaking what you've currently
placed in the media bundle into pieces across the generic protocol
bundle, the SIP protocol bundle, the GUI bundle, and the media bundle.

Obviously, the JDialog with the sound-level indicators that we've been
able to enjoy during our tests will have to move into the gui bundle.
Equally obviously, we'll have to place the indicators not in a
separate JDialog but next to each call participant in the call dialog,
but that's not really one of the topics of this e-mail.

Because the UI will reside in the gui bundle, it will need a way to
get notified that the sound levels have changed and to be told what
their values are for the specific participants. This naturally calls
for a sound-level change listener associated with a CallParticipant.
The question of whether one would ever want to listen to the sound
level of one CallParticipant but not to another in the same Call
suggests that it may be simpler, and closer to the actual use case, to
have a listener on a Call which receives sound-level events about all
CallParticipants in that Call, with each event describing the
CallParticipant and her sound level. Moreover, the UI will have to
know when to show and hide the indicators, because sound-level support
will not be available in every protocol, and even within the SIP
protocol, for example, a conference call may not have sound level
information about the participants at all, and a call may switch
between an ordinary/classic/non-conference call and a conference call
throughout its duration. To keep things simple and easily
comprehensible, we think the Call listener which gets notified about
changes in the sound levels of the various CallParticipants in the
Call may also get notifications about changes in the availability of
sound-level support in the Call. The above brings us to the following
(which uses prototypical names I chose to clearly describe the method
roles):

interface SoundLevelChangeListener
{
    void soundLevelChanged(CallParticipant participant, <sound level>);
    void soundLevelSupportAvailabilityChanged(boolean available);
}

// and on Call:
void addSoundLevelChangeListener(SoundLevelChangeListener listener);

Because the discovery of changes in the sound levels will occur in the
media bundle, and because the current design forces us to place the
discovery of sound-level support availability there as well, much the
same design will have to be applied in the media bundle. The
difference is that the media bundle doesn't/shouldn't have knowledge
of the mapping between the send and receive streams and the
CallParticipants, so its SoundLevelChangeListener will have to receive
notification events not about CallParticipants but about an identifier
which allows the receiver to determine the respective CallParticipant.

With the above setup, the sound-level UI for a given SIP Call will
install a (protocol-level) SoundLevelChangeListener on the Call and
reflect the changes in the sound levels reported to the listener. The
SIP functionality will install a (media-level) SoundLevelChangeListener
on the respective media i.e. CallSession implementation, and when the
media fires an event on the media SoundLevelChangeListener, the latter
will translate the reported identifier to a CallParticipant and fire
an event on the protocol SoundLevelChangeListeners.

I hope I'm not missing anything important here, but this e-mail has
become so long that I fear adding more may render the whole thing
incomprehensible. Please feel free to share your thoughts on the
design we propose, and to ask us questions.

Best regards,
Lubomir

···



#5

Hey Dilshan

Dilshan Kanchana wrote:

Thank you so much for the very quick response. It is a pleasure to be
working with you all.

Thanks for the kind words. It's definitely a pleasure to be working with
you all too!

> Even though I have processed both the SendStream and the ReceiveStream,
> my concern is only with the SendStream. I processed the ReceiveStream
> just to test my work, and that part will be removed soon. As my Project
> Requirements
> <http://www.sip-communicator.org/index.php/GSOC2009/SoundLevelIndicator>
> suggest, I will only have to process the SendStream and retrieve sound
> level information from it. Then I will send the calculated sound level
> information in RTP packets, and the receivers will grab it and show it
> in their UI. While receiving the sound level information, the receivers
> will also send their own sound level through RTP packets to the others.
> So, eventually everyone will get the sound level information from all
> the others in the conference.

I am afraid that's not quite accurate, and your project requirements
don't really mention this. I am sorry if there's been any confusion.
Determining the sound level of incoming streams is actually even more
important than determining that of the outgoing data.

We have never really considered sending the sound level in a 1-to-1
session. There are many reasons for this, the most important being that
it is unnecessary: the remote party can very well determine it from the
stream itself. Relying on explicit indications in the packets would
also make the solution incompatible with applications that don't
support it.

Now, what _has_ been considered (and I guess that's what must have
caused the confusion) was allowing a conference call mixer to send the
sound levels of the individual contributors in a mixed stream, since
there's no way for the other participants to calculate them by
themselves. Note, however, that even in this case the mixer won't be
measuring the level of the send stream but those of all the incoming
streams before it merged them.

Anyway, we are not there yet, and for the time being you should
concentrate on receive stream measurement and the UI.

> I thought the CallSessionImpl class would be the best place to access
> the SendStream and get data for processing to retrieve sound level
> information. I really didn't know that I was stealing data. Can you
> please suggest a way to retrieve and process audio data in the
> SendStream without disturbing other functionality? It would be ideal
> if I could access this data just before it is released to the network
> through RTP.

You can try adding your code in the effect chain. I believe there was a
snippet describing a way to do this here:

http://java.sun.com/javase/technologies/desktop/media/jmf/2.1.1/solutions/

> Then I would be able to set the sound level attribute in
> each and every RTP packet that leaves.

Again, this would only happen in the mixer ... at least in the
beginning. We may one day add such attributes for non-mixing clients
as well, in order to allow a mixer to determine activity without
having to decode, but this is not currently planned.

Hope this clarifies the situation. Let me know if otherwise.

Cheers
Emil

···



#6

That is to say, with about 100% of the SIP clients out there ;-)

···

On Fri, Jun 19, 2009 at 12:50:10PM +0200, Emil Ivov wrote:

> Relying on explicit indications in the packets would also make the
> solution incompatible with applications that don't support it.

--
Sébastien Mazy



#7

Hi All !

First of all, please forgive me for being a bit late to reply. I was
talking a lot with Sebastien, and with his support I was able to create
a class diagram for the proposed design. Check it here:
<http://img3.imageshack.us/img3/1287/classdiagramu.jpg>

In the following I will try to explain the design and the chain of events in
my own words.

   - The CallSessionImpl class creates a SoundLevelCalculator for each
   of the available audio streams.

   - Each SoundLevelCalculator calculates the sound level of its stream
   and fires StreamSoundLevelEvents.

   - The CallSessionImpl class is notified about these events.

   - In the meantime, CallSipImpl also registers to receive these
   StreamSoundLevelEvents through the CallSession interface.

   - Therefore, when CallSessionImpl fires StreamSoundLevelEvents,
   CallSipImpl is notified.

   - This is how the sound level details travel from the media bundle
   to the protocol bundle.

   - The CallSipImpl class resolves the events it receives (finding the
   relevant CallParticipant from the streamIdentifier it receives).

   - After that it fires ParticipantSoundLevelEvents.

   - The SoundLevelIndicator (UI class), which listens to these
   ParticipantSoundLevelEvents, is then notified and renders the
   appropriate sound levels on screen.

That was a brief explanation of the proposed design. Please point out
any mistakes or anything unclear in the design...
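The chain of events above could be sketched in plain Java like this
(these are just stand-in classes I wrote for illustration, no JMF
involved; the names follow the diagram, and the wiring shown is my
simplification):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Media-level event: carries a stream identifier, not a participant.
class StreamSoundLevelEvent {
    final long streamIdentifier;
    final int level;
    StreamSoundLevelEvent(long id, int level) {
        this.streamIdentifier = id;
        this.level = level;
    }
}

interface StreamSoundLevelListener {
    void streamSoundLevelChanged(StreamSoundLevelEvent evt);
}

// Per-stream calculator: fires events to its registered listeners.
class SoundLevelCalculator {
    private final long streamIdentifier;
    private final List<StreamSoundLevelListener> listeners =
        new ArrayList<StreamSoundLevelListener>();

    SoundLevelCalculator(long id) { this.streamIdentifier = id; }

    void addStreamSoundLevelListener(StreamSoundLevelListener l) {
        listeners.add(l);
    }

    // would be called from the audio processing with the computed level
    void fireLevel(int level) {
        StreamSoundLevelEvent evt =
            new StreamSoundLevelEvent(streamIdentifier, level);
        for (StreamSoundLevelListener l : listeners)
            l.streamSoundLevelChanged(evt);
    }
}

// Protocol side: resolves streamIdentifier -> participant.
class CallSipImplSketch implements StreamSoundLevelListener {
    final Map<Long, String> participantsByStream =
        new HashMap<Long, String>();
    String lastParticipant;
    int lastLevel;

    public void streamSoundLevelChanged(StreamSoundLevelEvent evt) {
        lastParticipant = participantsByStream.get(evt.streamIdentifier);
        lastLevel = evt.level;
        // the real code would now fire a ParticipantSoundLevelEvent
        // towards the UI listeners
    }

    public static void main(String[] args) {
        SoundLevelCalculator calc = new SoundLevelCalculator(1);
        CallSipImplSketch call = new CallSipImplSketch();
        call.participantsByStream.put(1L, "remote participant");
        // in the real design CallSessionImpl would relay the events
        calc.addStreamSoundLevelListener(call);
        calc.fireLevel(63);
        System.out.println(call.lastParticipant + " -> " + call.lastLevel);
    }
}
```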

Thank you so much for the support..

Cheers
Dilshan


#8

Hi Dilshan, all,

> First of all, please forgive me for being a bit late to reply. I was
> talking a lot with Sebastien, and with his support I was able to
> create a class diagram for the proposed design. Check it here:
> <http://img3.imageshack.us/img3/1287/classdiagramu.jpg>

I'd suggest a few more modifications:
- SoundLevelIndicator isn't an event source, so it doesn't need
   addListener methods
- CallSessionImpl should be a StreamSoundListener. It registers with
   all of its SoundLevelCalculators (one per stream) and can act as an
   "event hub", relaying events to CallSipImpl
- therefore SoundLevelCalculator should have add- and removeListener
   methods.
- in Call and CallSipImpl, replace add- and removeStreamSoundListener
   by add- and removeParticipantSoundListener

You'll probably change the design while coding, so don't spend too much
time on this diagram, even if it's nice to show what you intend to do.

> The CallSipImpl class resolves the events it receives (finding the
> relevant CallParticipant from the streamIdentifier it receives).

As a side note, 0 for the audio stream to send and 1 for the received
stream (1-to-1 media session) were chosen as arbitrary stream IDs for
the time being.

For the record, passing information up to the UI will only be the next
part of Dilshan's work. The current step is to retrieve sound level
without mangling data, as pointed out by Lubomir. And as Emil mentioned,
a way to do so is to create a JMF/FMJ Effect. There is a good example,
where the instant sound amplitude is already calculated. See GainEffect
here:
http://java.sun.com/javase/technologies/desktop/media/jmf/2.1.1/guide/JMFExtending.html

[By the way, I'm very suspicious about the way a signed, little-endian
16-bit PCM sample is converted to Java's signed int in the GainEffect
example from above.
Instead of: int sample = tempH | (tempL & 255);
I would have written: int sample = tempH << 8 | (tempL & 255);
I'd be grateful if someone could show where/whether I'm wrong]
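[A quick way to convince oneself, in plain Java (the helper names are
mine): without the `<< 8` shift, the high byte is simply OR-ed over the
low byte instead of occupying bits 8..15.

```java
public class PcmDecode {
    // The corrected conversion: high byte shifted into bits 8..15,
    // with sign extension giving a proper signed 16-bit value.
    static int decode(byte low, byte high) {
        return (high << 8) | (low & 0xFF);
    }

    // The conversion as it appears in the GainEffect example (no shift).
    static int decodeNoShift(byte low, byte high) {
        return high | (low & 0xFF);
    }

    public static void main(String[] args) {
        byte low = 0x34, high = 0x12;                 // sample 0x1234
        System.out.println(decode(low, high));        // 4660 (= 0x1234)
        System.out.println(decodeNoShift(low, high)); // 54, clearly wrong
        // negative samples work too: 0xFF 0xFF is -1 in two's complement
        System.out.println(decode((byte) 0xFF, (byte) 0xFF)); // -1
    }
}
```
]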

See also: http://wiki.multimedia.cx/index.php?title=PCM

Once the instant amplitude has been retrieved in SoundLevelCalculator,
we'll have to decide how often to send a StreamSoundLevelEvent and how
to calculate the average (moving average? do we want smooth
transitions in the indicator?)
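For the averaging, one simple candidate is an exponential moving
average, which gives smooth transitions with a single tunable constant
(just a sketch; the class name and the choice of alpha are mine):

```java
public class LevelSmoother {
    private final double alpha; // 0 < alpha <= 1; smaller = smoother
    private double average;
    private boolean primed;

    public LevelSmoother(double alpha) { this.alpha = alpha; }

    /** Feed one instant amplitude, get the smoothed level back. */
    public double update(double instantLevel) {
        if (!primed) {
            average = instantLevel; // first sample seeds the average
            primed = true;
        } else {
            average = alpha * instantLevel + (1 - alpha) * average;
        }
        return average;
    }

    public static void main(String[] args) {
        LevelSmoother s = new LevelSmoother(0.25);
        // a sudden jump from silence to full scale ramps up gradually
        s.update(0.0);
        System.out.println(s.update(1.0)); // 0.25
        System.out.println(s.update(1.0)); // 0.4375
    }
}
```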

Cheers,

···

On Mon, Jun 22, 2009 at 01:33:00AM +0530, Dilshan Kanchana wrote:

--
Sébastien Mazy


#9

Hi all !

This is to give all of you a brief update on my GSoC progress on project
"Implementing a Sound Level Indicator".

With help from Sebastien, I was able to retrieve the sound level
details from the incoming audio stream without stealing any data.

The work was done in the following way.

* Implemented a new JMF Effect plugin called SoundLevelCalculator to
process the audio stream and get sound level details
* Got the incoming audio stream from the
CallSessionImpl.update(ReceiveStreamEvent evt) method
* Created a processor from that incoming stream and processed it using
the Effect created
* Showed the calculated values in a temporary simple UI

I'm happy to say that this part is working perfectly well, and you can
get the latest source code from
https://sip-communicator.dev.java.net/svn/sip-communicator/branches/gsoc09/sli

The next step is to implement the same functionality on outgoing stream as
well.

Can somebody suggest a place to get the unprocessed outgoing audio
stream? I tried the stream in the
CallSessionImpl.update(SendStreamEvent evt) method, but the stream
there doesn't allow me to create a processor with its data source. I
think that is because it is an RTP data stream.

Can somebody please show me a way to fix this issue?

I need to finish this part quickly and start working on sending the
calculated data from the media bundle to the gui bundle through events
and listeners.

Thanx

Dilshan.