From: Olivier C. <oli...@co...> - 2010-04-26 01:46:09
On Sun, 2010-04-25 at 17:59 +0300, Marco Ballesio wrote:
> Hi all and especially Farsight developers,
>
> checking on the Farsight todo list, I see something is being cooked
> about CN generation. In the Farsight sources I can see basic handling
> of CN sending, but not much about receiving it.
>
> As CN generation on the receive side appears to be the trickiest part,
> I wanted to know how it's planned to deal with it. For example, for
> G.729 packets it's possible to receive only a SID frame and then
> nothing more until the next talkspurt: because of DTX it's not
> possible to establish any direct relation between input packets and
> the length of the uncompressed output. RFC 3389 also defines some ways
> to adjust the noise level before the next talkspurt but, again, DTX
> makes it hard to deal with CN using a traditional GStreamer decoder.
>
> If nothing is already available, I was thinking about a generic
> support bin to be controlled from the speech codecs or depayloaders.
> The bin structure may be sketched as an audio source generating
> coloured noise with the pole-only spectral description obtained from
> the silence encoder, connected together with the decoder to an input
> selector. The latter would simply be controlled from the depayloader
> (or decoder) when e.g. a SID or the start of a talkspurt has been
> received.
>
> Are there any other/better ideas (being) implemented?

Nothing has been implemented in Farsight2 because I don't know what the best approach is. My original idea was that SID frames would be received by a special depayloader (if audio/CN) or by the decoder (if it's a codec like G.729 that has built-in CN). These elements would then forward the "silence data" downstream to the mixer, which would generate the correct comfort noise whenever it has no voice packets. That way, CN is only generated when nothing is received (so it won't do strange things if the other party switches codecs mid-call, or in a multi-party call).
That said, this means that the CN is not generated by the decoder but by the mixer. This is easy to implement for codecs that use the generic RFC 3389 CN packets, but it is probably trickier for codecs (like Speex or G.729) that have their own comfort-noise algorithms. So maybe another solution is needed, like having the decoder generate a comfort-noise buffer when it receives a "GstRTPPacketLost" event from the jitterbuffer (which should only be sent to the last active payload type per SSRC). My understanding is that the decoder should only generate CN from the point where it receives a SID frame until another voice frame is received. That said, this solution means that in a multi-party call, one would still get CN.

Anyway, your input is welcome, as you seem to know quite a bit more about the actual algorithms than I do.

--
Olivier Crête
oli...@co...
Collabora Ltd