From: Olivier C. <oli...@co...> - 2010-04-26 01:46:09
On Sun, 2010-04-25 at 17:59 +0300, Marco Ballesio wrote:
> Hi all and especially Farsight developers,
>
> checking on the Farsight todo list, I see something is being cooked
> about CN generation. In the Farsight sources I can see basic handling
> of CN sending, but not much about receiving it.
>
> As CN generation on the receive side appears to be the trickiest part,
> I wanted to know how it's planned to deal with it. For example, for
> G.729 packets it's possible to receive only a SID frame and then
> nothing more until the next talkspurt: because of DTX it's not
> possible to establish any direct relation between input packets and
> the length of the uncompressed output. RFC 3389 also defines some ways
> to adjust the noise level before the next talkspurt but, again, DTX
> makes it hard to deal with CN using a traditional GStreamer decoder.
>
> If nothing is already available, I was thinking about a generic
> support bin to be controlled from the speech codecs or depayloaders.
> The bin structure may be sketched as an audio source generating
> coloured noise with the pole-only spectral description obtained from
> the silence encoder, connected together with the decoder to an input
> selector. The latter would simply be controlled from the depayloader
> (or decoder) when e.g. a SID or the start of a talkspurt has been
> received.
>
> Are there any other/better ideas (being) implemented?

Nothing has been implemented in Farsight2 because I don't know what the best approach is. My original idea was that SID frames would be received by a special depayloader (if audio/CN) or by the decoder (if it's a codec like G.729 that has built-in CN). These elements would then forward the "silence data" downstream to the mixer, which would generate the correct comfort noise whenever it has no voice packets. That way, CN is only generated when nothing is received (so it won't do strange things if the other party switches codecs mid-call, or in a multi-party call).
That said, this means that the CN is not generated by the decoder but by the mixer. This is easy to implement for codecs that use the generic RFC 3389 CN packets, but it is probably trickier for codecs (like Speex or G.729) that have their own comfort-noise algorithms. So maybe another solution is needed, like having the decoder generate a comfort-noise buffer when it receives a "GstRTPPacketLost" event from the jitterbuffer (which should only be sent to the last active payload type per SSRC). My understanding is that the decoder should only generate CN from the point where it receives a SID frame until another voice frame is received. That said, this solution means that in a multi-party call, one would still get CN.

Anyway, your input is welcome, as you seem to know quite a bit more about the actual algorithms than I do.

--
Olivier Crête
oli...@co...
Collabora Ltd