Thread: [tuxdroid-user] Dedicated sound normalizer / multiplexer daemon

[tuxdroid-user] Dedicated sound normalizer / multiplexer daemon

From: Florent T. <ft...@gm...> - 2007-03-24 14:51:51

Hi

If we want "ultimate" modularity for sound (be it the choice of TTS
engine, STT, or any sound-enabled application), we may want to develop/find
a dedicated sound daemon that:

- creates a virtual sound device with an explicit naming convention (
/dev/tuxmic & /dev/tuxspk for example)
- downsamples/transcodes every incoming sound to 8-bit 8 kHz sound (for
instance, try to play an mp3 with xmms, it won't work :p)
- possibly filters the 500 Hz noise coming from the mic
- acts as a frontend for the TTS engines (if you input text to the
daemon, it uses TTS; if it's sound, it sends it to tux)
- handles multiplex / queuing of sound events (the wav merger doesn't
seem to me a long-term solution...)
- normalizes sound to handle the mouth problem (open/close)
- avoids/postpones tux animation when microphone usage is needed
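
As an illustration of the downsampling item in the list above, here is a
minimal Python sketch that converts signed 16-bit mono samples to 8-bit,
8 kHz audio. The function name to_tux_format and the unsigned 8-bit sample
layout are assumptions made for the example, not something stated in the
thread. The decimation is deliberately naive (no anti-aliasing filter), so a
real daemon would more likely delegate this to ALSA or a proper resampler.

```python
def to_tux_format(samples, src_rate, dst_rate=8000):
    """Convert signed 16-bit mono samples to unsigned 8-bit at dst_rate.

    Naive nearest-earlier-sample decimation, no anti-aliasing filter:
    this only illustrates the idea.
    """
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    n_out = len(samples) * dst_rate // src_rate
    out = bytearray()
    for i in range(n_out):
        s = samples[i * src_rate // dst_rate]  # pick the matching source sample
        out.append((s + 32768) >> 8)           # 16-bit signed -> 8-bit unsigned
    return bytes(out)
```

For example, one second of 44.1 kHz audio (44100 samples) comes out as 8000
bytes.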

There's an inherent problem with the mic: it's in the mouth (to me, it's one
of the most questionable technical choices...), but WHY? Speech
recognition programs rely heavily on well-tuned sound levels, and
the fact that the sound level varies by 50% when opening the mouth
will not help...
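
To make the normalization idea concrete, here is a small Python sketch of
RMS-based gain correction: it scales a block of samples so that its level
matches a fixed target, whether the mouth is open or closed. The names
(rms, normalize), the float sample range of -1.0 to 1.0, and the default
target and gain limit are all assumptions chosen for illustration. It only
corrects level, not any change in tone.

```python
import math

def rms(samples):
    """Root-mean-square level of a block of float samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def normalize(samples, target_rms=0.1, max_gain=8.0):
    """Scale samples toward target_rms, capping the gain and clipping to [-1, 1]."""
    level = rms(samples)
    if level == 0.0:
        return list(samples)                  # pure silence: nothing to scale
    gain = min(target_rms / level, max_gain)  # cap so noise is not blown up
    return [max(-1.0, min(1.0, s * gain)) for s in samples]
```

A block recorded at half the level gets twice the gain, so both blocks reach
the recognizer at the same RMS level.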

Another technical choice that makes me wonder: there's a line in / out
in the back of tux... But if I'm not mistaken, it will be limited to 8
bits / 8 kHz. So these I/O won't really add any features, except for
earphone operation...

Well, sorry to look mean, I'm just wondering why the engineering team
made these choices...

What do you think?

Re: [tuxdroid-user] Dedicated sound normalizer / multiplexer daemon

From: Philippe T. <ph...@te...> - 2007-03-24 15:44:30

Hi,
> - downsamples/transcodes every incoming sound to 8bit 8khz sound (for
> instance, try to play an mp3 with xmms, it won't work :p)
>
Yes it can:
Select alsa output plugin -> configure -> type "plughw:1,0" manually
and select software volume control.

Or select OSS output plugin -> configure -> USB
but then no volume control is possible.

Or select esound output plugin
and launch esd with:
esd -d plughw:1,0

This last solution allows you to mix sources easily
e.g. when I was playing an mp3 with xmms I launched

mplayer -ao esd some_other.mp3 <http://www.paul.sladen.org/pronunciation/torvalds-says-linux.mp3>

So esound could be the kind of stuff you're looking for :-)
Phil
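
What a mixing daemon does with two sources can be sketched in a few lines of
Python. This is not how esd is implemented; it is only an illustration, under
the assumption that both streams are already unsigned 8-bit mono PCM at the
same rate. The function name mix is made up for the example.

```python
def mix(a, b):
    """Mix two unsigned 8-bit mono PCM buffers, padding the shorter with silence."""
    n = max(len(a), len(b))
    out = bytearray(n)
    for i in range(n):
        # Convert to signed values centred on 0 (128 is silence in unsigned 8-bit).
        sa = (a[i] if i < len(a) else 128) - 128
        sb = (b[i] if i < len(b) else 128) - 128
        # Sum, shift back to unsigned, and clip instead of wrapping around.
        out[i] = max(0, min(255, sa + sb + 128))
    return bytes(out)
```

Clipping is the crudest way to handle overflow; a real mixer would usually
scale the sources down before summing.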

Re: [tuxdroid-user] Dedicated sound normalizer / multiplexer daemon

From: Florent T. <ft...@gm...> - 2007-03-24 16:06:00

Cool. So a modification of esd would do the job.

Re: [tuxdroid-user] Dedicated sound normalizer / multiplexer daemon

From: David B. <da...@ja...> - 2007-03-24 16:15:48

On Sat, 24 Mar 2007 15:51:32 +0100, Florent THIERY <ft...@gm...>  
wrote:

> If we want "ultimate" modularity for sound (be it the choice of TTS
> engine, STT, or any sound-enabled application), we may want to develop/find
> a dedicated sound daemon that:
>
> - creates a virtual sound device with an explicit naming convention (
> /dev/tuxmic & /dev/tuxspk for example)
> - downsamples/transcodes every incoming sound to 8bit 8khz sound (for
> instance, try to play an mp3 with xmms, it won't work :p)

Using the alsa plughw devices puts the plug plugin in the chain, so
resampling is done there. Try to play a file with
aplay -D plughw:1,0 wavfile

> - possibly filters the 500 Hz noise coming from the mic

That would be a good idea indeed

> - acts as a frontend for the TTS engines (if you input text to the
> daemon, it uses TTS; if it's sound, it sends it to tux)
> - handles multiplex / queuing of sound events (the wav merger doesn't
> seem to me a long-term solution...)

If you are referring to the python merge thing used by gtdi.py, it's not
meant to play sounds but to store them in the internal flash. It computes
indexes and merges all the sounds into one long wav file, which is played when
tux is in storing mode. We don't want any blank space between the sounds
in storing mode.
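
The merge step described above can be pictured with the following Python
sketch: clips are concatenated back to back with no gap, and the start offset
of each clip is recorded as its index. This is not the gtdi.py code, whose
actual index format is not shown in this thread; merge_sounds and the
byte-offset index are assumptions made for illustration.

```python
def merge_sounds(clips):
    """Concatenate raw PCM clips with no gap; return (merged, start_offsets)."""
    merged = bytearray()
    offsets = []
    for clip in clips:
        offsets.append(len(merged))  # where this clip starts in the merged data
        merged.extend(clip)          # appended directly, so no blank space
    return bytes(merged), offsets
```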

> - sound normalization to handle the mouth problem (open/close)
> - avoid/postpone tux animation when microphone usage is needed
>
> There's an inherent problem with the mic: it's in the mouth (to me, it's one
> of the most questionable technical choices...), but WHY? Speech
> recognition programs rely heavily on well-tuned sound levels, and
> the fact that the sound level varies by 50% when opening the mouth
> will not help...

I'm partly responsible for that, and indeed we should have found a better
solution. We didn't want to have a hole in the front side of Tux and the
microphone had to be on that side. At that time we weren't thinking about
speech recognition, more about VOIP applications, and having to
open the mouth to use the microphone was not considered a major problem.
Now it's different if you have to leave the mouth open all the time. I
have now done some tests but didn't find such a big difference. The level is
slightly lower, but the main difference I noticed is in the tone: when
the mouth is closed, the higher frequencies are attenuated and the result
is not as bright as it is when the mouth is open. We should try it with
the speech recognition software; I'm not sure that will be a problem for
it.

Now looking at tux again, I still can't find a good place for the  
microphone. Maybe close to the beak on the side. Time to release the first  
hardware hack ;-)

> Another technical choice that makes me wonder: there's a line in / out
> in the back of tux... But if I'm not mistaken, it will be limited to 8
> bits / 8 kHz. So these I/O won't really add any features, except for
> earphone operation...

Yes, that was not meant to act as line-in and line-out at all. Line-out is
in fact just a headphone output. And line-in is not even connected to the
computer, it's just a link to the amplifier so you can use tux as a small
speaker box for your mp3 player. There's no frequency limitation on the
line-in, it's just plain analog so you should get a better quality when
using your portable player. The quality is limited by the speaker in this
case.

> Well, sorry to look mean, I'm just wondering why the engineering team
> made these choices...

Hope you get a better idea now.

Re: [tuxdroid-user] Dedicated sound normalizer / multiplexer daemon

From: Florent T. <ft...@gm...> - 2007-03-24 16:48:52

> > - handles multiplex / queuing of sound events (the wav merger doesn't
> > seem to me a long-term solution...)
>
> If you make reference to the python merge thing used by gtdi.py, it's not
> meant to play sounds but to store them in the internal flash.

Ok. My apologies.

> > - sound normalization to handle the mouth problem (open/close)

In fact, the speech recognition / sound daemon could do this: if "tux"
is said (indicating the beginning of a spoken "order"), the mouth
opens.

> We should try it with the speech recognition software, I'm not sure that will be a problem for it.

Hope so :)

> Now looking at tux again, I still can't find a good place for the
> microphone. Maybe close to the beak on the side. Time to release the first
> hardware hack ;-)

Well, my first idea would be to put the mic under/inside one wing. I
haven't opened tux yet, is there a wire between the mic and the MB?

> Yes, that was not meant to act as line-in and line-out at all. Line-out is
> in fact just a headphone output. And line-in is not even connected to the
> computer, it's just a link to the amplifier so you can use tux as a small
> speaker box for your mp3 player. There's no frequency limitation on the
> line-in, it's just plain analog so you should get a better quality when
> using your portable player. The quality is limited by the speaker in this
> case.

Cool :) It's, indeed, a great idea I hadn't thought about: tux as an
"ipod dock" :), a battery-powered speaker. And it creates a
supplementary line out for such a device: listen with tux, record on
the line out. As it's a closed circuit, will the sound quality be
equivalent?

> Hope you get a better idea now.

Yup :)

Dunno if you read it, but the article
http://www.linuxjournal.com/article/4723 shows a method where
/dev/speech stands in for the TTS daemon (just do echo "test" >
/dev/speech). I find it a quick and simple solution. Again,
Cvoicecontrol is what we want: not really speech recognition, but
voice-driven command launching. And it's perfect. Too bad it's
unmaintained...
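
Combining the /dev/speech idea with the "text goes to TTS, sound goes to tux"
front end proposed at the start of the thread, the daemon needs some way to
tell the two kinds of input apart. Here is one possible heuristic as a Python
sketch. The function name classify_input and the rule itself are assumptions,
not part of any existing tux daemon, and raw PCM that happens to be valid
UTF-8 would be misclassified as text.

```python
def classify_input(data: bytes) -> str:
    """Return 'audio' for WAV data or undecodable bytes, 'text' otherwise."""
    # A WAV file starts with "RIFF", a 4-byte size, then "WAVE".
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "audio"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "audio"  # not valid text: assume raw PCM
    return "text"       # hand this to the TTS engine
```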

Re: [tuxdroid-user] Dedicated sound normalizer / multiplexer daemon

From: David B. <da...@ja...> - 2007-03-24 17:27:48

On Sat, 24 Mar 2007 17:48:50 +0100, Florent THIERY <ft...@gm...>  
wrote:

>> Now looking at tux again, I still can't find a good place for the
>> microphone. Maybe close to the beak on the side. Time to release the  
>> first
>> hardware hack ;-)
>
> Well, my first idea would be to put the mic under/inside one wing. I
> haven't opened tux yet, is there a wire between the mic and the MB?
>

Yes, with some screwdrivers, it's pretty straightforward to open tux and
remove the head gearbox. Just pull out the microphone and glue it wherever
you want. You have to monitor the signal though, as the 500Hz noise can
vary with the wire position. Just move the wire until the noise goes away
and glue it in that position. Actually the 500Hz comes from the 2.4GHz
signal which is pulsed at 500Hz (a frame is sent every 2ms, thus 500Hz).

> Cool :) It's, indeed, a great idea I hadn't thought about: tux as an
> "ipod dock" :), a battery-powered speaker. And it creates a
> supplementary line out for such a device: listen with tux, record on
> the line out. As it's a closed circuit, will the sound quality be
> equivalent?

Yes, but sent through the amplifier. But the speaker is disconnected when
you plug in your headphones or a line out, so it won't work.

Cheers,
david

Re: [tuxdroid-user] Dedicated sound normalizer / multiplexer daemon

From: Florent T. <ft...@gm...> - 2007-03-24 20:08:01

> Actually the 500Hz comes from the 2.4GHz signal which is pulsed at 500Hz (a frame is sent each 2ms, thus 500Hz).

Wouldn't a passive "passe-haut" filter remove it? (no idea what it's
called in English :p)

http://fr.wikipedia.org/wiki/Filtre_passe-haut

Re: [tuxdroid-user] Dedicated sound normalizer / multiplexer daemon

From: Philippe T. <ph...@te...> - 2007-03-24 21:56:36

Florent THIERY wrote:
>> Actually the 500Hz comes from the 2.4GHz signal which is pulsed at 500Hz (a frame is sent each 2ms, thus 500Hz).
>
> Wouldn't a passive "passe-haut" filter remove it? (no idea what it's
> called in English :p)
>
> http://fr.wikipedia.org/wiki/Filtre_passe-haut
Follow the link to the English page ;-)
And you'll discover it's called a high-pass filter:
http://en.wikipedia.org/wiki/High-pass_filter

But if you cut everything below, let's say, 550Hz you'll lose a big part
of the signal. Consider that the musical A ("la" in French) used to tune
instruments is only 440Hz (as is the phone tone).

500Hz is right in the middle of the useful audio band, that's the problem :-(

What you need is a notch filter:
http://en.wikipedia.org/wiki/Band-stop_filter
but getting a very narrow one is not that easy.
It depends on how good the AVR is at audio processing...
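
For reference, a narrow digital notch is only a few lines when it runs on the
computer rather than on the AVR. The Python sketch below uses the standard
second-order (biquad) notch formulas from the widely used Audio EQ Cookbook.
The 8 kHz sample rate comes from this thread; the Q of 10 (roughly a 50 Hz
wide notch) and the function names are assumptions for the example. Because
the interference is pulsed, it likely also has harmonics at 1000 Hz, 1500 Hz
and so on, which this single notch does not touch.

```python
import math

def notch_coefficients(f0, fs, q):
    """Biquad notch coefficients, normalized so that a0 == 1."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0)
    a = (-2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)
    return b, a

def notch_filter(samples, f0=500.0, fs=8000.0, q=10.0):
    """Filter float samples, removing a narrow band around f0."""
    (b0, b1, b2), (a1, a2) = notch_coefficients(f0, fs, q)
    x1 = x2 = y1 = y2 = 0.0  # previous inputs and outputs
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out
```

A higher Q makes the notch narrower but also makes the filter ring longer
after transients, so the value would need tuning by ear.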

Otherwise analog ones are possible, starting from the very basic RLC filter:
http://www.everything2.com/index.pl?node_id=1391618&lastnode_id=0
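
For the analog route, the frequency an LC pair resonates at is
f0 = 1 / (2 * pi * sqrt(L * C)). The short Python sketch below computes it
and solves for the capacitor that centres the filter on 500 Hz. The 0.1 H
inductor is a hypothetical value picked only to show the arithmetic, not a
recommended part.

```python
import math

def resonant_frequency(inductance_h, capacitance_f):
    """Resonant frequency in Hz of an LC pair (henries, farads)."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))

def capacitance_for(f0_hz, inductance_h):
    """Capacitance in farads that resonates with inductance_h at f0_hz."""
    return 1.0 / ((2.0 * math.pi * f0_hz) ** 2 * inductance_h)

# Hypothetical 0.1 H inductor: needs roughly 1 microfarad for 500 Hz.
c = capacitance_for(500.0, 0.1)
```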

Phil

Re: [tuxdroid-user] Dedicated sound normalizer / multiplexer daemon

From: Florent T. <ft...@gm...> - 2007-03-24 22:10:12

> Otherwise analog ones are possible, starting from the very basic RLC filter:
> http://www.everything2.com/index.pl?node_id=1391618&lastnode_id=0

Yup, I was thinking of a basic RLC filter... But a software one will
be easier to do (more unobtrusive at least)

Do you think we can use audacity's nyquist libs?
http://www.audacity-forum.de/download/edgar/nyquist/nyquist-doc/manual/part12.html

Re: [tuxdroid-user] Dedicated sound normalizer / multiplexer daemon

From: David B. <da...@ja...> - 2007-03-25 19:45:12

On Sat, 24 Mar 2007 23:10:08 +0100, Florent THIERY <ft...@gm...>
wrote:

>> Otherwise analog ones are possible, starting from the very basic RLC
>> filter:
>> http://www.everything2.com/index.pl?node_id=1391618&lastnode_id=0
>
> Yup, I was thinking of a basic RLC filter... But a software one will
> be easier to do (more unobtrusive at least)
>
> Do you think we can use audacity's nyquist libs?
> http://www.audacity-forum.de/download/edgar/nyquist/nyquist-doc/manual/part12.html

As the microphone signal always ends up on the computer, software filtering
is the way to go. A passive RLC filter won't be steep enough to cut only
a narrow band around 500Hz, and the AVR won't be able to help as its
processing power is way too little for digital signal filtering.

The audacity noise removal function works great: you first create a
profile from a sample of noise only. Then you use that profile to remove
the noise in your signal. I don't know if that function is in the nyquist
library but that's certainly the one that will give better results.

Re: [tuxdroid-user] Dedicated sound normalizer / multiplexer daemon

From: Florent T. <ft...@gm...> - 2007-03-25 20:49:54

> The audacity noise removal function works great: you first create a
> profile from a sample of noise only. Then you use that profile to remove
> the noise in your signal.

Yes, the question is: can we do it in real time (possibly using the
modified esd daemon) using only the API? Somebody should contact the
audacity guys and ask about this...

Using an esd daemon is even better when you consider the NAS option:
just open the port, and connect your desktop apps to it.

Multiplexing and resampling are already handled by the daemon (if I got
it right), so we "just" have to add the filtering feature.

In short we have:
- the tux daemon
- the sound daemon
- the tts daemon
- the speech-driven command daemon (if we manage to find a suitable one)

... Lots of processes :)