Thread: [tuxdroid-user] Dedicated sound normalizer / multiplexer daemon
From: Florent T. <ft...@gm...> - 2007-03-24 14:51:51
|
Hi,

If we want "ultimate" modularity for sound (be it the choice of TTS engine, STT engine, or any sound-enabled application), we may want to develop or find a dedicated sound daemon that:
- creates a virtual sound device with an explicit naming convention (/dev/tuxmic and /dev/tuxspk, for example)
- downsamples/transcodes every incoming sound to 8-bit 8 kHz (for instance, try to play an mp3 with xmms: it won't work :p)
- eventually filters the 500 Hz noise coming from the mic
- acts as a frontend for the TTS engines (if you send text to the daemon, it uses TTS; if it's sound, it sends it to Tux)
- handles multiplexing/queuing of sound events (the wav merger doesn't seem like a long-term solution to me...)
- normalizes the sound level to handle the mouth problem (open/closed)
- avoids/postpones Tux animations when the microphone is needed

There's an inherent problem with the mic: it's in the mouth (to me, one of the most debatable technical choices...), but WHY? Speech recognition programs rely heavily on good sound level tuning, and the fact that the sound level varies by 50% when the mouth opens will not help...

Another technical choice that makes me wonder: there's a line in/out in the back of Tux... But if I'm not mistaken, it will be limited to 8 bits / 8 kHz. So these I/Os won't really add any feature, except for headphone use...

Well, sorry if I sound mean, I'm just wondering why the engineering team made these choices... What do you think? |
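The transcoding item in Florent's list (anything in, 8-bit 8 kHz out) can be sketched in a few lines. This is a hypothetical illustration, assuming 16-bit signed mono input at a rate that is an integer multiple of 8000 Hz; it averages each block of input samples as a crude anti-aliasing low-pass before decimating, which is much rougher than a real resampler.

```python
from array import array

TUX_RATE = 8000  # Hz; Tux plays 8-bit, 8 kHz sound


def to_tux_format(samples, in_rate):
    """Convert 16-bit signed mono samples to 8-bit unsigned samples at 8 kHz.

    Hypothetical helper: in_rate must be an integer multiple of 8000.
    Each output sample is the mean of `factor` consecutive input samples
    (a crude box-car low-pass followed by decimation); a trailing
    incomplete block is dropped.
    """
    if in_rate % TUX_RATE != 0:
        raise ValueError("this sketch only handles multiples of 8000 Hz")
    factor = in_rate // TUX_RATE
    out = bytearray()
    for start in range(0, len(samples) - factor + 1, factor):
        mean = sum(samples[start:start + factor]) / factor  # -32768..32767
        value = round(mean / 256) + 128                      # 0..256, 128 = silence
        out.append(min(255, max(0, value)))                  # clamp to unsigned 8-bit
    return bytes(out)


if __name__ == "__main__":
    # 0.5 s of a 24 kHz alternating signal at 48 kHz: far too high for Tux,
    # so the block averaging removes it and only silence (128) remains.
    pcm = array("h", [1000, -1000] * 12000)
    tux = to_tux_format(pcm, 48000)
    print(len(tux), set(tux))
```

Averaging is the simplest low-pass that fits in a sketch; it keeps the example dependency-free but lets some aliasing through for tones just below 4 kHz.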
From: Philippe T. <ph...@te...> - 2007-03-24 15:44:30
|
Hi,

> - downsamples/transcodes every incoming sound to 8bit 8khz sound (for
> instance, try to play an mp3 with xmms, it won't work :p)

Yes it can:
- select the ALSA output plugin -> configure -> type "plughw:1,0" manually and select software volume control;
- or select the OSS output plugin -> configure -> USB, but then no volume control is possible;
- or select the esound output plugin and launch esd with: esd -d plughw:1,0

This last solution lets you mix sources easily, e.g. while I was playing an mp3 with xmms, I launched
mplayer -ao esd some_other.mp3 <http://www.paul.sladen.org/pronunciation/torvalds-says-linux.mp3>

So esound could be the kind of thing you're looking for :-)

Phil |
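To test the plughw and esd routes without an mp3 at hand, a short test tone in Tux's native format (8-bit unsigned, mono, 8 kHz) can be generated with Python's standard `wave` module. The file name and tone parameters below are arbitrary, and card index 1 in `plughw:1,0` is taken from Phil's example; it may differ on another machine.

```python
import math
import wave


def write_tone(target, freq=440.0, seconds=1.0, rate=8000, volume=0.5):
    """Write a mono, 8-bit unsigned PCM sine tone to a WAV file path or file object."""
    n = int(rate * seconds)
    frames = bytes(
        # 8-bit WAV samples are unsigned, with 128 as the zero level.
        round(128 + 127 * volume * math.sin(2 * math.pi * freq * i / rate))
        for i in range(n)
    )
    with wave.open(target, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(1)      # 1 byte per sample = 8-bit
        w.setframerate(rate)
        w.writeframes(frames)


if __name__ == "__main__":
    write_tone("tone.wav")   # then: aplay -D plughw:1,0 tone.wav
```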
From: Florent T. <ft...@gm...> - 2007-03-24 16:06:00
|
Cool. So a modification of esd would do the job |
From: David B. <da...@ja...> - 2007-03-24 16:15:48
|
On Sat, 24 Mar 2007 15:51:32 +0100, Florent THIERY <ft...@gm...> wrote:
> If we want "ultimate" modularity for sound (be it the choice of TTS
> engine, STT, or any sound-enabled sound), we may want to develop/find
> a dedicated sound daemon that:
>
> - creates a virtual sound device with an explicit naming convention (
> /dev/tuxmic & /dev/tuxspk for example)
> - downsamples/transcodes every incoming sound to 8bit 8khz sound (for
> instance, try to play an mp3 with xmms, it won't work :p)

Using the ALSA plughw devices puts ALSA's plug conversion layer in the chain, so resampling is done there. Try to play a file with:
aplay -D plughw:1,0 wavfile

> - eventually filters the 500 Hz noise coming from the mic

That would be a good idea indeed.

> - acts as a frontend for the TTS engines (if you input text to the
> daemon, it uses TTS; if it's sound, it sends it to tux)
> - handles multiplex / queuing of sound events (the wav merger doesn't
> seem to me a long-term solution...)

If you are referring to the Python merge tool used by gtdi.py, it's not meant to play sounds but to store them in the internal flash. It computes indexes and merges all the sounds into one long wavfile, which is played when Tux is in storing mode. We don't want any blank space between the sounds in storing mode.

> - sound normalization to handle the mouth problem (open/close)
> - avoid/postpone tux animation when microphone usage is needed
>
> There's an inherent problem with the mic: it's in the mouth (it's one
> of the most debatable technical choices to me...), but WHY? Speech
> recognition programs rely heavily on a good sound level tuning, and
> the fact that the sound level varies from 50% when opening the mouth
> will not help...

I'm partly responsible for that, and indeed we should have found a better solution. We didn't want a hole in the front of Tux, and the microphone had to be on that side. At that time we weren't thinking about speech recognition but rather about VoIP applications, and having to open the mouth to use the microphone was not considered a major problem. It's different now if you have to leave the mouth open all the time. I've now done some tests but didn't find such a big difference. The level is slightly lower, but the main difference I noticed is the tone: when the mouth is closed, the higher frequencies are attenuated and the result is not as bright as when the mouth is open. We should try it with the speech recognition software; I'm not sure it will be a problem for it.

Now looking at Tux again, I still can't find a good place for the microphone. Maybe close to the beak, on the side. Time to release the first hardware hack ;-)

> Another technical choice that makes me wonder: there's a line in / out
> in the back of tux... But if i'm not mistaken, it will be limited to 8
> bits / 8khz. So these I/O won't really add feature, except for
> earphone operation...

Yes, those were never meant to act as line-in and line-out. Line-out is in fact just a headphone output. And line-in is not even connected to the computer; it's just a link to the amplifier, so you can use Tux as a small speaker box for your mp3 player. There's no frequency limitation on the line-in, it's plain analog, so you should get better quality when using your portable player. In this case the quality is limited by the speaker.

> Well, sorry to look mean, i'm just wondering why the engineering team
> did these choices...

Hope you get a better idea now. |
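David's description of the merge step (concatenate every sound into one long WAV with no gaps, and compute where each one starts) can be sketched as below. This is not the gtdi.py code, which the thread doesn't show; the function name and the index format (start frame, length in frames) are hypothetical, and all inputs are assumed to share one format.

```python
import wave


def merge_wavs(sources, target):
    """Concatenate WAV files back to back and return a (start, length) index.

    Hypothetical sketch: every source must have the same channels, sample
    width and rate. Offsets are counted in frames from the start of the
    merged data, with no silence inserted between sounds.
    """
    index, chunks = [], []
    params = None
    position = 0
    for src in sources:
        with wave.open(src, "rb") as r:
            fmt = (r.getnchannels(), r.getsampwidth(), r.getframerate())
            if params is None:
                params = fmt
            elif fmt != params:
                raise ValueError("all sounds must share the same format")
            n = r.getnframes()
            chunks.append(r.readframes(n))
        index.append((position, n))   # where this sound starts, and how long it is
        position += n
    if params is None:
        raise ValueError("no sources given")
    with wave.open(target, "wb") as w:
        w.setnchannels(params[0])
        w.setsampwidth(params[1])
        w.setframerate(params[2])
        w.writeframes(b"".join(chunks))
    return index
```

Storing offsets in frames rather than bytes keeps the index independent of sample width.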
From: Florent T. <ft...@gm...> - 2007-03-24 16:48:52
|
> > - handles multiplex / queuing of sound events (the wav merger doesn't
> > seem to me a long-term solution...)
>
> If you make reference to the python merge thing used by gtdi.py, it's not
> meant to play sounds but to store them in the internal flash.

OK. My apologies.

> > - sound normalization to handle the mouth problem (open/close)

In fact, the speech recognition / sound daemon could do this: when "tux" is said (indicating the beginning of a spoken "order"), the mouth opens.

> We should try it with the speech recognition software, I'm not sure that
> will be a problem for it.

Hope so :)

> Now looking at tux again, I still can't find a good place for the
> microphone. Maybe close to the beak on the side. Time to release the first
> hardware hack ;-)

Well, my first idea would be to put the mic under/inside one wing. I haven't opened Tux yet; is there a wire between the mic and the MB?

> Yes, that was not meant to act as line-in and line-out at all. Line-out is
> in fact just a headphone output. And line-in is even not connected to the
> computer, it's just a link to the amplifier so you can use tux as a small
> speaker box for your mp3 player. There's no frequency limitation on the
> line-in, it's just plain analog so you should get a better quality when
> using your portable player. The quality is limited by the speaker in this
> case.

Cool :) It's indeed a great idea I hadn't thought about: Tux as an "iPod dock", a battery-powered speaker. And it provides an extra line out for such a device: listen with Tux, record on the line out. As it's a closed circuit, will the sound quality be the same?

> Hope you get a better idea now.

Yup :) Dunno if you've read it, but the article http://www.linuxjournal.com/article/4723 shows a method where /dev/speech stands for the TTS daemon (just do echo "test" > /dev/speech). I find it a quick and simple solution. Again, Cvoicecontrol is what we want: not really speech recognition, but voice-driven command launching.
And it's perfect. Too bad it's unmaintained... |
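The /dev/speech idea, combined with the "text goes to TTS, sound goes to Tux" frontend from the first message, could look roughly like this. It assumes the device is a named pipe (created here with `os.mkfifo`); `speak` and `play` are hypothetical callbacks standing in for the TTS engine and the Tux sound path, and telling WAV data from text by its `RIFF`/`WAVE` header is an assumption of this sketch.

```python
import os


def dispatch(blob, speak, play):
    """Route one input: WAV data goes to `play`, anything else is spoken."""
    if blob[:4] == b"RIFF" and blob[8:12] == b"WAVE":
        play(blob)
        return "sound"
    text = blob.decode("utf-8", errors="replace").strip()
    if text:
        speak(text)
        return "text"
    return "empty"


def serve(fifo_path, speak, play):
    """Read from a named pipe forever (e.g. echo "test" > fifo_path)."""
    if not os.path.exists(fifo_path):
        os.mkfifo(fifo_path)
    while True:
        # open() blocks until a writer connects; read() returns at the writer's EOF.
        with open(fifo_path, "rb") as fifo:
            dispatch(fifo.read(), speak, play)
```

Keeping `dispatch` separate from the blocking pipe loop means the routing rule can be tested without a FIFO or a TTS engine.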
From: David B. <da...@ja...> - 2007-03-24 17:27:48
|
On Sat, 24 Mar 2007 17:48:50 +0100, Florent THIERY <ft...@gm...> wrote:
>> Now looking at tux again, I still can't find a good place for the
>> microphone. Maybe close to the beak on the side. Time to release the
>> first
>> hardware hack ;-)
>
> Well, my first idea would be to put the mic under/inside one wing. I
> haven't opened tux yet, is there a wire between the mic and the MB?

Yes. With a few screwdrivers, it's pretty straightforward to open Tux and remove the head gearbox. Just pull out the microphone and glue it wherever you want. You have to monitor the signal, though, as the 500 Hz noise can vary with the wire position. Just move the wire until the noise goes away and glue it in that position. Actually, the 500 Hz comes from the 2.4 GHz signal, which is pulsed at 500 Hz (a frame is sent every 2 ms, thus 500 Hz).

> Cool :) It's, indeed, a great idea i hadn't thought about: tux as a
> "ipod dock" :), battery-powered speaker. And, it creates a
> supplementary line out for such a device: listen with tux, record on
> the line out. As it's a closed circuit, the sound quality will be
> equivalent?

Yes, but sent through the amplifier. However, the speaker is disconnected when you plug in your headphones or a line out, so it won't work.

Cheers,
david |
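David's arithmetic is f = 1 / T = 1 / 0.002 s = 500 Hz. Because the interference is pulsed rather than a pure sine, it can also carry energy at multiples of 500 Hz, which matters for any filtering attempt. The sketch below checks this on a synthetic, hypothetical pulse train (high for 0.25 ms of every 2 ms) sampled at 8 kHz and analysed with a plain DFT over 20 ms, i.e. 50 Hz per bin; the real interference shape is unknown.

```python
import cmath

RATE = 8000         # Hz, Tux's sample rate
PERIOD = 0.002      # s, one radio frame every 2 ms
N = 160             # 20 ms window -> 8000 / 160 = 50 Hz per DFT bin

period_samples = round(RATE * PERIOD)   # 16 samples per radio frame
# Hypothetical interference: high during the first 2 samples (0.25 ms) of each frame.
signal = [1.0 if i % period_samples < 2 else 0.0 for i in range(N)]


def dft_magnitude(x, k):
    """Magnitude of DFT bin k of the sequence x (plain O(n) sum)."""
    n = len(x)
    return abs(sum(v * cmath.exp(-2j * cmath.pi * k * i / n) for i, v in enumerate(x)))


if __name__ == "__main__":
    print("fundamental:", 1 / PERIOD, "Hz")
    for k in range(0, N // 2 + 1, 5):       # every 250 Hz up to 4 kHz
        print(f"{k * RATE / N:6.0f} Hz  {dft_magnitude(signal, k):6.2f}")
```

Only bins at multiples of 500 Hz show energy; the 250 Hz, 750 Hz, etc. bins come out at (numerically) zero.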
From: Florent T. <ft...@gm...> - 2007-03-24 20:08:01
|
> Actually the 500Hz comes from the 2.4GHz signal which is pulsed at 500Hz (a frame is sent each 2ms, thus 500Hz).

Wouldn't a passive "passe-haut" filter remove it? (no idea what it's called in English :p) http://fr.wikipedia.org/wiki/Filtre_passe-haut |
From: Philippe T. <ph...@te...> - 2007-03-24 21:56:36
|
Florent THIERY wrote:
>> Actually the 500Hz comes from the 2.4GHz signal which is pulsed at 500Hz (a frame is sent each 2ms, thus 500Hz).
>>
>
> Wouldn't a passive "passe-haut" filter remove it? (no idea what it's
> called in English :p)
>
> http://fr.wikipedia.org/wiki/Filtre_passe-haut
>

Follow the link to the English page ;-) and you'll discover it's called a high-pass filter: http://en.wikipedia.org/wiki/High-pass_filter

But if you cut everything below, let's say, 550 Hz, you'll lose a big part of the signal. Consider that the musical A ("la" in French) used to tune instruments is only 440 Hz (as is the phone tone). 500 Hz is right in the middle of the useful audio band, that's the problem :-(

What you need is a notch filter: http://en.wikipedia.org/wiki/Band-stop_filter but getting a very narrow one is not that easy. It depends on how good the AVR is at audio processing... Otherwise analog ones are possible, starting from the very basic RLC filter: http://www.everything2.com/index.pl?node_id=1391618&lastnode_id=0

Phil |
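Phil's argument can be put in numbers. The sketch below compares (a) an ideal first-order RC high-pass with a 550 Hz cutoff, whose gain is (f/fc)/sqrt(1 + (f/fc)^2), and (b) a digital second-order notch (biquad) centred on 500 Hz, using the notch coefficients from Robert Bristow-Johnson's Audio EQ Cookbook. The 8 kHz rate matches Tux; Q = 10 (about 50 Hz wide) is an arbitrary illustrative choice, and the code targets a PC rather than the AVR. Because the filter keeps its state between calls, audio can be fed to `process()` in chunks, as a streaming sound daemon would.

```python
import math

RATE = 8000.0   # Hz, Tux's sample rate


def highpass1_gain(f, fc=550.0):
    """Gain of an ideal first-order (RC) high-pass at frequency f."""
    r = f / fc
    return r / math.sqrt(1 + r * r)


class Notch:
    """Second-order IIR notch (RBJ Audio EQ Cookbook), run sample by sample."""

    def __init__(self, f0=500.0, q=10.0, rate=RATE):
        w0 = 2 * math.pi * f0 / rate
        alpha = math.sin(w0) / (2 * q)
        a0 = 1 + alpha
        # All coefficients normalised by a0.
        self.b = (1 / a0, -2 * math.cos(w0) / a0, 1 / a0)
        self.a = (-2 * math.cos(w0) / a0, (1 - alpha) / a0)
        self.x1 = self.x2 = self.y1 = self.y2 = 0.0   # state kept across calls

    def process(self, samples):
        b0, b1, b2 = self.b
        a1, a2 = self.a
        out = []
        for x in samples:
            y = b0 * x + b1 * self.x1 + b2 * self.x2 - a1 * self.y1 - a2 * self.y2
            self.x2, self.x1 = self.x1, x
            self.y2, self.y1 = self.y1, y
            out.append(y)
        return out


def tone(freq, n, rate=RATE):
    return [math.sin(2 * math.pi * freq * i / rate) for i in range(n)]


def peak_after(samples, skip):
    """Peak absolute value once the filter's start-up transient has died out."""
    return max(abs(v) for v in samples[skip:])


if __name__ == "__main__":
    for f in (440, 500, 1000):
        out = Notch().process(tone(f, 8000))
        print(f"{f} Hz: high-pass gain {highpass1_gain(f):.2f}, "
              f"notch output peak {peak_after(out, 4000):.3f}")
```

The first-order high-pass leaves roughly two thirds of the 500 Hz hum while already cutting 440 Hz to about 0.62, whereas the notch removes 500 Hz and passes 440 Hz and 1000 Hz almost untouched.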
From: Florent T. <ft...@gm...> - 2007-03-24 22:10:12
|
> Otherwise analog ones are possible, starting from the very basic RLC filter:
> http://www.everything2.com/index.pl?node_id=1391618&lastnode_id=0

Yup, I was thinking of a basic RLC filter... But a software one will be easier to do (more unobtrusive, at least). Do you think we can use Audacity's Nyquist libs? http://www.audacity-forum.de/download/edgar/nyquist/nyquist-doc/manual/part12.html |
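For the analog route, the standard series RLC relations are below: the resonance (notch centre) depends only on L and C, while the sharpness Q also depends on the total series resistance R, coil resistance included. The 100 mH inductor is an arbitrary example value.

```latex
f_0 = \frac{1}{2\pi\sqrt{LC}}, \qquad
Q = \frac{1}{R}\sqrt{\frac{L}{C}}, \qquad
C = \frac{1}{(2\pi f_0)^2 L} = \frac{1}{(2\pi \cdot 500\,\mathrm{Hz})^2 \cdot 0.1\,\mathrm{H}} \approx 1.01\,\mu\mathrm{F}
```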
From: David B. <da...@ja...> - 2007-03-25 19:45:12
|
On Sat, 24 Mar 2007 23:10:08 +0100, Florent THIERY <ft...@gm...> wrote:
>> Otherwise analog ones are possible, starting from the very basic RLC
>> filter:
>> http://www.everything2.com/index.pl?node_id=1391618&lastnode_id=0
>
> Yup, I was thinking of a basic RLC filter... But a software one will
> be easier to do (more unobtrusive at least)
>
> Do you think we can use audacity's nyquist libs?
> http://www.audacity-forum.de/download/edgar/nyquist/nyquist-doc/manual/part12.html

As the microphone signal always ends up on the computer, software filtering is the way to go. A passive RLC filter won't be steep enough to cut only a narrow band around 500 Hz, and the AVR won't be able to help, as its processing power is far too limited for digital signal filtering.

The Audacity noise removal function works great: you first create a profile from a sample of noise only, then you use that profile to remove the noise from your signal. I don't know if that function is in the Nyquist library, but that's certainly the one that will give the best results. |
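The noise-profile approach David describes can be illustrated with basic magnitude spectral subtraction: average the magnitude spectrum of a noise-only recording, then, frame by frame, subtract it from the signal's magnitude spectrum while keeping the signal's phase. This is a textbook technique, not Audacity's actual noise removal algorithm. The 64-sample frame, the plain O(n²) DFT and the non-overlapping rectangular frames are simplifications to keep the sketch short and dependency-free; a practical version would use an FFT with overlapping windows.

```python
import cmath
import math

FRAME = 64   # samples per frame (8 ms at 8 kHz)


def dft(x):
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
            for k in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * i / n)
                 for k in range(n)) / n).real
            for i in range(n)]


def frames(x, size=FRAME):
    """Split x into complete, non-overlapping frames (a partial tail is dropped)."""
    return [x[i:i + size] for i in range(0, len(x) - size + 1, size)]


def noise_profile(noise):
    """Average magnitude spectrum of a noise-only recording."""
    mags = [[abs(c) for c in dft(f)] for f in frames(noise)]
    return [sum(col) / len(mags) for col in zip(*mags)]


def denoise(signal, profile):
    """Subtract the noise profile from each frame's magnitude spectrum."""
    out = []
    for f in frames(signal):
        cleaned = []
        for c, noise_mag in zip(dft(f), profile):
            mag = max(abs(c) - noise_mag, 0.0)               # subtract, floor at zero
            cleaned.append(cmath.rect(mag, cmath.phase(c)))  # keep the original phase
        out.extend(idft(cleaned))
    return out
```

With a noise-only sample of a 500 Hz hum as the profile, the hum cancels while a 1 kHz tone in the same recording comes through unchanged.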
From: Florent T. <ft...@gm...> - 2007-03-25 20:49:54
|
> The audacity noise removal function works great, you first create a
> profile from a sample of noise only. Then you use that profile to remove
> the noise from your signal.

Yes, the question is: can we do it in real time (possibly in the modified esd daemon) using only the API? Somebody should contact the Audacity developers and ask about this...

Using an esd daemon is even better when you consider the NAS option: just open the port and connect your desktop apps to it. Multiplexing and resampling are already handled by the daemon (if I got it right), so we "just" have to add the filtering feature.

In short, we have:
- the tux daemon
- the sound daemon
- the TTS daemon
- the speech-driven command daemon (if we manage to find a suitable one)
...
Lots of processes :) |
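The multiplexing Florent expects from esd comes down, conceptually, to re-centring each 8-bit unsigned sample on zero, summing the streams, clipping, and re-offsetting. The hypothetical sketch below is not esd's code; a real mixer would also deal with streams arriving in real time at different lengths, and with per-stream volume.

```python
def mix_u8(*streams):
    """Mix 8-bit unsigned PCM buffers (truncated to the shortest) with hard clipping."""
    if not streams:
        return b""
    length = min(len(s) for s in streams)
    out = bytearray(length)
    for i in range(length):
        total = sum(s[i] - 128 for s in streams)   # 128 is silence in unsigned 8-bit PCM
        out[i] = max(0, min(255, total + 128))     # clip back into 0..255
    return bytes(out)
```

Hard clipping is the simplest choice; scaling each stream down before summing avoids distortion at the cost of overall loudness.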