Re: [tuxdroid-user] Dedicated sound normalizer / multiplexer daemon
From: David B. <da...@ja...> - 2007-03-24 16:15:48
On Sat, 24 Mar 2007 15:51:32 +0100, Florent THIERY <ft...@gm...> wrote:

> If we want "ultimate" modularity for sound (be it the choice of TTS
> engine, STT, or any sound-enabled sound), we may want to develop/find
> a dedicated sound daemon that:
>
> - creates a virtual sound device with an explicit naming convention (
> /dev/tuxmic & /dev/tuxspk for example)
> - downsamples/transcodes every incoming sound to 8bit 8khz sound (for
> instance, try to play an mp3 with xmms, it won't work :p)

Using the ALSA plughw devices puts dmix in the chain, so resampling is done there. Try to play a file with:

    aplay -D plughw:1,0 wavfile

> - eventually filters the 500 Hz noise coming from the mic

That would be a good idea indeed.

> - acts as a frontend for the TTS engines (if you input text to the
> daemon, it uses TTS; if it's sound, it sends it to tux)
> - handles multiplex / queuing of sound events (the wav merger doesn't
> seem to me a long-term solution...)

If you are referring to the Python merge thing used by gtdi.py, it's not meant to play sounds but to store them in the internal flash. It computes indexes and merges all the sounds into one long wav file, which is played when tux is in storing mode. We don't want any blank space between the sounds in storing mode.

> - sound normalization to handle the mouth problem (open/close)
> - avoid/postpone tux animation when microphone usage is needed
>
> There's an inherent problem with the mic: it's in the mouth (it's one
> of the most debatable technical choices to me...), but WHY? Speech
> recognition programs rely heavily on a good sound level tuning, and
> the fact that the sound level varies from 50% when opening the mouth
> will not help...

I'm partly responsible for that, and indeed we should have found a better solution. We didn't want to have a hole in the front side of Tux and the microphone had to be on that side.
At that time we didn't think about speech recognition, more about VOIP applications, and having to open the mouth to use the microphone was not considered a major problem. Now it's different if you have to leave the mouth open all the time.

I have now done some tests but didn't find such a big difference. The level is slightly lower, but the main difference I noticed is in the tone: when the mouth is closed, the higher frequencies are attenuated and the result is not as bright as it is when the mouth is open. We should try it with the speech recognition software; I'm not sure that will be a problem for it.

Now looking at tux again, I still can't find a good place for the microphone. Maybe close to the beak, on the side. Time to release the first hardware hack ;-)

> Another technical choice that makes me wonder: there's a line in / out
> in the back of tux... But if i'm not mistaken, it will be limited to 8
> bits / 8khz. So these I/O won't really add feature, except for
> earphone operation...

Yes, that was not meant to act as line-in and line-out at all. Line-out is in fact just a headphone output. And line-in is not even connected to the computer; it's just a link to the amplifier so you can use tux as a small speaker box for your mp3 player. There's no frequency limitation on the line-in, it's just plain analog, so you should get better quality when using your portable player. The quality is limited by the speaker in this case.

> Well, sorry to look mean, i'm just wondering why the engineering team
> did these choices...

Hope you get a better idea now.
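
To make the index-and-merge step used for storing mode a bit more concrete, here is a minimal Python sketch. It is not the actual gtdi.py code: the function name merge_wavs and the (start_frame, frame_count) index layout are assumptions made for illustration, and it assumes every input sound already has the same channel count, sample width and rate. It only uses the standard wave module.

```python
import wave


def merge_wavs(sources, dest):
    """Concatenate WAV sources into one WAV with no gap between sounds.

    sources: list of file names or binary file objects (hypothetical inputs).
    dest: file name or writable binary file object.
    Returns one (start_frame, frame_count) pair per source, in order.
    """
    if not sources:
        raise ValueError("need at least one sound to merge")

    fmt = None      # (channels, sample width in bytes, frame rate)
    chunks = []     # raw frame data of each source
    for src in sources:
        with wave.open(src, "rb") as wav:
            this_fmt = (wav.getnchannels(), wav.getsampwidth(),
                        wav.getframerate())
            if fmt is None:
                fmt = this_fmt
            elif this_fmt != fmt:
                raise ValueError("all sounds must share the same format")
            chunks.append(wav.readframes(wav.getnframes()))

    frame_size = fmt[0] * fmt[1]    # bytes per frame
    index = []
    position = 0                    # running offset, in frames
    with wave.open(dest, "wb") as out:
        out.setnchannels(fmt[0])
        out.setsampwidth(fmt[1])
        out.setframerate(fmt[2])
        for data in chunks:
            count = len(data) // frame_size
            out.writeframes(data)   # appended back to back, no padding
            index.append((position, count))
            position += count
    return index
```

Called with a list of wav files and an output name, it returns where each sound starts and how many frames it has in the merged file. Because the frames are written back to back, there is no blank space between the sounds.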