Re: [tuxdroid-user] Dedicated sound normalizer / multiplexer daemon

On Sat, 24 Mar 2007 15:51:32 +0100, Florent THIERY <ft...@gm...>  
wrote:

> If we want "ultimate" modularity for sound (be it the choice of TTS
> engine, STT, or any sound-enabled sound), we may want to develop/find
> a dedicated sound daemon that:
>
> - creates a virtual sound device with an explicit naming convention (
> /dev/tuxmic & /dev/tuxspk for example)
> - downsamples/transcodes every incoming sound to 8bit 8khz sound (for
> instance, try to play an mp3 with xmms, it won't work :p)

Using the ALSA plughw devices puts the plug conversion plugin in the
chain, so resampling is done there. Try to play a file with
aplay -D plughw:1,0 wavfile
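
For anyone who wants to see what the conversion to 8-bit 8 kHz involves
outside of ALSA, here is a minimal Python sketch. It is not what ALSA
does internally: the function name to_8bit_8khz is hypothetical, the
input is assumed to be signed 16-bit mono samples, and it uses plain
linear interpolation with no anti-aliasing low-pass, so it only
illustrates the idea.

```python
def to_8bit_8khz(samples, src_rate, dst_rate=8000):
    """Resample signed 16-bit mono samples and return unsigned 8-bit values.

    Illustrative only: linear interpolation, no anti-aliasing filter.
    """
    if not samples:
        return []
    n_out = max(1, int(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * step
        i0 = min(int(pos), last)
        i1 = min(i0 + 1, last)
        frac = pos - i0
        # Linear interpolation between the two nearest source samples.
        value = samples[i0] * (1.0 - frac) + samples[i1] * frac
        # Map signed 16-bit (-32768..32767) to unsigned 8-bit (0..255),
        # which is how 8-bit PCM is stored in WAV files.
        out.append((int(round(value)) + 32768) >> 8)
    return out
```

Silence (all zeros) comes out as the 8-bit midpoint value 128.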

> - eventually filters the 500 Hz noise coming from the mic

That would be a good idea indeed
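
As a rough idea of what such a filter could look like, here is a sketch
of a standard biquad notch (coefficients from the well-known RBJ audio
EQ cookbook) centred on 500 Hz. The function name notch_filter, the
8 kHz sample rate, and the Q value are assumptions for illustration;
nothing like this exists in the tux software as far as this thread
says.

```python
import math


def notch_filter(samples, sample_rate=8000, freq=500.0, q=5.0):
    """Attenuate a narrow band around freq with a biquad notch filter."""
    w0 = 2.0 * math.pi * freq / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    # Coefficients normalised by a0.
    b0, b1, b2 = 1.0 / a0, -2.0 * cos_w0 / a0, 1.0 / a0
    a1, a2 = -2.0 * cos_w0 / a0, (1.0 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        # Direct form I difference equation.
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out
```

A higher q gives a narrower notch, which removes less of the speech
around 500 Hz but takes longer to settle.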

> - acts as a frontend for the TTS engines (if you input text to the
> daemon, it uses TTS; if it's sound, it sends it to tux)
> - handles multiplex / queuing of sound events (the wav merger doesn't
> seem to me a long-term solution...)

If you are referring to the Python merge script used by gtdi.py, it's
not meant to play sounds but to store them in the internal flash. It
computes the indexes and merges all the sounds into one long WAV file,
which is played while Tux is in storing mode. We don't want any blank
space between the sounds in storing mode.
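
To make the idea concrete, here is a minimal sketch of that kind of
merge using Python's standard wave module. It is not the actual gtdi.py
code: the function name merge_wavs and the (start_frame, frame_count)
index layout are assumptions for illustration.

```python
import io
import wave


def merge_wavs(sources, dest):
    """Concatenate WAV sources into dest with no gap between them.

    sources and dest may be file names or file-like objects.
    Returns one (start_frame, frame_count) pair per sound.
    """
    index = []
    position = 0
    out = None
    params = None
    try:
        for src in sources:
            with wave.open(src, "rb") as reader:
                fmt = (reader.getnchannels(), reader.getsampwidth(),
                       reader.getframerate())
                if out is None:
                    # The first sound defines the output format.
                    params = fmt
                    out = wave.open(dest, "wb")
                    out.setnchannels(fmt[0])
                    out.setsampwidth(fmt[1])
                    out.setframerate(fmt[2])
                elif fmt != params:
                    raise ValueError("all sounds must share the same format")
                count = reader.getnframes()
                # Frames are appended back to back, so no blank space.
                out.writeframes(reader.readframes(count))
                index.append((position, count))
                position += count
    finally:
        if out is not None:
            out.close()
    return index
```

The returned index gives the position of each sound inside the long
file.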

> - sound normalization to handle the mouth problem (open/close)
> - avoid/postpone tux animation when microphone usage is needed
>
> There's an inherent problem with the mic: it's in the mouth (it's one
> of the most debatable technical choices to me...), but WHY? Speech
> recognition programs rely heavily on a good sound level tuning, and
> the fact that the sound level varies by 50% when opening the mouth
> will not help...

I'm partly responsible for that, and indeed we should have found a
better solution. We didn't want a hole in the front of Tux, and the
microphone had to be on that side. At the time we weren't thinking
about speech recognition but about VoIP applications, and having to
open the mouth to use the microphone was not considered a major
problem.
  It's a different matter if you have to leave the mouth open all the
time. I have now done some tests but didn't find such a big difference.
The level is slightly lower, but the main difference I noticed is in
the tone: when the mouth is closed, the higher frequencies are
attenuated and the result is not as bright as when the mouth is open.
We should try it with the speech recognition software; I'm not sure it
will be a problem for it.

Now looking at tux again, I still can't find a good place for the  
microphone. Maybe close to the beak on the side. Time to release the first  
hardware hack ;-)

> Another technical choice that makes me wonder: there's a line in / out
> in the back of tux... But if I'm not mistaken, it will be limited to 8
> bits / 8khz. So these I/O won't really add features, except for
> earphone operation...

Yes, those were not meant to act as line-in and line-out at all.
Line-out is in fact just a headphone output. And line-in is not even
connected to the computer; it's just a link to the amplifier so you can
use Tux as a small speaker box for your MP3 player. There's no
frequency limitation on the line-in, it's plain analog, so you should
get better quality when using your portable player. The quality is
limited by the speaker in this case.

> Well, sorry to look mean, I'm just wondering why the engineering team
> made these choices...

Hope you get a better idea now.