I'm finalizing/optimizing the acoustic model I need, but I'm wondering what the idea is behind not using 16 kHz samples when creating the model for telephone.
I have created the Windows Phone sample for PocketSphinx, and I think it's implemented with 16 kHz, and it works.
What is the reason for using 8 kHz for telephone?
And a small follow-up question:
Does the 8 kHz sample need to be 8 bits?
And wouldn't it be better to use 32-bit samples if you have them?
Last edit: Toine db 2014-10-31
This is because telephone audio is cut off at frequencies above 4 kHz, so an 8 kHz sample rate already captures everything the telephone channel carries.
On Fri, Oct 31, 2014 at 6:57 PM, Toine db tony_mortana@users.sf.net wrote:
--
Sincerely, Alexander
Thanks for the reply.
It's cut off above 4 kHz, but when I record with the Windows Phone microphone I get 16 kHz...
Do you know why 16-bit samples are required for the model (and not 8 or 32)?
(http://cmusphinx.sourceforge.net/wiki/tutorialam#setting_up_the_training_scripts)
Again, thanks for the reply.
Telephone speech is transmitted over a cellular network or a PSTN
line. If you record audio with a microphone, regardless of whether it's a
mobile phone or a computer, you should use 16 kHz audio models.
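To make the rule above concrete, here is a minimal Python sketch that reads the sample rate and bit width from a WAV header and applies the "telephone is 8 kHz, microphone is 16 kHz" guideline. The function names (`describe_wav`, `nyquist_limit_hz`, `suggested_model_rate_hz`) are hypothetical helpers made up for illustration; they are not part of PocketSphinx. It assumes plain PCM WAV files that Python's standard `wave` module can read.

```python
import wave


def describe_wav(path):
    """Return (sample_rate_hz, bits_per_sample, channels) from a PCM WAV header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getsampwidth() * 8, wav.getnchannels()


def nyquist_limit_hz(sample_rate_hz):
    """Highest frequency that a given sample rate can represent."""
    return sample_rate_hz / 2


def suggested_model_rate_hz(source):
    """Hypothetical rule of thumb taken from this thread:
    audio that went through a telephone line -> 8 kHz model,
    audio recorded directly with a microphone -> 16 kHz model."""
    return 8000 if source == "telephone" else 16000
```

For example, `nyquist_limit_hz(8000)` gives 4000, which matches the 4 kHz cutoff of telephone audio, and `suggested_model_rate_hz("microphone")` gives 16000 even when the microphone is built into a phone.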
On Fri, Oct 31, 2014 at 10:50 PM, Toine db tony_mortana@users.sf.net wrote:
--
Sincerely, Alexander
Great, that was the answer I was looking for :-)
Many thanks, Alexander.
PS: Do you know if the bit ratio has any effect? 16/32 bits?
(It sounds like 32-bit is better, but if PocketSphinx won't work with it...)
32 bits is not a "bit ratio" but a "sample bit width".
http://music.columbia.edu/cmc/musicandcomputers/chapter2/02_05.php
A 32-bit sample width is not required for speech recognition.
Yes
Thanks for the link
I'm sorry to ask again: does PocketSphinx support 32-bit, and does it work better with 32-bit?
The real question is:
Would you downsample 32-bit samples to 16-bit, or just use the 32-bit ones?
Thanks for the support
No
No
This process is not called "downsampling"; it's just format conversion. Downsampling is when you change the sample rate, for example from 16 kHz to 8 kHz.
Just use 16-bit in the recorder.
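The difference between the two operations can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input is little-endian signed integer PCM (not 32-bit float), and both function names are hypothetical. The pair averaging used for downsampling is a deliberately crude stand-in for the proper low-pass filter a real resampler applies.

```python
import struct


def pcm32_to_pcm16(data):
    """Format conversion: same sample rate, narrower samples.

    Takes raw little-endian signed 32-bit PCM bytes and keeps the
    top 16 bits of every sample."""
    count = len(data) // 4
    samples = struct.unpack("<%di" % count, data[:count * 4])
    return struct.pack("<%dh" % count, *(s >> 16 for s in samples))


def naive_downsample_by_2(samples):
    """Downsampling: same sample width, half the sample rate.

    Averages each pair of neighbouring samples (a very rough low-pass)
    and keeps one value per pair, e.g. 16 kHz -> 8 kHz."""
    return [(samples[i] + samples[i + 1]) // 2
            for i in range(0, len(samples) - 1, 2)]
```

Format conversion leaves the number of samples per second unchanged, while downsampling halves it; only the second one changes which acoustic model (8 kHz or 16 kHz) the audio matches.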
Great answer, everything I need to know to make the best models.
So, all 16 bits.
(Sorry for mixing up the names downsampling and conversion; I'm glad you understood the question.)