dear all,
i am searching for speech feature values and i found the file test_means.1.txt in the ...\SphinxTrain\test\cp_parm folder.
question 1: is it the right file?
i opened the file with notepad and got this information:
param 135 1 1
mgau 0
feat 0
density 0 -4.367e-07 3.294e-08 -7.529e-09 2.243e-08 6.745e-09 3.216e-09 -1.427e-08 -2.549e-09 6.745e-09 -6.157e-09 -6.451e-09 -1.608e-09 2.667e-09 4.987e-02 2.084e-02 9.597e-04 1.033e-02 3.889e-04 -4.084e-03 4.686e-04 -9.042e-04 -1.617e-03 -5.149e-03 -5.388e-03 -3.314e-03 -8.431e-03 -7.605e-03 -1.187e-02 -7.943e-03 -1.409e-02 -7.232e-04 6.492e-03 -3.121e-03 -5.490e-03 2.059e-03 3.168e-03 -2.580e-03 -3.544e-03 6.253e-03 .. etc ..
question 2: how do i read this info? are they mfcc values? (there are 40 values in each set)
question 3: is there any tool to convert other binary means or variances files to ascii and vice versa?
thank you in advance
regards,
zbastian
first, thank you mr. Shmyrev .. (i'm still trying to build sphinxbase and sphinx3 now, since i only have sphinx4 on my machine and do not have vc++)
sorry for not being very clear. what i am searching for is clean-speech feature vector values for each phoneme (39 english phonemes) as output by the sphinx4 feature extractor, so i can use them to benchmark my feature extractor simulation (in matlab) ..
Q1: is it possible to get these clean-speech feature vector values? or do you have any suggestions for that?
from the literature, i find that mfcc-based speech recognizers usually use 3x13 freq + 1 energy coefficient = 40 values in total for 1 feature vector, and each phoneme usually consists of 3 sets (3x20ms STFT windows)
Q2: does sphinx4 also use this format? if yes, how does sphinx4 store these values in its acoustic corpus? is it a combination of the means, variances, mixture_weights, etc. files?
thanks for your time ..
regards,
zbastian
> if yes, how does sphinx4 store these values in its acoustic corpus?
I think you don't understand the following concepts:
Speech
Phoneme
Phone
Corpus
Waveform
Feature
Cepstrum
Gaussian
Mean
Variance
Gaussian Mixture
Hidden Markov Model
Acoustic model
Until you read a book and learn the concepts above, our discussion is senseless.
> question 1: is it the right file?
Right for what? As a sample feature file? No. Feature files are in binary format; you can download the an4 database in mfcc format, for example, or create your own feature file from a wave file with wave2feat.
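For comparing such a binary feature file against a MATLAB simulation, a reader along the following lines may help. This is only a sketch under stated assumptions, not the tool's documented behavior: it assumes the file starts with a 4-byte integer giving the number of 32-bit floats that follow, that the byte order may be either little or big endian (so both are tried against the file size), and that each frame holds 13 cepstral values. The name `read_mfc` and the `n_ceps` parameter are made up for this example.

```python
import struct


def read_mfc(data, n_ceps=13):
    """Split raw feature-file bytes into frames of n_ceps floats.

    Assumed layout (not confirmed in this thread): an int32 count of
    floats, followed by that many float32 values.
    """
    if len(data) < 4:
        raise ValueError("too short to contain a header")
    # Try both byte orders; accept the one whose header matches the size.
    for order in ("<", ">"):
        (count,) = struct.unpack(order + "i", data[:4])
        if count >= 0 and 4 + 4 * count == len(data):
            break
    else:
        raise ValueError("header does not match size in either byte order")
    values = struct.unpack(f"{order}{count}f", data[4:])
    if count % n_ceps != 0:
        raise ValueError("value count is not a multiple of n_ceps")
    # Group the flat list of floats into one list per frame.
    return [list(values[i:i + n_ceps]) for i in range(0, count, n_ceps)]


# Synthetic two-frame buffer standing in for a real file's contents.
demo = struct.pack(">i", 26) + struct.pack(">26f", *range(26))
frames = read_mfc(demo)
print(len(frames), "frames of", len(frames[0]), "values")
```

With a real file, the bytes would come from `open(path, "rb").read()`; the frame size should be checked against whatever the feature extraction was configured to produce.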
> question 2: how do i read this info? are they mfcc values? (there are 40 values in each set)
No, they are means of gaussians. Of course, they describe some "average" mfcc vectors for each of the 135 gaussians in the RM1 model.
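To get these means into a form that can be compared numerically, the text dump can be parsed with a few lines of Python. The sketch below assumes the layout seen in the quoted dump: a `param` header line, then `mgau` and `feat` index lines, then `density` lines whose first number is the density index and whose remaining numbers are the mean vector. Under that assumption the leading `0` after `density` is an index, not a feature value. Reading the three `param` numbers as counts of mixtures, feature streams, and densities per mixture is also an assumption. The function name `parse_means_dump` is made up for this example, and the sample is cut to the first four values from the post.

```python
def parse_means_dump(text):
    """Parse an ASCII means dump into (header, {(mgau, feat, density): values})."""
    header = None
    means = {}
    mgau = feat = None
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        key = tokens[0]
        if key == "param":
            # Assumed meaning: number of mixtures, streams, densities.
            header = tuple(int(t) for t in tokens[1:])
        elif key == "mgau":
            mgau = int(tokens[1])
        elif key == "feat":
            feat = int(tokens[1])
        elif key == "density":
            # First number is taken as the density index, the rest as the mean.
            index = int(tokens[1])
            means[(mgau, feat, index)] = [float(t) for t in tokens[2:]]
    return header, means


# First four values of the dump quoted in this thread (truncated here).
sample = """param 135 1 1
mgau 0
feat 0
density 0 -4.367e-07 3.294e-08 -7.529e-09 2.243e-08
"""
header, means = parse_means_dump(sample)
print(header, len(means[(0, 0, 0)]))
```

The resulting lists can be written out as plain text or CSV and loaded in MATLAB for side-by-side comparison.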
> question 3: is there any tool to convert other binary means or variances files to ascii and vice versa?
yes, bin/printp.exe