Hello again,
Now that I have finished creating my language model (thanks, Nickolay, for your help), I am using it with sphinx3_continuous. I tried decoding a file, but the hypothesis is far from the original transcript (let's call this hypothesis A). So I tried decoding the file together with a copy of it in one batch, and I got two hypotheses. The first was the same as when I decode only one file (hypothesis A), but the second (let's call it hypothesis B) is different and more accurate. I also tried more than two copies: the first hypothesis is still hypothesis A, while the second through the last are all identical and more accurate (hypothesis B). So now I wonder what is going on the first time a file is decoded?
Is something wrong with the initial states in your LM? I suspect they are first initialized with some values, and on the second utterance the latest n-gram state is used.
Umm... I'm using both the language model I created and the language model that can be downloaded from the CMU Sphinx Open Source Model website. With both language models, the first and second hypotheses don't match. I'm using sphinx3_continuous without any code modification.
Can you please share wav files and scripts to start sphinx3_continuous so we can try them?
Umm, Nickolay, have you tried it using the files and configuration I gave you? Did you find anything?
Yes, I tried, but I have had no time to look for an explanation yet, sorry. I'll try to look this week.
Okay, thanks. I'll keep track of this forum then.
OK, I've looked at this. It looks like the random dither noise applied during feature extraction is the reason. If you decode MFC files instead, the result will be the same for both files.
This is the raw audio file: 16 kHz sample rate, 16 bits per sample, 1 channel (mono):
http://rapidshare.com/files/71864506/arctic_a0001-sin.raw
The configuration file looks like this:
-mdef .\hub4opensrc.6000.mdef
-senmgau .cont.
-mean .\means
-var .\variances
-mixw .\mixture_weights
-tmat .\transition_matrices
-feat 1s_c_d_dd
-wbeam 1e-100
-dict .\cmudict.06d
-fdict .\fillerdict
-lm .\language_model.arpaformat.DMP
-ctloffset 0
-ctlcount 600
-agc none
-varnorm no
-lw 13
-wip 0.2
-hyp .\test.match
-cmn current
-hypseg .\test.hypseg
The acoustic model used is the hub4 open source model with 6000 senones, and the language model, dictionary and filler dictionary are the same as the ones found in the open source language model on the CMU Sphinx site.
My control file contains this:
arctic_a0001-sin
arctic_a0001-sin2 //i copied the original file and renamed it to this...
So, basically, this control file, the config file and the raw audio file were used to produce this output:
WELL THERE COULD YOUR TRAIL PHILLIPS DEALS THAT ARE OUT (arctic_a0001-sin_0.624)
A THIRD OF THE DANGER TRAIL PHILIP'S DEALS THAT CETERA (arctic_a0001-sin2_0.624)
The correct transcript for this is:
AUTHOR OF THE DANGER TRAIL PHILIP STEELS ET CETERA
By the way, I'm compiling sphinx3_continuous using Visual Studio 2005 (if that makes a difference) and am running on Windows XP SP2.
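To put a number on "greater accuracy", the two hypotheses above can be scored against the correct transcript with a word-level edit distance. This is only a minimal sketch in Python: the function name `word_errors` is made up for illustration, and it is plain Levenshtein alignment over whitespace-separated words, not the scoring tool shipped with Sphinx.

```python
def word_errors(reference, hypothesis):
    """Return (edit_distance, reference_length), counted in words.

    Substitutions, insertions and deletions each cost 1.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] = edits needed to turn the first i-1 reference words
    # into the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # reference word deleted
                            curr[j - 1] + 1,      # extra hypothesis word
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1], len(ref)


reference = "AUTHOR OF THE DANGER TRAIL PHILIP STEELS ET CETERA"
hyp_a = "WELL THERE COULD YOUR TRAIL PHILLIPS DEALS THAT ARE OUT"
hyp_b = "A THIRD OF THE DANGER TRAIL PHILIP'S DEALS THAT CETERA"

for name, hyp in (("first file", hyp_a), ("second file", hyp_b)):
    errors, n_words = word_errors(reference, hyp)
    print(f"{name}: {errors} errors / {n_words} words = {errors / n_words:.2f}")
```

With these strings the first file scores 9 errors over 9 reference words and the second 5 over 9, so the second hypothesis is clearly better but still far from the transcript.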
So all I have to do is input MFC files instead of raw files? But sphinx3_continuous only accepts raw files. How can I change it to accept MFC files?
It depends on what you are trying to achieve. If you want to use MFC files, you can use another variant of the decoder, sphinx3_decode for example. But it won't give you much except the advantage of getting the same output from the same data. Alternatively, you can disable dither, and you'll get the same result from the same raw files.
The point is that recognition is very unstable: even a small amount of noise affects quality.
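For anyone wondering how identical audio can yield different features, here is a small Python sketch of the effect. It does not reproduce the Sphinx front end: `log_energy` is a toy stand-in for MFCC extraction, `add_dither` is a hypothetical dither scheme (adding -1, 0 or +1 to each sample), and the audio is a synthetic tone. It only shows that a random generator which keeps advancing between utterances gives different features for the same samples, while a fixed seed or no dither gives identical ones.

```python
import math
import random


def log_energy(samples, frame_len=400):
    """Log energy per non-overlapping frame (toy stand-in for MFCCs)."""
    feats = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        feats.append(math.log(sum(s * s for s in frame) + 1.0))
    return feats


def add_dither(samples, rng):
    """Hypothetical dither: add -1, 0 or +1 to every sample."""
    return [s + rng.choice((-1, 0, 1)) for s in samples]


# Quiet synthetic "recording": 0.1 s of a 440 Hz tone at 16 kHz.
audio = [int(20 * math.sin(2 * math.pi * 440 * n / 16000)) for n in range(1600)]

# One generator shared by both passes, as when a batch is decoded in one run:
# the second pass sees different noise, so its features differ.
shared_rng = random.Random(1)
first = log_energy(add_dither(audio, shared_rng))
second = log_energy(add_dither(audio, shared_rng))
print("shared generator, features equal:", first == second)

# Re-seeding before each pass makes the noise, and the features, repeatable.
a = log_energy(add_dither(audio, random.Random(42)))
b = log_energy(add_dither(audio, random.Random(42)))
print("fixed seed, features equal:", a == b)

# No dither at all is trivially repeatable.
print("no dither, features equal:", log_energy(audio) == log_energy(audio))
```

Decoding precomputed MFC files avoids the issue for the same reason: the features are computed once, so every decoding pass sees exactly the same input.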