Hi, I'm using pocketsphinx 0.1.0 with Python 3.5 and I have a few questions concerning cepstral mean normalization (CMN).
1) What are the exact differences between current, live and prior cmn?
2) When I change the feat.params file to say "-cmn prior", the "Current configuration:" text printed while the decoder is being configured says "-cmn [VALUE] prior", but the first INFO line then says:
INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='live', VARNORM='no', AGC='none'
While decoding, it also says cmn live:
INFO: cmn_live.c(88): Update from < 40.00 3.00 -1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >
INFO: cmn_live.c(105): Update to < 58.21 14.48 -0.80 16.94 -5.80 0.03 1.24 0.23 -1.61 -2.09 -2.75 -5.72 -5.38 >
3) The reason I wanted to use "prior" was this:
I am decoding different audio files (each under ~10 s) one after another with varying quality (different recording systems, different environmental influences, SNRs etc.). If I decode a file once, the cmn update is made at some point during the utterance and the accuracy for this output is not so good. If I then decode it again, the accuracy is higher (my guess is that the cmn is then done with the new cmn coefficients, derived from the same file, so the feature extraction is more robust or accurate). So my idea was to process a file once just to get fitting cmn coefficients and then a second time to collect the recognition result (I'm working with prerecorded files, so the time consumed is not much of an issue).
I don't know if prior is even the right choice for this, so I would appreciate any help!
4) Out of interest, maybe an expert can tell me why different cmn coefficients are calculated for the same file (maybe it has something to do with which portions of the audio are used for this?). And how often is the calculation done? (I read something about 500 frames in this forum. How long is one frame?)
Thank you in advance!!! ;)
1) "live" is the new name for "prior", and "current" is the same as "batch". They were renamed for consistency.
2) In continuous processing mode, live is enabled automatically. Batch mode is only used in pocketsphinx_batch.
3) I would simply set a reasonable cmninit estimate in the feat.params file and it should work OK. You can take the values printed in the log. You can reprocess the beginning of the file too, but you will have to modify the pocketsphinx code for that.
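To make the "take the values printed in the log" step concrete, here is a minimal sketch that turns a cmn_live "Update to" line into a -cmninit entry. The helper name cmninit_from_log is made up for illustration, and the comma-separated value format is an assumption based on the stock feat.params files, so check it against your own model directory.

```python
import re


def cmninit_from_log(line):
    """Turn a cmn_live 'Update to < ... >' log line into a -cmninit value.

    Hypothetical helper, not part of pocketsphinx.
    """
    match = re.search(r"<\s*(.*?)\s*>", line)
    if match is None:
        raise ValueError("no < ... > vector found in log line")
    values = [float(tok) for tok in match.group(1).split()]
    # Assumption: feat.params expects the values separated by commas.
    return ",".join("%.2f" % v for v in values)


log_line = ("INFO: cmn_live.c(105): Update to < 58.21 14.48 -0.80 16.94 -5.80 "
            "0.03 1.24 0.23 -1.61 -2.09 -2.75 -5.72 -5.38 >")
print("-cmninit " + cmninit_from_log(log_line))
```

The printed line can be pasted into feat.params. It only helps if later recordings resemble the one the values came from.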
4) Yes, it also depends on the history. The estimate is updated with a sliding window every 500 frames, i.e. every 5 seconds. One frame is 1/100 second.
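The dependence on history can be illustrated with a toy model. The sketch below is not the cmn_live.c code: it is a simplified block-wise running mean (the estimate is replaced every `window` frames instead of being updated with a true sliding window), and the function name live_cmn and the two-coefficient frames are invented for the example. It only shows why the same file is normalized differently on a second pass.

```python
FRAME_RATE = 100  # frames per second, i.e. one frame is 1/100 second


def live_cmn(frames, init_mean, window=500):
    """Subtract a running mean that is refreshed every `window` frames.

    Simplified illustration only, not the pocketsphinx implementation.
    Returns the normalized frames and the mean estimate left at the end.
    """
    mean = list(init_mean)
    sums = [0.0] * len(mean)
    count = 0
    normalized = []
    for frame in frames:
        # Accumulate the raw values for the next estimate.
        for i, x in enumerate(frame):
            sums[i] += x
        count += 1
        # Normalize with the estimate that is current right now.
        normalized.append([x - m for x, m in zip(frame, mean)])
        if count == window:
            mean = [s / count for s in sums]
            sums = [0.0] * len(mean)
            count = 0
    return normalized, mean


# A made-up 6 s "file" whose true mean differs from the initial estimate.
frames = [[50.0, 10.0]] * 600
first_pass, learned_mean = live_cmn(frames, [40.0, 3.0])
second_pass, _ = live_cmn(frames, learned_mean)

print("update interval in seconds:", 500 / FRAME_RATE)
print("first pass, frame 0:", first_pass[0])
print("second pass, frame 0:", second_pass[0])
```

In the first pass the opening 500 frames are still offset by the stale initial mean, while the second pass starts from the mean learned on the same data.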
Thank you for the quick answer!
As I understand it, when I set the cmninit values, they are put into the configuration but "only" used for the first five seconds of decoding, until the next estimate or update of cmn_live is done, right? If, for example, I decode a file of about 6 s and then choose another file with totally different kinds of convolutional distortion, the cmninit values would of course have no influence, because they would already have been updated.
So if I always give the audio file to the decoder twice (as I'm doing at the moment), it could be that during the first decoding pass the cmn values are sometimes updated and sometimes not, depending on the length and history.
So if every file were at least 5 seconds long, my approach would work and I could be sure to always have updated cmn values when I start processing the file the second time?
You see my "problem": cmninit may not help me, because I'm processing one file after another in a loop, and the files may have very different acoustic properties.
I hope my thoughts can be understood :D
Last edit: Jonas Helm 2016-09-15
If your files are all short and you don't need a very fast response, you can use ps_process_raw with the last argument (full_utt) set to TRUE. It will then use batch CMN and process the whole file at once.
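From Python, the same call is available as the decoder's process_raw method, with full_utt as its third argument. A minimal sketch, assuming the SWIG-based pocketsphinx bindings and a decoder that is already configured. The wrapper name decode_full_utterance and the paths in the comments are placeholders, not anything shipped with pocketsphinx.

```python
def decode_full_utterance(decoder, raw_audio):
    """Decode one complete recording in a single call with full_utt=True.

    `decoder` is expected to be a pocketsphinx Decoder. `raw_audio` is the
    whole file as bytes in the format the acoustic model expects (typically
    16 kHz, 16-bit, mono PCM for the default models).
    """
    decoder.start_utt()
    # Arguments: data, no_search=False, full_utt=True
    decoder.process_raw(raw_audio, False, True)
    decoder.end_utt()
    hyp = decoder.hyp()
    return hyp.hypstr if hyp is not None else None


# Usage sketch (placeholder paths, adjust to your installation):
#
#   from pocketsphinx.pocketsphinx import Decoder
#   config = Decoder.default_config()
#   config.set_string('-hmm', 'path/to/acoustic-model')
#   config.set_string('-lm', 'path/to/language-model.lm.bin')
#   config.set_string('-dict', 'path/to/dictionary.dict')
#   decoder = Decoder(config)
#   with open('utterance.raw', 'rb') as f:
#       print(decode_full_utterance(decoder, f.read()))
```

Passing the whole file in one call is what lets the decoder see the complete utterance before normalizing; feeding it in small chunks with full_utt left at False keeps the live behaviour.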
That's interesting.
I did this now, but I still get a worse recognition result the first time I decode the file and a better result in the second run.
I thought that with this approach the first run would already be fine, because the cmn update would be done directly for this file? Or is it the case that with this approach I can only be sure that the right cmn values for normalizing are in place during the second run?
Or can I somehow let the cmn values be calculated and then freeze them for some time?
Also, how long or short would you suggest the files should be to use batch cmn?
The first run looks like this:
The second run like this:
Last edit: Jonas Helm 2016-09-16
This feature is not supported by pocketsphinx yet.
3-10 seconds are enough.
You can continue the discussion in another thread:
https://sourceforge.net/p/cmusphinx/discussion/help/thread/51e2979b