Hi,
I am training a PTM model for Spanish with VoxForge 8 kHz speech data (41 hours); see the configuration below.
I am getting this kind of error for all of the data files at stage 11 (force align):
db2.5.falign.log:ERROR: "main_align.c", line 850: Uttid mismatch: ctlfile = "es-0042"; transcript = "jigdominguez-20100602-rxf/wav/es-0042"
Searching this forum I found a response saying "there is a mistake in the feature extraction", but I am not using any customized feature extraction software, only the sphinx-5prealpha one.
Maybe the 1s_c_d_dd feature configuration is not suitable and I should use the default s2_4x instead?
Thanks in advance,
Mar
Last edit: Maria del Mar Martinez Sanchez 2017-08-22
You can ignore this error; it is not critical. Otherwise, change the data preparation script so that the utterance id in the transcription file is es-0042, not the full path jigdominguez-20100602-rxf/wav/es-0042.
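For what it's worth, here is a minimal sketch of that kind of fix-up, assuming the transcription file uses the usual line format with the utterance id in trailing parentheses. The file names below are only examples chosen to match this setup, not taken from the thread:

```python
import re

# Assumed paths for this db2.5 setup; adjust to the real file names.
IN_PATH = "etc/db2.5_train.transcription"
OUT_PATH = "etc/db2.5_train.transcription.fixed"

# A transcription line is expected to end with "(utt_id)", e.g.
#   alguna frase (jigdominguez-20100602-rxf/wav/es-0042)
trailing_id = re.compile(r"\(([^()]*)\)\s*$")

with open(IN_PATH, encoding="utf-8") as src, open(OUT_PATH, "w", encoding="utf-8") as dst:
    for line in src:
        line = line.rstrip("\n")
        match = trailing_id.search(line)
        if match:
            # Keep only the last path component so it matches the id the
            # aligner derives from the ctl file ("es-0042" in the error above).
            utt_id = match.group(1).split("/")[-1]
            line = line[:match.start()] + "(" + utt_id + ")"
        dst.write(line + "\n")
```

However the data preparation script is changed, the id in parentheses just has to match the id the aligner reports from the ctl file (es-0042 here).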
The point about feature extraction is irrelevant here; you misunderstood that forum answer.
Thanks a lot for the response, Nickolay!
I changed the data preparation and now it works OK.
Only two small doubts:
The samples are organised this way:
speaker1/file_1
speaker1/file_2
speaker2/file_3
...
If wav files from different speakers accidentally end up mixed in the same "speakerN" directory, what is the effect on the training?
I want to try VTLN training, and I would like to confirm that the pocketsphinx decoder will take it into account, because I have not found any explicit option to inform pocketsphinx about this (and I have the same concern about other training options that require special decoding treatment).
Thanks again.
Mar
Last edit: Maria del Mar Martinez Sanchez 2017-09-07
There is no effect on the training right now.
VTLN is not supported in the pocketsphinx decoder; if you want VTLN, you had better try Kaldi.
Thanks a lot, Nickolay. I will take your advice into consideration.