I had this problem while trying to run forced alignment (sphinx3_align):
s3_align.c:927: align_build_sent_hmm: Assertion `stail.predlist' failed.
I read a similar post about this problem from May last year, where you suggested
checking the transcripts for a UTF-8 BOM, but I think my problem might be
different because training completed. Could you please suggest what to do?
Thank you.
Tony
Check for a UTF-8 BOM symbol at the beginning of the transcript; sphinx3 is likely
to break on it. The trainer is more robust to such issues now and could work fine
even though your file has problems.
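A quick way to check for (and strip) the BOM without relying on an editor is a few lines of Python. This is a minimal sketch: it only looks for the UTF-8 BOM bytes EF BB BF at the very start of each file named on the command line and rewrites that file without them. The script name and the transcript path in the usage comment are hypothetical.

```python
import codecs
import sys


def strip_utf8_bom(path):
    """Remove a leading UTF-8 BOM (bytes EF BB BF) from the file, in place.

    Returns True if a BOM was found and removed, False otherwise.
    """
    with open(path, "rb") as f:
        data = f.read()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    with open(path, "wb") as f:
        f.write(data[len(codecs.BOM_UTF8):])
    return True


if __name__ == "__main__":
    # Hypothetical usage: python strip_bom.py etc/my_train.transcription
    for name in sys.argv[1:]:
        status = "BOM removed" if strip_utf8_bom(name) else "no BOM"
        print(f"{name}: {status}")
```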
Then, while running stage 20 (training CI models), this error occurred: ERROR: "corpus.c", line
355: Must be at least one line in the control file. What could be the
cause of this error?
I'll say all my thanks here, for all my posts and your replies, so you
don't have to open the thread just to read thanks from people. Thank you.
Tony
Then, while running stage 20 (training CI models), this error occurred: ERROR: "corpus.c",
line 355: Must be at least one line in the control file. What could be
the cause of this error?
Your previous stage (forced alignment) didn't align any files properly; they all
failed, and that is why the control file is empty. You need to check the forced
alignment stage logs to find out what is wrong.
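Both checks mentioned here can be scripted. A minimal sketch, assuming a plain-text control file with one entry per line and a log directory containing .log files: it counts the non-blank lines in the control file and prints every log line containing one of a few error markers based on the messages in this thread. The script name and both paths are hypothetical placeholders for your own training layout.

```python
import pathlib
import sys

# Markers based on the messages seen in this thread; extend as needed.
ERROR_MARKERS = ("ERROR", "FATAL", "Assertion", "Final state not reached")


def count_control_lines(ctl_path):
    """Number of non-blank lines (entries) in a control file."""
    with open(ctl_path, encoding="utf-8", errors="replace") as f:
        return sum(1 for line in f if line.strip())


def find_log_errors(log_dir):
    """Yield (log path, line number, text) for log lines containing an error marker."""
    for log in sorted(pathlib.Path(log_dir).rglob("*.log")):
        with open(log, encoding="utf-8", errors="replace") as f:
            for lineno, line in enumerate(f, 1):
                if any(marker in line for marker in ERROR_MARKERS):
                    yield log, lineno, line.rstrip()


def main(argv):
    if len(argv) != 2:
        print("usage: check_falign.py CONTROL_FILE LOG_DIR", file=sys.stderr)
        return
    ctl, log_dir = argv
    print(f"{ctl}: {count_control_lines(ctl)} entries")
    for log, lineno, text in find_log_errors(log_dir):
        print(f"{log}:{lineno}: {text}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

An entry count of 0 reproduces the "Must be at least one line in the control file" situation; the printed log lines then point at the utterances that failed to align.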
I'll say all my thanks here, for all my posts and your replies, so you
don't have to open the thread just to read thanks from people.
You are welcome
I used Kate to check the Unicode encoding and found nothing. I also saved the
transcript again in Kate (I was told it automatically removes the BOM). But I still get
sphinx3_align: s3_align.c:927: align_build_sent_hmm: Assertion
`stail.predlist' failed. (What should I do now?)
WARNING tmat.c line 192: Normalization failed for tmat 1 from state
0, 1, 2 (what does this mean?)
Tony
WARNING: "tmat.c", line 192: Normalization failed for tmat 31 from state 0
WARNING: "tmat.c", line 192: Normalization failed for tmat 31 from state 1
WARNING: "tmat.c", line 192: Normalization failed for tmat 31 from state 2
Those warnings in the log should make you think. You can find in
model_architecture/name.falign.ci_mdef that tmat 31 corresponds to the phone
SIL.
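Looking up a tmat number in the mdef can also be done programmatically. A minimal sketch, assuming the usual text mdef layout in which each phone row reads "base lft rt p attrib tmat state-ids... N" and context-independent rows have "-" in the lft, rt and p columns: it maps each tmat id to its CI phone. The script name is hypothetical; "name" in the path stands for your model name.

```python
import sys


def ci_tmat_map(mdef_lines):
    """Map tmat id -> context-independent phone, from a text-format mdef.

    Phone rows look like: base lft rt p attrib tmat state-ids... N
    CI rows have "-" in the lft, rt and p columns.
    """
    mapping = {}
    for line in mdef_lines:
        fields = line.split()
        # Skip the version line, the "<count> n_..." header lines and comments.
        if len(fields) < 7 or fields[0].startswith("#") or fields[-1] != "N":
            continue
        if fields[1] == "-":  # context-independent phone
            mapping[int(fields[5])] = fields[0]
    return mapping


if __name__ == "__main__":
    # Hypothetical usage: python tmat_phone.py model_architecture/name.falign.ci_mdef 31
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as f:
            tmats = ci_tmat_map(f)
        print(tmats.get(int(sys.argv[2]), "tmat not found"))
```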
SIL was trained incorrectly because you don't have <s> and </s> in your
transcription file. Each line in the transcription file must start with <s>
and must end with </s>.
You should run the verification stage before training. The verification stage also
issued a warning about SIL. You should take care of the warnings from the
verification stage before you proceed to training.
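A transcript can be checked for missing sentence markers before training. A minimal sketch, assuming one utterance per line in the form "<s> WORDS </s> (utterance_id)" with the id optional: it reports every line that does not start with <s> and end with </s>. Because the file is read as plain UTF-8, a line that still carries a BOM is reported too, since the BOM stays glued to the first token. The script name is hypothetical, and this does not replace the verification stage.

```python
import sys


def check_transcript_lines(lines):
    """Return (line number, text) for every line not wrapped in <s> ... </s>.

    Expected layout per line: <s> WORDS </s> (utterance_id), id optional.
    Blank lines are reported as well.
    """
    bad = []
    for lineno, line in enumerate(lines, 1):
        tokens = line.split()
        # Drop a trailing utterance id such as "(speaker1_001)".
        if tokens and tokens[-1].startswith("(") and tokens[-1].endswith(")"):
            tokens = tokens[:-1]
        if len(tokens) < 2 or tokens[0] != "<s>" or tokens[-1] != "</s>":
            bad.append((lineno, line.rstrip("\n")))
    return bad


if __name__ == "__main__":
    # Hypothetical usage: python check_transcript.py etc/my_train.transcription
    for path in sys.argv[1:]:
        # Plain "utf-8" (not "utf-8-sig") so a BOM is kept and shows up in the report.
        with open(path, encoding="utf-8") as f:
            for lineno, text in check_transcript_lines(f):
                print(f"{path}:{lineno}: {text!r}")
```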
What could cause ERROR: "main_align.c" line 765: Final state not reached:
no alignment for audiofile? How should I solve this? Because of this problem, a few
senones were also not observed.
I also get: failed to open ~falignout/model.alignfiles at 20. ci_hmm at Baum_welch.pl line
134. Is it due to the above problems?
Tony
Should I just remove the phones with these senones, and the words containing them,
from my files?
You had better find out why the senones were not represented with enough
occurrences in the training database. Either you need to train with fewer
senones or you need to drop rare phones.
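To see which phones are rare before deciding what to drop, you can count phone occurrences in the transcript through the dictionary. A minimal sketch, assuming a dictionary with "WORD PH1 PH2 ..." lines (alternates written as WORD(2)) and the transcript format "<s> ... </s> (id)": it uses only each word's first pronunciation, so the counts only approximate what the aligner will pick, and words missing from the dictionary (including fillers, unless you also load the filler dictionary) are listed separately. The script name and arguments are hypothetical.

```python
import re
import sys
from collections import Counter

ALT_PRON = re.compile(r"\(\d+\)$")  # alternate pronunciation suffix, e.g. WORD(2)


def load_dictionary(lines):
    """Parse 'WORD PH1 PH2 ...' lines into {word: [pronunciation, ...]}."""
    prons = {}
    for line in lines:
        fields = line.split()
        if fields:
            word = ALT_PRON.sub("", fields[0])
            prons.setdefault(word, []).append(fields[1:])
    return prons


def count_phones(transcript_lines, prons):
    """Count phone occurrences using each word's first pronunciation."""
    counts, missing = Counter(), set()
    for line in transcript_lines:
        for word in line.split():
            if word in ("<s>", "</s>") or (word.startswith("(") and word.endswith(")")):
                continue  # sentence markers and utterance ids
            if word in prons:
                counts.update(prons[word][0])
            else:
                missing.add(word)
    return counts, missing


if __name__ == "__main__":
    # Hypothetical usage: python phone_counts.py my.dic my_train.transcription
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as d, open(sys.argv[2]) as t:
            counts, missing = count_phones(t, load_dictionary(d))
        for phone, n in sorted(counts.items(), key=lambda kv: kv[1]):
            print(f"{phone}\t{n}")  # rarest phones first
        if missing:
            print("not in dictionary:", " ".join(sorted(missing)))
```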
I had a fatal error during CD training earlier because of rare phones, so I found
all the words containing these phones, turned them into ++word++ fillers with the
filler phone +rare_phone+ everywhere, and moved them to the filler dictionary; CD
training then completed. Now, for forced alignment, I moved these ++word++ entries
from the filler dictionary to the main dictionary, as the tutorial says. The error
log then says these modified +rare_phone+ phones don't exist in the input data. I
guess I should put them back into the filler dictionary? If so, what about the
forced alignment instruction to remove the fillers because the aligner is not good
at inserting them?
Tony
"failed to open ~falignout/model.alignfiles at 20. ci_hmm at Baum_welch.pl
line 134. Is it due to the above problems? "
The file model.alignfiles wasn't created for some reason. For more information,
check the training logs in the logdir folder.
Haha, your reply is exactly what the sphinx log says, but the 20.ci_hmm log
had no information about it :s
After enabling everything to do with forced alignment, I trained and the
training completed. Does that mean forced alignment is also complete? Which acoustic
model should I use? What about the model.falign_ci_gaussian models?
It's strange that when I tried to train another model, it failed because
libblas.so.3 doesn't exist. But how come I could train other models with no
problems?
They are ignored
Does it correspond to the warning that the senones of all the +phones+ are never
observed in the input data?
Tony
I trained and the training completed. Does that mean forced alignment is also
complete?
It's not related. You need to check the logs and the faligner folder contents for
details.
Which acoustic model should I use?
This question is covered in the tutorial.
What about the model.falign_ci_gaussian models?
Those are intermediate models used during training.
It's strange that when I tried to train another model, it failed because
libblas.so.3 doesn't exist. But how come I could train other models with no
problems?
Maybe you installed libblas.so.3 already.
Does it correspond to the warning that the senones of all the +phones+ are never
observed in the input data?
No, it's unrelated.
The way I dropped rare phones was to change each word containing them into a ++word++ filler with the filler phone +rare_phone+, but this caused the "senones never observed in the input data" warning. Would this affect the acoustic model quality? How should I solve this?
After I added more training data, I got more of these warnings, for phones that had been observed before the data was added. Why?
And in norm.log in the logdir of stage 30 (CD models), I have a couple of million (T.T) "gauden.c" line 1554 (mgau=#, feat......) never observed warnings. What could be the cause? Training still seems to have completed. What implications do these warnings have (model quality)?
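To see how many distinct Gaussians are behind millions of warning lines, you can tally the warnings per mgau id. A minimal sketch: the exact line layout is elided in the post above, so the pattern is an assumption that each warning line contains "mgau=<number>" followed later by "never observed"; adjust it to your actual norm.log. For continuous models the mgau index usually coincides with the senone id, which can then be looked up in the mdef, but treat that as an assumption as well. The script name is hypothetical.

```python
import re
import sys
from collections import Counter

# Assumption: each warning line contains "mgau=<number>" followed later by
# "never observed"; the exact layout is elided in the post, so adjust as needed.
MGAU_NEVER_OBSERVED = re.compile(r"mgau=(\d+).*never observed")


def unobserved_mgau(log_lines):
    """Count 'never observed' warning lines per mgau id."""
    counts = Counter()
    for line in log_lines:
        match = MGAU_NEVER_OBSERVED.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical usage: python count_unobserved.py <cd logdir>/norm.log
    if len(sys.argv) == 2:
        with open(sys.argv[1], errors="replace") as f:
            counts = unobserved_mgau(f)
        print(f"{sum(counts.values())} warning lines, {len(counts)} distinct mgau ids")
        for mgau, n in counts.most_common(20):
            print(mgau, n)
```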
"Over 500 senones never occur in the input data. This is normal for context-dependent untied senone training or for adaptation, but could indicate a serious problem otherwise"
How do I find out if it is a serious problem?
Share your model folder
That's right, I removed the <s> and </s> because the training tutorial said to do so:
http://www.speech.cs.cmu.edu/sphinxman/scriptman1.html#04
So should I put them back into the transcript and retrain the model?
Tony
See
http://cmusphinx.sourceforge.net/wiki/tutorialam#troubleshooting
The senones are coded with numbers in the log. How can I find out which number
corresponds to which senone?
Tony
The number is the senone. If you are looking for the corresponding context or the
corresponding central phone, you can find this mapping inside the mdef file.
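The mdef lookup can be sketched in Python. A minimal sketch, assuming the usual text mdef layout ("base lft rt p attrib tmat state-ids... N" per row): it builds a map from each senone id to the phone rows (base phone plus left/right context and word position) whose state list contains that id. The script name and the mdef path in the usage comment are hypothetical.

```python
import sys


def senone_users(mdef_lines):
    """Map senone id -> list of (base, left, right, position) rows using it.

    Phone rows in a text mdef look like: base lft rt p attrib tmat state-ids... N
    """
    users = {}
    for line in mdef_lines:
        fields = line.split()
        # Skip the version line, the "<count> n_..." header lines and comments.
        if len(fields) < 7 or fields[0].startswith("#") or fields[-1] != "N":
            continue
        base, left, right, pos = fields[:4]
        for senone in fields[6:-1]:
            users.setdefault(int(senone), []).append((base, left, right, pos))
    return users


if __name__ == "__main__":
    # Hypothetical usage: python senone_lookup.py path/to/model.mdef 1234
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as f:
            users = senone_users(f)
        for base, left, right, pos in users.get(int(sys.argv[2]), []):
            print(f"{base} (left={left}, right={right}, position={pos})")
```

Since CD triphones share tied states, one senone id can map to many rows; all of them share the same central phone.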
P.S. Shouldn't these ++word++ and +rare_phone+ entries be ignored during decision tree
building? Other filler phones in the dictionary don't have the above problems.
It seems you are using an obsolete tutorial. You can find all recent, up-to-date
information on our website:
http://cmusphinx.sourceforge.net/wiki/tutorialam
They are ignored
Haven't heard back from you. What do you think about the above problems?
Read the documentation about training stages and the meaning of this message
will become clear.