I've been working on training an acoustic model from the TIMIT corpus. I
followed the instructions on the CMU Sphinx wiki page about building
an acoustic model, and I have got to the point where I have all the
necessary files and have started running RunAll.pl.
However, I receive warnings similar to the one below:
Phase 7: TRANSCRIPT - Checking that all the phones in the transcript are in
the phonelist, and all phones in the phonelist appear at least once
WARNING: This phone (AA) occurs in the phonelist
(/home/frostshoxx/Desktop/tutorial/myTIMIT/etc/myTIMIT.phone), but not in any
word in the transcription
(/home/frostshoxx/Desktop/tutorial/myTIMIT/etc/myTIMIT_train.transcription)
The same warning appears for other phones as well. I double-checked the
dictionary file (myTIMIT.dic) and myTIMIT_train.transcription, and I'm pretty
sure that these phones do occur in my transcription and dictionary.
For example:
In the dictionary:
ACCOMPLISHED AH K AA M P L IH SH T
In the transcription:
AMBIDEXTROUS PICKPOCKETS ACCOMPLISH MORE (TRAIN/DR2/MARC0/SX378)
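A check like the one the warning describes can be approximated with a short Python script. This is only a sketch, not SphinxTrain's own code: the function names are made up for illustration, and it assumes one dictionary entry per line (word followed by phones) and transcript lines that end with an utterance ID in parentheses. It looks words up by exact spelling, so an entry for ACCOMPLISHED would not cover ACCOMPLISH in the transcript.

```python
import re

def parse_dictionary(text):
    """Map each word to its phone list; any run of whitespace separates fields."""
    entries = {}
    for line in text.splitlines():
        parts = line.split()
        if parts:
            entries[parts[0]] = parts[1:]
    return entries

def transcript_words(text):
    """Collect the words used in the transcript, ignoring utterance IDs and <s> markers."""
    words = set()
    for line in text.splitlines():
        line = re.sub(r"\([^()]*\)\s*$", "", line)  # drop trailing "(utterance id)"
        words.update(w for w in line.split() if w not in ("<s>", "</s>"))
    return words

def check_coverage(phones, dictionary, words):
    """Return (transcript words missing from the dictionary, phones never used)."""
    missing_words = sorted(w for w in words if w not in dictionary)
    used = {p for w in words for p in dictionary.get(w, [])}
    unused_phones = sorted(set(phones) - used)
    return missing_words, unused_phones
```

Running `check_coverage` on the contents of the .phone, .dic, and _train.transcription files should list both the phones that never occur in any transcript word and the transcript words that have no dictionary entry.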
The dictionary file I used comes from the lmtool web service on the CMU site,
which I found through the wiki page about building a language model. I noticed
that the spacing in the file is inconsistent, as in the lines below:
ABOUT AH B AW T
ABOVE AH B AH V
ABRUPTLY AH B R AH P T L IY
ABSENCES, AE B S AH N S IH Z
Does it matter at all if the number of spaces between a word and its phones
differs from one entry to another?
Thank you in advance for the help. I appreciate your patience in guiding a
beginner like me.
Regards,
Does it matter at all if the number of spaces between a word and its phones
differs from one entry to another?
It depends on the SphinxTrain version you are using. It shouldn't matter in
the latest snapshot.
Anyway, it's very easy to check: just clean up the spacing. And it's not
just about the dictionary. For example, your transcription file seems to have
double spaces. That's not a good thing to do.
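One way to clean up the spacing is a small Python script that collapses every run of spaces or tabs into a single space. This is a minimal sketch: the function names are hypothetical, and it assumes the dictionary and transcription are plain text files with one entry per line.

```python
import re

def normalize_line(line: str) -> str:
    """Collapse runs of whitespace into single spaces and trim both ends."""
    return re.sub(r"\s+", " ", line).strip()

def normalize_text(text: str) -> str:
    """Normalize every line and drop lines that end up empty."""
    lines = (normalize_line(raw) for raw in text.splitlines())
    return "\n".join(line for line in lines if line) + "\n"

# Example use (paths are whatever your etc/ directory contains):
# with open("etc/myTIMIT.dic") as f:
#     cleaned = normalize_text(f.read())
# with open("etc/myTIMIT.dic", "w") as f:
#     f.write(cleaned)
```

The same function can be applied to the transcription file. Keep a backup of the originals before overwriting them.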