I'm trying to train acoustic models for a project. However, I cannot find a
site that explains the general format of the transcription file.
I know that punctuation marks such as periods, exclamation points and question
marks are not allowed. But what other punctuation symbols are not allowed?
Also, should numbers be converted into their corresponding words?
Lastly, what should be done with sentences that are quoted, such as
"I'm ok.", she replied.
Thanks. Your help would be greatly appreciated.
I think NO punctuation marks are allowed. The reason is that ALL the words in
the transcript (except <s> and </s>) must be represented in the dictionary in
"word -- corresponding phones" format.
I suggest removing the quotes, commas, etc.
Also, should numbers be converted into their corresponding words?
In Sphinx, digits such as 1, 2, 3 cannot be used as words in the dictionary, so
you'll have to replace them with their spelled-out forms ('one', 'two', 'three').
Hi,
Check out http://cmusphinx.sourceforge.net/wiki/tutorialam for a detailed procedure.
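To make the advice above concrete, here is a rough Python sketch of how a raw sentence could be cleaned up before it goes into the transcription file. It is only an illustration under a few assumptions: the helper names (`number_to_words`, `spell_number`, `normalize_transcript`) and the utterance ID `utt_001` are made up for this example, the text is assumed to be plain English, apostrophes inside words such as "I'm" are kept on the assumption that the dictionary has an entry for them, and the output line is assumed to follow the `<s> words </s> (utterance_id)` layout shown in the tutorial. Case is left untouched, so adjust it to whatever your dictionary uses.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n):
    """Spell out an integer in the range 0..999 as space-separated words."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] if ones == 0 else TENS[tens] + " " + ONES[ones]
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words if rest == 0 else words + " " + number_to_words(rest)


def spell_number(match):
    """Replacement callback for re.sub: turn a run of digits into words."""
    digits = match.group(0)
    has_leading_zero = len(digits) > 1 and digits[0] == "0"
    if len(digits) <= 3 and not has_leading_zero:
        words = number_to_words(int(digits))
    else:
        # Long digit strings or leading zeros: read digit by digit.
        words = " ".join(ONES[int(d)] for d in digits)
    # Pad with spaces so the words never fuse with neighbouring letters.
    return " " + words + " "


def normalize_transcript(text, utt_id):
    """Hypothetical helper: build one transcription line from raw text."""
    # 1. Replace digits with spelled-out words.
    text = re.sub(r"\d+", spell_number, text)
    # 2. Turn everything except ASCII letters, apostrophes and spaces into spaces.
    text = re.sub(r"[^A-Za-z' ]+", " ", text)
    # 3. Drop apostrophes at word edges (stray single quotes), keep inner ones.
    words = [w.strip("'") for w in text.split()]
    words = [w for w in words if w]
    # 4. Wrap in sentence markers and append the utterance ID.
    return "<s> " + " ".join(words) + " </s> (" + utt_id + ")"


if __name__ == "__main__":
    print(normalize_transcript('"I\'m ok.", she replied.', "utt_001"))
    print(normalize_transcript("I have 2 dogs, 15 cats!", "utt_002"))
```

Every word that ends up in the output still needs an entry in the pronunciation dictionary, so it is worth checking the cleaned lines against the dictionary before training.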