
Transcription File Format

mdeala
2011-07-06
2012-09-22
  • mdeala

    mdeala - 2011-07-06

    I'm trying to train acoustic models for a project. However, I cannot find a
    site that explains the general format of the transcription file.

    I know that punctuation marks such as periods, exclamation points and question
    marks are not allowed. But what other punctuation symbols are not allowed?

    Also, should numbers be converted into their corresponding words?

    Lastly, what should be done with sentences that are quoted, such as

    "I'm ok.", she replied.

    Thanks. Your help would be greatly appreciated.

     
  • Pranav Jawale

    Pranav Jawale - 2011-07-07

    Hi,

    Check out http://cmusphinx.sourceforge.net/wiki/tutorialam for a detailed procedure.

    I think NO punctuation marks are allowed. The reason is that ALL the words in
    the transcript (except the sentence markers <s> and </s>) must be represented
    in the dictionary in "word -- corresponding phones" format.

    I suggest removing the quotes, commas, etc.
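
A minimal sketch of that cleanup step, applied to the quoted sentence from the question. The function name `strip_punctuation` is hypothetical, and keeping word-internal apostrophes (so that "I'm" stays one word) is an assumption: it only helps if your dictionary has an entry for that word. The sketch also assumes plain ASCII English text.

```python
import re

def strip_punctuation(text: str) -> str:
    """Remove punctuation, keeping letters, digits and word-internal apostrophes.

    Assumption: ASCII English text; anything else is replaced by a space.
    """
    # Replace every character that is not a letter, digit, apostrophe
    # or whitespace with a space.
    cleaned = re.sub(r"[^A-Za-z0-9'\s]", " ", text)
    # Drop apostrophes that are not between two letters (single quote marks).
    cleaned = re.sub(r"(?<![A-Za-z])'|'(?![A-Za-z])", " ", cleaned)
    # Collapse runs of whitespace into single spaces.
    return " ".join(cleaned.split())

print(strip_punctuation('"I\'m ok.", she replied.'))  # I'm ok she replied
```

Whatever survives this step still has to be checked against the dictionary, since every remaining word needs an entry there.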

    "Also, should numbers be converted into their corresponding words?"

    In Sphinx, digits such as 1, 2, 3 are not allowed as words in the
    dictionary, so you'll have to replace them with their spelled-out forms
    ('one', 'two', 'three').
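
The number replacement could be sketched roughly as follows. The helper names `number_to_words` and `spell_out_digits` are hypothetical, and the sketch only covers integers from 0 to 999. How a number should be spelled depends on how the speaker actually read it (for example, a year is often read differently from a count), so treat the output as a starting point to check against the audio.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]

def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..999 as English words."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only handles 0..999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("" if ones == 0 else " " + ONES[ones])
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words if rest == 0 else words + " " + number_to_words(rest)

def spell_out_digits(text: str) -> str:
    """Replace each run of digits in text with its spelled-out form."""
    return re.sub(r"\d+", lambda m: number_to_words(int(m.group())), text)

print(spell_out_digits("i have 2 cats and 15 dogs"))
```

Each word the function produces ("two", "fifteen", and so on) must still have its own entry in the dictionary.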

     
