Morph models

2012-03-30
2012-09-22
  • vijayabharadwaj gsr

    Dear Sir,

    I am working on morph-based language models for speech recognition. I am using
    the VariKN toolkit to build the language models.

    http://forge.pascal-network.org/frs/download.php/45/varikn-1.0.2.tar.gz

    Its documentation gives the following:

    The sentence start "<s>" and end "</s>" should be marked in the training
    data by the user.
    For sub-word models, the tag "<w>" is reserved to signify word break.
    For sub-word models with sentence breaks the data is assumed to be processed in
    the following format:

    <s> <w> w1-1 w1-2 w1-3 <w> w2-1 <w> w3-1 w3-2 <w> </s>

    where wA-B is the B-th part of the A-th word.
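
    As a rough illustration of that format, here is a minimal Python sketch that
    turns an already segmented sentence into one training line. It assumes the
    sentence tags are "<s>" and "</s>" and that the morph segmentation is
    available as a list of lists. The function name to_varikn_line is made up
    for this example and is not part of VariKN.

```python
def to_varikn_line(words, start="<s>", end="</s>", word_break="<w>"):
    """Format one sentence for sub-word LM training.

    words: list of words, each given as a list of its morphs.
    The tag names are assumptions taken from the format described above.
    """
    tokens = [start, word_break]
    for morphs in words:
        tokens.extend(morphs)      # the parts of one word, in order
        tokens.append(word_break)  # word break after every word
    tokens.append(end)
    return " ".join(tokens)


sentence = [["w1-1", "w1-2", "w1-3"], ["w2-1"], ["w3-1", "w3-2"]]
line = to_varikn_line(sentence)
print(line)
```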

    But I have a problem: I am confused about what the dictionary format for the
    sphinx4 recognizer should be. Also, what should I do to make word boundaries
    appear in the sphinx4 output?

    Is there any way to fix this problem? Please let me know.

     
  • Nickolay V. Shmyrev

    I am confused about what the dictionary format for the sphinx4 recognizer
    should be. Also, what should I do to make word boundaries appear in the
    sphinx4 output?

    A subword dictionary should still contain the mapping from subwords to phones.
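
    A minimal sketch of what that could look like, assuming the usual Sphinx
    text dictionary layout of one entry per line, the entry followed by its
    phones. The morphs and phone strings below are hypothetical placeholders;
    the real phones must come from the phone set of your acoustic model. How
    the "<w>" token itself should be entered is not covered here.

```python
def format_dictionary(morph_phones):
    """Render a morph -> phone-list mapping as dictionary text,
    one 'morph PH1 PH2 ...' entry per line, sorted by morph."""
    lines = []
    for morph in sorted(morph_phones):
        lines.append(morph + " " + " ".join(morph_phones[morph]))
    return "\n".join(lines) + "\n"


# Hypothetical morphs and phones, for illustration only.
morph_phones = {
    "talo": ["T", "AA", "L", "OW"],
    "ssa": ["S", "AA"],
}
dic_text = format_dictionary(morph_phones)
print(dic_text, end="")
```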

    Word boundaries are not readily supported by sphinx4. You will have to modify
    the search algorithm to incorporate them. Basically, efficient recognition
    using subword models needs some work.
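
    If the search were modified so that the hypothesis carries the "<w>" token
    between words (an assumption, this is not what stock sphinx4 produces),
    putting the words back together would be a small post-processing step. A
    sketch with a hypothetical helper join_morphs and made-up morphs:

```python
def join_morphs(tokens, word_break="<w>", ignore=("<s>", "</s>")):
    """Concatenate morphs into words, splitting at the word-break token.
    Assumes the hypothesis already contains word-break tokens."""
    words, current = [], []
    for tok in tokens:
        if tok in ignore:
            continue                 # drop sentence tags
        if tok == word_break:
            if current:              # close the word collected so far
                words.append("".join(current))
                current = []
        else:
            current.append(tok)
    if current:                      # trailing word without a final break
        words.append("".join(current))
    return words


# Made-up hypothesis string, for illustration only.
hyp = "<s> <w> talo ssa <w> on <w> </s>".split()
words = join_morphs(hyp)
print(words)
```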

     
