Morph models

Speech Recognition Toolkit

Forum: Help

Creator: vijayabharadwaj gsr

Created: 2012-03-30

Updated: 2012-09-22

vijayabharadwaj gsr - 2012-03-30

Dear Sir,

I am working on morph-based language models for speech recognition. I am using
the VariKN toolkit to build the language models:

http://forge.pascal-network.org/frs/download.php/45/varikn-1.0.2.tar.gz

Its documentation says:

The sentence start "<s>" and end "</s>" should be marked in the training
data by the user.
For sub-word models, the tag "<w>" is reserved to signify word break.
For sub-word models with sentence breaks the data is assumed to be processed in
the following format:

<s> <w> w1-1 w1-2 w1-3 <w> w2-1 <w> w3-1 w3-2 <w> </s>

where wA-B is the Bth part of the A:th word.
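For illustration, here is a minimal Python sketch of producing one training line in that layout. It assumes the sentence markers are "<s>" and "</s>" and that each word has already been split into morphs by some other tool; the function name to_subword_line is hypothetical and not part of VariKN.

```python
def to_subword_line(segmented_words):
    """Format one sentence for sub-word LM training.

    segmented_words: a list of words, each given as a list of morph strings.
    Assumed layout: <s> <w> morphs... <w> morphs... <w> </s>
    """
    tokens = ["<s>", "<w>"]
    for morphs in segmented_words:
        tokens.extend(morphs)   # the parts of one word, in order
        tokens.append("<w>")    # word break after every word
    tokens.append("</s>")
    return " ".join(tokens)


# The placeholder morphs from the quoted documentation.
sentence = [["w1-1", "w1-2", "w1-3"], ["w2-1"], ["w3-1", "w3-2"]]
print(to_subword_line(sentence))
```

Run on the placeholder words above, this prints the same line as the documented example.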

But I have a problem: I am confused about what the dictionary format for the
sphinx4 recognizer should be. Also, what should I do to make word boundaries
appear in the sphinx output?

Is there any way to fix this problem? Please let me know.

Nickolay V. Shmyrev - 2012-04-07

I am confused about what the dictionary format for the sphinx4 recognizer
should be. Also, what should I do to make word boundaries appear in the
sphinx output?

The subword dictionary should still contain the mapping from subwords to phones.
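As a rough sketch of what that means in practice, the Python snippet below writes entries in the usual Sphinx text dictionary layout: one unit per line, followed by its phones separated by spaces. The morphs and phone strings are made up for illustration, and format_dictionary is a hypothetical helper. How the "<w>" tag itself should be handled is not covered here.

```python
def format_dictionary(morph_phones):
    """Return dictionary text with one 'unit PH1 PH2 ...' line per sub-word unit."""
    lines = []
    for morph in sorted(morph_phones):
        phones = " ".join(morph_phones[morph])
        lines.append(f"{morph} {phones}")
    return "\n".join(lines) + "\n"


# Hypothetical morphs and pronunciations, for illustration only.
example = {
    "talo": ["T", "AA", "L", "OW"],
    "ssa": ["S", "AA"],
}
print(format_dictionary(example), end="")
```

The pronunciations would normally come from a grapheme-to-phoneme step applied to each morph, using the phone set of the acoustic model.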

Word boundaries are not readily supported by sphinx4. You will have to modify the
search algorithm to incorporate them. Basically, efficient recognition
using subword models needs some work.
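If the decoder were modified so that the hypothesis carried "<w>" tokens, turning it back into words would be simple post-processing. The Python sketch below assumes such output exists, which stock sphinx4 does not provide; join_morphs is a hypothetical helper and the example hypothesis is invented.

```python
def join_morphs(tokens, boundary="<w>"):
    """Concatenate morphs between boundary tokens into whole words."""
    words, current = [], []
    for tok in tokens:
        if tok in ("<s>", "</s>"):
            continue                      # sentence markers carry no text
        if tok == boundary:
            if current:                   # close the word collected so far
                words.append("".join(current))
                current = []
        else:
            current.append(tok)
    if current:                           # trailing word without a final boundary
        words.append("".join(current))
    return words


# Invented hypothesis, assuming the decoder emits "<w>" between words.
hyp = "<s> <w> talo ssa <w> on <w> </s>".split()
print(join_morphs(hyp))
```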
