Recently I have been trying to improve recognition speed. I know HTK has a word network that restricts decoding to a limited set of paths, but I cannot find anything similar in pocketsphinx. Does it exist? It seems an N-gram model (up to trigram) cannot do the same thing.
> but I cannot find anything similar in pocketsphinx. Does it exist?
You can use a JSGF grammar with the -jsgf option or an FSG grammar with the -fsg option. Both are finite-state grammars.
I searched for information about "fsg" and found the following description (http://cmusphinx.sourceforge.net/sphinx2/doc/sphinx2.html#sec_fsgfmt):
"Note that the current implementation of finite-state grammars, or FSGs, is not the most efficient. In particular, transitions are represented using a full NxN matrix, where N is the number of states. Hence, FSGs containing several thousands of states may run inefficiently."
My database has about 900 utterances, each about 6 to 10 words long, so with "fsg" the number of states would be enormous.
Actually, my requirement is simple. For example, suppose my database has 5 utterances:
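To put a rough number on "enormous", here is a back-of-the-envelope sketch. It assumes 900 utterances of about 8 words each, one linear path per utterance that shares only the start and final states, and no prefix sharing; the helper name is hypothetical.

```python
def chain_states(sentence_lengths):
    """States for a naive FSG with one linear chain per sentence.

    Assumes a shared start state and a shared final state, one word per
    transition, and no sharing of prefixes between sentences, so a sentence
    of n words adds n - 1 intermediate states.
    """
    return 2 + sum(n - 1 for n in sentence_lengths)

n = chain_states([8] * 900)  # 900 utterances of ~8 words each
print(n)                     # 6302
print(n * n)                 # 39715204 cells in a dense N x N matrix
```

Sharing common prefixes between sentences reduces the count, but it still grows with the corpus. Note that the quoted NxN limitation comes from the Sphinx2 documentation.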
1. Hello
2. What can I do for you?
3. How can I go to People Square in Shanghai?
4. Can I help you?
5. Where can I watch a movie?
If I use an n-gram (trigram) language model, I can only constrain relationships between up to 3 consecutive words, so I might get a recognition result like "how can I do for people square in shanghai", a sentence that doesn't exist in my database at all!
HTK provides a word network (a lattice) that confines decoding to the 5 sentences above, so a sixth sentence can never appear.
How can I do this with Sphinx? Thanks a lot!
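That hypothesis is reachable because a back-off n-gram model falls back to lower-order estimates for word sequences it never saw. A quick Python check lists the trigrams of the hypothesis that are missing from the five example sentences; tokenizing without sentence-boundary markers is a simplifying assumption.

```python
import re

def words(sentence):
    """Lowercase and drop punctuation."""
    return re.findall(r"[a-z']+", sentence.lower())

def trigrams(tokens):
    return [tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)]

corpus = [
    "Hello",
    "What can I do for you?",
    "How can I go to People Square in Shanghai?",
    "Can I help you?",
    "Where can I watch a movie?",
]
seen = {t for s in corpus for t in trigrams(words(s))}

hyp = words("how can I do for people square in shanghai")
unseen = [" ".join(t) for t in trigrams(hyp) if t not in seen]
print(unseen)  # ['do for people', 'for people square']
```

Every other trigram of the hypothesis occurs in some training sentence, so only the back-off for these two lets the model produce it. A grammar that lists the five sentences has nothing to back off to.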
This is easily described by a JSGF grammar, as I wrote before.
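For the five example sentences, a minimal sketch of such a grammar can be generated in Python. The grammar name "demo" and rule name <sentence> are hypothetical, and it assumes the pronunciation dictionary uses lowercase words (switch to .upper() for an uppercase dictionary like the one in the FSG example).

```python
import re

def to_jsgf(grammar_name, sentences):
    """Return JSGF text whose public rule <sentence> matches exactly the given sentences.

    Assumption: the dictionary uses lowercase words, so each sentence is
    lowercased and stripped of punctuation before it is emitted.
    """
    alternatives = []
    for sentence in sentences:
        phrase = " ".join(re.findall(r"[a-z']+", sentence.lower()))
        if phrase and phrase not in alternatives:  # skip empty lines and duplicates
            alternatives.append(phrase)
    body = "\n    | ".join(alternatives)
    return f"#JSGF V1.0;\n\ngrammar {grammar_name};\n\npublic <sentence> = {body};\n"

print(to_jsgf("demo", [
    "Hello",
    "What can I do for you?",
    "How can I go to People Square in Shanghai?",
    "Can I help you?",
    "Where can I watch a movie?",
]))
```

The public rule is a plain alternation, so the decoder can only output one of the listed sentences. The resulting text is the kind of file passed with the -jsgf option.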
Hi, today I tried the "-fsg" argument and it works well for my purpose, thanks a lot!
But I have another problem with "fsg". My test database currently has just 3 sentences:
1. HELLO
2. HELLO, MY NAME IS
3. HI, GOOD MORNING
The "fsg" file can be written as follows:
```
FSG_BEGIN GREETING
NUM_STATES 9
START_STATE 0
FINAL_STATE 5
# Transitions
TRANSITION 0 1 1.0
TRANSITION 1 2 1.0 HI
TRANSITION 2 3 1.0 GOOD
TRANSITION 3 4 1.0 MORNING
TRANSITION 1 4 1.0 HELLO
TRANSITION 1 6 1.0 HELLO
TRANSITION 6 7 1.0 MY
TRANSITION 7 8 1.0 NAME
TRANSITION 8 4 1.0 IS
TRANSITION 4 5 1.0
FSG_END
```
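One way to double-check a hand-written FSG before decoding is to enumerate the word sequences it accepts. This is a small, hypothetical Python checker (not a Sphinx tool): it understands only the keywords used in this file, treats a TRANSITION without a word as an empty (epsilon) arc, and assumes the graph has no cycles.

```python
from collections import defaultdict

FSG_TEXT = """\
FSG_BEGIN GREETING
NUM_STATES 9
START_STATE 0
FINAL_STATE 5
# Transitions
TRANSITION 0 1 1.0
TRANSITION 1 2 1.0 HI
TRANSITION 2 3 1.0 GOOD
TRANSITION 3 4 1.0 MORNING
TRANSITION 1 4 1.0 HELLO
TRANSITION 1 6 1.0 HELLO
TRANSITION 6 7 1.0 MY
TRANSITION 7 8 1.0 NAME
TRANSITION 8 4 1.0 IS
TRANSITION 4 5 1.0
FSG_END
"""

def parse_fsg(text):
    """Parse the subset of the FSG format used above.

    Returns (num_states, start_state, final_state, arcs), where arcs[src] is a
    list of (dst, word) pairs and word is None for an epsilon transition.
    """
    header, arcs = {}, defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # blank line or comment
        if fields[0] in ("NUM_STATES", "START_STATE", "FINAL_STATE"):
            header[fields[0]] = int(fields[1])
        elif fields[0] == "TRANSITION":
            src, dst = int(fields[1]), int(fields[2])
            word = fields[4] if len(fields) > 4 else None
            arcs[src].append((dst, word))
    return header["NUM_STATES"], header["START_STATE"], header["FINAL_STATE"], arcs

def accepted_sentences(text):
    """Enumerate every word sequence from the start state to the final state.

    Assumes no cycles; a cyclic grammar would recurse until Python's
    recursion limit is hit.
    """
    num_states, start, final, arcs = parse_fsg(text)
    for src, out in arcs.items():
        for dst, _ in out:
            if not (0 <= src < num_states and 0 <= dst < num_states):
                raise ValueError(f"transition {src}->{dst} uses a state outside 0..{num_states - 1}")
    found = set()

    def walk(state, words):
        if state == final:
            found.add(" ".join(words))
        for dst, word in arcs[state]:
            walk(dst, (words + [word]) if word is not None else words)

    walk(start, [])
    return found

print(sorted(accepted_sentences(FSG_TEXT)))
# ['HELLO', 'HELLO MY NAME IS', 'HI GOOD MORNING']
```

If the list contains anything besides the intended sentences, or a state index is out of range, the file has a mistake.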
There are 9 states in this "fsg" file. But as I mentioned above, I have 900 sentences. Writing the "fsg" file for them by hand would be a heavy job and very error-prone. Is there any toolkit that generates an "fsg" file automatically from input text? Thanks!
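Since the file is mechanical to produce, here is a minimal, hypothetical Python sketch (not a Sphinx tool) that writes an FSG accepting exactly a list of sentences. It shares common word prefixes in a tree, so "HELLO" and "HELLO MY NAME IS" reuse states, and it assumes uppercase dictionary words and the same 1.0 transition weights as the hand-written example.

```python
import re

def sentences_to_fsg(name, sentences):
    """Build FSG text that accepts exactly the given sentences.

    Sentences share a prefix tree rooted at state 0; each state where a
    sentence ends gets a word-less (epsilon) arc to one extra final state.
    Every transition gets weight 1.0, as in the hand-written example.
    """
    children = [{}]  # children[state][word] -> next state
    ends = set()     # states where at least one sentence ends
    for sentence in sentences:
        words = re.findall(r"[A-Z']+", sentence.upper())
        if not words:
            continue  # ignore empty lines
        state = 0
        for word in words:
            if word not in children[state]:
                children.append({})
                children[state][word] = len(children) - 1
            state = children[state][word]
        ends.add(state)
    final = len(children)
    lines = [f"FSG_BEGIN {name}", f"NUM_STATES {final + 1}",
             "START_STATE 0", f"FINAL_STATE {final}"]
    for src, out in enumerate(children):
        for word, dst in out.items():
            lines.append(f"TRANSITION {src} {dst} 1.0 {word}")
    for state in sorted(ends):
        lines.append(f"TRANSITION {state} {final} 1.0")
    lines.append("FSG_END")
    return "\n".join(lines) + "\n"

print(sentences_to_fsg("GREETING", ["HELLO", "HELLO, MY NAME IS", "HI, GOOD MORNING"]))
```

It shares prefixes but not suffixes, so the result is not a minimal automaton; for 900 sentences it still avoids hand-numbering states.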
I think I found the solution. There are 2 ways to solve this problem:
1. Use the "-jsgf" argument in pocketsphinx.
2. Use the sphinx_jsgf2fsg tool in sphinxbase to compile a JSGF file into FSG format.
For the JSGF syntax, please refer to: http://java.sun.com/products/java-media/speech/forDevelopers/JSGF/index.html