Hi,
We would like to use a grammar as the LM for speech recognition. For example:
$digit = one | two | three | four | five | six | seven | eight | nine | zero;
$number = $digit | $digit $digit | $digit $digit $digit;
Can Kaldi support this type of grammar?
Thanks in advance.
Last edit: sprieto 2015-03-17
Yes, but you will have to compile your corresponding G.fst and words.txt.
On Tue, Mar 17, 2015 at 5:02 AM, sprieto sprieto@users.sf.net wrote:
There was a discussion on exactly this subject a while ago. There is no
functionality directly equivalent to HParse in Kaldi (if that's what you
are looking for). You can either generate the G.fst by hand or with a
script, as Nagendra suggested, or figure out how Thrax
(http://openfst.cs.nyu.edu/twiki/bin/view/GRM/Thrax) could be used for
that purpose. If you pursue the second option, let us know how you did it,
as it would be great to have it archived (at least) on the list.
y.
On Tue, Mar 17, 2015 at 9:31 AM, Nagendra Kumar Goel ngoel17@users.sf.net
wrote:
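As a rough illustration of the "by script" route, the sketch below writes the example grammar (one to three digits) as an unweighted acceptor in OpenFst text format. The function name and the max_len parameter are made up for this example, and it assumes every word already appears in the words.txt that the text is later compiled against (for example with fstcompile and its --isymbols/--osymbols options).

```python
# Sketch only: emits OpenFst text-format lines for
#   $number = $digit | $digit $digit | $digit $digit $digit;
# Names below are illustrative and not part of Kaldi or OpenFst.

DIGITS = ["one", "two", "three", "four", "five",
          "six", "seven", "eight", "nine", "zero"]


def number_grammar_lines(words, max_len=3):
    """Return text-format FST lines accepting 1..max_len words."""
    lines = []
    # State k means "k words consumed so far"; state 0 is the start state
    # because it is the source state of the first line.
    for state in range(max_len):
        for word in words:
            # Arc format: src dst input-label output-label
            lines.append(f"{state} {state + 1} {word} {word}")
    # A line holding only a state number marks that state as final.
    for state in range(1, max_len + 1):
        lines.append(str(state))
    return lines


if __name__ == "__main__":
    print("\n".join(number_grammar_lines(DIGITS)))
```

Because no weights are given, every allowed digit string is equally likely. Adding costs would mean appending a weight column to the arc lines.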
Hi,
We tried to use Thrax to create the grammar, but in the end we decided to create it manually.
We already have a big recognizer with an acoustic model trained on 150 hours and a big lexicon of 120000 words, and it works very well. We also manually created the G.fst for the new grammar. We would like to combine this G.fst with the acoustic model to obtain a new recognizer based on the new grammar.
First, we created HCLG.fst using all the files in data/lang of the big recognizer (including words.txt with 120000 words) together with the new grammar, and it works.
After that, we tried to create all the lang files (words.txt, L.fst, phones.txt, ...) from only the words of the new grammar. We compiled the HCLG.fst and the results aren't good.
We also followed the steps from http://sourceforge.net/p/kaldi/discussion/1355348/thread/2539caaa/, replacing phones.txt, disambig.txt, nonsilence.txt and silence.txt with the files from the big recognizer. The results are still not good.
So the only way we have found to create a new recognizer based on the grammar requires the same word list as the big recognizer.
Any help?
Thanks in advance
Even if you start with a huge word list, what you eventually use depends on
what is in the grammar. So there is no need to explicitly reduce the
lexicon size. Just make sure it covers the words in the grammar.
Nagendra
On Mar 26, 2015 4:41 AM, "sprieto" sprieto@users.sf.net wrote:
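The coverage check suggested above could be sketched as follows. This is only an illustration: the helper names are hypothetical, and it assumes words.txt uses the usual one "symbol id" pair per line.

```python
# Sketch only: report grammar words that the existing words.txt lacks.
# Helper names are illustrative, not Kaldi tools.

def read_symbol_table(lines):
    """Parse 'symbol id' lines into a dict, skipping malformed lines."""
    table = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            table[parts[0]] = int(parts[1])
    return table


def missing_words(grammar_words, words_txt_lines):
    """Return the sorted grammar words that are absent from words.txt."""
    known = read_symbol_table(words_txt_lines)
    return sorted(w for w in set(grammar_words) if w not in known)


if __name__ == "__main__":
    words_txt = ["<eps> 0", "one 1", "two 2"]
    print(missing_words(["one", "two", "eleven"], words_txt))
```

An empty result means the large lexicon already covers the grammar, so the existing words.txt and L.fst can be reused unchanged.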
The only thing that needs to be exactly the same is the phones.txt. If
this is different then the numbering of the phones gets messed up and you
get nonsense.
Dan
On Thu, Mar 26, 2015 at 6:52 AM, Nagendra Kumar Goel ngoel17@users.sf.net
wrote:
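One way to check the point about phones.txt is to compare the symbol-to-id mapping of the old and new tables. The sketch below is an assumption-laden illustration: the function names and the phone symbols in the demo are hypothetical, and it assumes one "symbol id" pair per line.

```python
# Sketch only: list symbols whose integer id differs between two phone
# tables. Function names and demo symbols are illustrative.

def read_symbol_table(lines):
    """Parse 'symbol id' lines into a dict, skipping malformed lines."""
    table = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            table[parts[0]] = int(parts[1])
    return table


def phone_table_mismatches(old_lines, new_lines):
    """Return (symbol, old_id, new_id) for every symbol that disagrees.

    An id of None means the symbol is missing from that table.
    """
    old = read_symbol_table(old_lines)
    new = read_symbol_table(new_lines)
    mismatches = []
    for symbol in sorted(set(old) | set(new)):
        if old.get(symbol) != new.get(symbol):
            mismatches.append((symbol, old.get(symbol), new.get(symbol)))
    return mismatches


if __name__ == "__main__":
    old_table = ["<eps> 0", "SIL 1", "AA 2", "B 3"]
    new_table = ["<eps> 0", "SIL 1", "B 2"]
    for row in phone_table_mismatches(old_table, new_table):
        print(row)
```

Any non-empty result means the acoustic model and the new graph disagree about phone numbering, which matches the nonsense output described above.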
It works very well now. Thanks a lot.