Kaldi / Discussion / Help: Grammar as Language Model

sprieto - 2015-03-17

Hi,

We would like to use a grammar as the LM for speech recognition. For example:

$digit = one | two | three | four | five | six | seven | eight | nine | zero;
$number = $digit | $digit $digit | $digit $digit $digit;

Can Kaldi support this type of grammar?

Thanks in advance.

Last edit: sprieto 2015-03-17

- Nagendra Kumar Goel - 2015-03-17
  
  Yes, but you will have to compile the corresponding G.fst and words.txt yourself.
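
A minimal sketch of generating these two files by script for the digit grammar in the question, assuming the OpenFst text format (one arc per line as `src dst ilabel olabel`, final states on a line of their own). The function names and the word ids are illustrative only; in practice the ids would normally come from the words.txt of an existing lang directory rather than being assigned here.

```python
# Hypothetical helper: emits the digit grammar from the question as an
# OpenFst text-format transducer with identical input and output labels.
DIGITS = ["one", "two", "three", "four", "five",
          "six", "seven", "eight", "nine", "zero"]
MAX_DIGITS = 3  # $number = one, two, or three digits


def grammar_fst_text(words=DIGITS, max_len=MAX_DIGITS):
    """Return the grammar as OpenFst text: state k means 'k digits seen'."""
    lines = []
    # The source state of the first line (state 0) is the start state.
    for state in range(max_len):
        for word in words:
            lines.append(f"{state} {state + 1} {word} {word}")
    # States 1..max_len are final, so 1 to max_len digits are accepted.
    for state in range(1, max_len + 1):
        lines.append(str(state))
    return "\n".join(lines) + "\n"


def words_table_text(words=DIGITS):
    """Return a symbol table; the ids here are an assumption for the demo."""
    rows = ["<eps> 0"] + [f"{w} {i}" for i, w in enumerate(words, start=1)]
    return "\n".join(rows) + "\n"


if __name__ == "__main__":
    print(grammar_fst_text(), end="")
```

The resulting text could then be compiled with OpenFst's fstcompile, passing the symbol table through `--isymbols` and `--osymbols`, and arc-sorted with fstarcsort before it is used to build HCLG.fst.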
  
  - Jan "yenda" Trmal - 2015-03-17
    
    There was a discussion on exactly this subject a while ago. There is no
    functionality equivalent to HParse directly in Kaldi (if that's what you
    are looking for). You can either generate the G.fst by hand or with a
    script, as Nagendra suggested, or figure out how Thrax
    (http://openfst.cs.nyu.edu/twiki/bin/view/GRM/Thrax) could be used for
    that purpose. If you pursue the second route, please let us know how you
    did it, as it would be great to have that archived (at least) on the list.
    y.
    

sprieto - 2015-03-26

Hi,

We tried to use Thrax to create the grammar, but in the end we decided to create it manually.

We already have a large recognizer with an acoustic model trained on 150 hours and a large lexicon of 120000 words, and it works very well. We also manually created the G.fst for the new grammar. We would like to combine this G.fst with the acoustic model to obtain a new recognizer based on the new grammar.

First, we created HCLG.fst using all the files in data/lang of the large recognizer (including words.txt with 120000 words) together with the new grammar, and it works.

After that, we tried to create all the lang files (words.txt, L.fst, phones.txt, ...) from only the words of the new grammar. We compiled HCLG.fst, and the results were not good.

We also followed the steps from http://sourceforge.net/p/kaldi/discussion/1355348/thread/2539caaa/, replacing phones.txt, disambig.txt, nonsilence.txt, and silence.txt with the files from the large recognizer. The results were still not good.

So the only way we have found to build a new recognizer based on the grammar requires the full word list of the large recognizer.

Any help?

Thanks in advance

- Nagendra Kumar Goel - 2015-03-26
  
  Even if you start with a huge wordlist, what you eventually use depends on
  what is in the grammar. So there is no need to explicitly reduce the
  lexicon size. Just make sure it covers the words in the grammar.
  
  Nagendra
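
One way to check the coverage described above is to compare the word labels on the grammar's arcs with the first column of the lexicon. The sketch below assumes the grammar is available in OpenFst text format and the lexicon uses the usual Kaldi `word phone1 phone2 ...` layout; the function names and the sample pronunciations are hypothetical.

```python
def grammar_words(fst_text):
    """Collect word labels from arc lines: src dst ilabel olabel [weight]."""
    words = set()
    for line in fst_text.splitlines():
        fields = line.split()
        if len(fields) >= 4:  # final-state lines have fewer fields
            for symbol in fields[2:4]:
                if symbol != "<eps>":
                    words.add(symbol)
    return words


def lexicon_words(lexicon_text):
    """Collect the first column (the word) of every non-empty lexicon line."""
    return {line.split()[0]
            for line in lexicon_text.splitlines() if line.strip()}


def missing_from_lexicon(fst_text, lexicon_text):
    """Return grammar words that have no pronunciation in the lexicon."""
    return sorted(grammar_words(fst_text) - lexicon_words(lexicon_text))


if __name__ == "__main__":
    # Made-up sample data, only to show the call.
    demo_fst = "0 1 one one\n0 1 two two\n1\n"
    demo_lexicon = "one W AH N\nthree TH R IY\n"
    print(missing_from_lexicon(demo_fst, demo_lexicon))
```

An empty result means every word in the grammar has at least one pronunciation, so the large lexicon can be used unchanged.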
  
  - Daniel Povey - 2015-03-26
    
    The only thing that needs to be exactly the same is phones.txt. If it
    differs, the numbering of the phones gets messed up and you get
    nonsense.
    Dan
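
To verify this, the phones.txt of the new lang directory can be compared with the one the acoustic model was trained with. A minimal sketch, assuming both files use the standard `symbol integer-id` format with one entry per line; the function names and the sample phone symbols are illustrative.

```python
def read_symbol_table(text):
    """Parse 'symbol id' lines into a dict mapping symbol to integer id."""
    table = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        symbol, index = line.split()
        table[symbol] = int(index)
    return table


def phone_table_mismatches(reference_text, candidate_text):
    """List (symbol, reference_id, candidate_id) for every disagreement.

    An id of None means the symbol is absent from that table.
    """
    reference = read_symbol_table(reference_text)
    candidate = read_symbol_table(candidate_text)
    problems = []
    for symbol in sorted(set(reference) | set(candidate)):
        if reference.get(symbol) != candidate.get(symbol):
            problems.append(
                (symbol, reference.get(symbol), candidate.get(symbol)))
    return problems


if __name__ == "__main__":
    # Made-up tables: dropping one phone shifts the ids of those after it.
    trained = "<eps> 0\nSIL 1\nAA 2\nB 3\n"
    rebuilt = "<eps> 0\nSIL 1\nB 2\n"
    for row in phone_table_mismatches(trained, rebuilt):
        print(row)
```

Any non-empty result indicates that the new lang directory numbers its phones differently from the trained model, which is the situation described above.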
    

sprieto - 2015-03-27

It works very well now. Thanks a lot.
