Large (1000 command) JSGF Problems

P Laurens
2010-08-26
2012-09-22
  • P Laurens

    P Laurens - 2010-08-26

    I have a JSGF model working, and the control it provides is great compared to
    the lm model (which I imagine is more suited to dictation). However, I have an
    issue regarding a large (~1000) set of commands that I want recognised.

    My grammar is currently as simple as can be, a single category with around
    1000 utterances I'd like to be distinguished between. My dictionary file is
    similarly large (larger even, as there are alternate word pronunciations).

    This means that at startup, I am getting a minute or two delay while I see the
    following happen in the console:

    ...
    INFO: fsg_model.c(358): Added 2 alternate word transitions
    INFO: fsg_model.c(325): Adding alternate word transitions (DENTISTS(4),DENTISTS(3)) to FSG
    ...
    

    Secondly, recognition takes a great deal longer (a couple of seconds, compared
    to almost instantaneous for an lm model).

    And finally, the word returned includes the alternative pronunciation number
    e.g. "RESTAURANT(2)", in the recognised string, whereas the lm model did not
    do this.

    What I want is to be able to restrict a model to only recognise 500-1000
    phrases, and not be able to mix and match words from each phrase in the corpus
    (which lowers the accuracy for me).

    I am using PocketSphinx (wrapped by VocalKit), and my config file looks like
    this:

    -fwdflat no
    -bestpath no
    -nfft 512 
    -lowerf 1
    -upperf 4000
    -samprate 8000
    -nfilt 20
    -transform dct
    -round_filters no
    -remove_dc yes
    

    Any assistance on achieving this, or otherwise overcoming the performance
    issues I see with the JSGF model, would be very gratefully received,

    Thanks!

    • P
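
The single-rule grammar described in this post (one category, each utterance listed whole) can be generated from a phrase list rather than written by hand. The following is only a minimal sketch under stated assumptions: `build_jsgf`, the grammar name `categories`, and the rule name `category` are hypothetical, and phrases are upper-cased on the assumption that the dictionary uses upper-case entries, as the `DENTISTS` log lines suggest.

```python
def build_jsgf(phrases, grammar_name="categories", rule_name="category"):
    """Return JSGF text with one public rule listing each phrase as a whole alternative.

    All names here are illustrative; nothing in this function calls PocketSphinx.
    """
    unique = []
    for phrase in phrases:
        # Upper-case and collapse whitespace (assumed to match the dictionary's casing).
        normalised = " ".join(phrase.upper().split())
        # Skip empty entries and duplicates while keeping the original order.
        if normalised and normalised not in unique:
            unique.append(normalised)
    alternatives = "\n    | ".join(unique)
    return (
        "#JSGF V1.0;\n"
        f"grammar {grammar_name};\n"
        f"public <{rule_name}> =\n      {alternatives}\n    ;\n"
    )


if __name__ == "__main__":
    print(build_jsgf(["Cat Groomers", "Italian Restaurants", "Pet Shop"]))
```

Because every alternative is a complete phrase, a grammar built this way has no path that combines words from two different phrases.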
     
  • Nickolay V. Shmyrev

    ... and the control it provides is great compared to the lm model

    This is the case where control is dangerous :)

    My grammar is currently as simple as can be, a single category with around
    1000 utterances I'd like to be distinguished between.

    That doesn't sound like a proper application design. It should be reviewed.
    What exactly is the application you are trying to build, and why do you need
    to distinguish between 1000 variants?

    This means that at startup, I am getting a minute or two delay while I see
    the following happen in the console

    There was a performance issue that has been fixed in pocketsphinx trunk. If you
    are using pocketsphinx-0.6, it definitely makes sense to upgrade to the latest
    version.

    Secondly, recognition takes a great deal longer

    There are ways to speed up the recognizer described in the wiki; they trade
    away some accuracy, though.

    -fwdflat no -bestpath no

    If your vocabulary is rather big, that's not a good thing to do. Both fwdflat
    and bestpath improve accuracy.

    the word returned includes the alternative pronunciation number e.g.
    "RESTAURANT(2)"

    If the lm model indeed returns just the words, this is a bug. It needs to be
    fixed.
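
Until that is fixed in the decoder, the pronunciation markers can be removed from the returned string in application code. This is a workaround sketch, not the fix referred to above; `strip_alternate_markers` is a hypothetical helper, and it assumes the marker always has the form `(N)` at the end of a word, as in `RESTAURANT(2)`.

```python
import re

# Matches a trailing alternate-pronunciation marker such as "(2)" at the end of a word.
_ALT_SUFFIX = re.compile(r"\(\d+\)$")


def strip_alternate_markers(hypothesis):
    """Remove trailing "(N)" markers from each word of a hypothesis string."""
    return " ".join(_ALT_SUFFIX.sub("", word) for word in hypothesis.split())


if __name__ == "__main__":
    print(strip_alternate_markers("ITALIAN RESTAURANT(2)"))
```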

     
  • P Laurens

    P Laurens - 2010-08-26

    Thanks for your input, it's already very helpful.

    To explain my requirements further, I have a number of category titles that
    I'd like recognised - for example 'Restaurant', 'Bar', 'Pet Shop'. There are
    between 200 and 500 of these that I'd like recognised. My list grows
    further, however, because I have added plurals to the corpus (e.g.
    'Restaurants', 'Bars', 'Pet Shops'), as I'm not sure how the user will choose
    to say a category. This balloons the corpus and, subsequently, the
    grammar.

    Essentially the list is a relatively full list of things you might expect to
    find on a high-street.

    I don't actually have a requirement for a grammar at all, these category
    phrases should be self contained, and won't be joined in a longer grammar of
    other phrases.

    I started off with an lm model, which actually had very good accuracy, except
    for the fact that it would feel free to mix and match words from different
    phrases, so I might have two phrases 'Cat Groomers' and 'Italian Restaurants',
    and yet it would be possible to recognise 'Cat Restaurants' or 'Italian
    Groomers' which is something I want to prevent.

    In order to try and get this, it was suggested I try a JSGF grammar, which is
    where I'm at at the moment.

    As for your comments on performance, I'll definitely take action based on
    that, thanks for the advice.

    • P
     
  • Nickolay V. Shmyrev

    In order to try and get this, it was suggested I try a JSGF grammar, which
    is where I'm at at the moment.

    OK, so be it. It's also beneficial to analyse the n-best lists from
    the decoder to get more accurate recognition results.

    As for your comments on performance, I'll definitely take action based on
    that, thanks for the advice.

    Please do. The latest snapshot should be faster.
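
One way to use the n-best suggestion above is to accept the highest-ranked hypothesis that exactly matches a known phrase, which also rules out mixed results such as 'Cat Restaurants'. This is only a sketch: `pick_allowed` is a hypothetical helper, and it assumes the n-best hypotheses have already been read from the decoder into a list of strings ordered best first. No PocketSphinx call is shown.

```python
def pick_allowed(nbest, allowed_phrases):
    """Return the first n-best hypothesis that is a known phrase, or None.

    nbest: hypothesis strings, best first (assumed to come from the decoder).
    allowed_phrases: the complete phrases the application accepts.
    """
    # Compare case-insensitively and ignore differences in spacing.
    allowed = {" ".join(p.upper().split()) for p in allowed_phrases}
    for hypothesis in nbest:
        candidate = " ".join(hypothesis.upper().split())
        if candidate in allowed:
            return candidate
    return None


if __name__ == "__main__":
    phrases = ["Cat Groomers", "Italian Restaurants"]
    print(pick_allowed(["CAT RESTAURANTS", "CAT GROOMERS"], phrases))
```

Returning `None` when nothing matches lets the caller decide whether to ask the user to repeat the phrase.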

     
