
Verifying that -fdict is working

  • Halle

    Halle - 2011-06-15

    Hi Nickolay,

    I'm wondering whether the -fdict behavior in OpenEars is working correctly,
    and I wanted to see if you could clarify what the correct behavior of a
    working filler dictionary looks like.

    I am trying to use the noisedict from hub4wsj_sc_8k as my filler dictionary. I
    am running Pocketsphinx with the argument -fdict and pointing to the location
    of hub4wsj_sc_8k/noisedict. I see that it is received as a command line
    argument and it appears in the Current Configuration log as:

    -fdict /correctpathto/noisedict

    without prompting any errors.
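
    For reference, here is roughly how the decoder ends up configured (a
    minimal sketch using the pocketsphinx C API; the paths and model file
    names below are placeholders, not my real ones):

    #include <pocketsphinx.h>

    /* Sketch only: all paths are placeholders. */
    cmd_ln_t *config = cmd_ln_init(NULL, ps_args(), TRUE,
        "-hmm",   "/path/to/hub4wsj_sc_8k",
        "-lm",    "/path/to/languagemodel.lm",
        "-dict",  "/path/to/languagemodel.dic",
        "-fdict", "/path/to/hub4wsj_sc_8k/noisedict",
        NULL);
    ps_decoder_t *ps = ps_init(config);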

    If I run Pocketsphinx with the normal level of verbosity, I never see anything
    from the filler dictionary being recognized in the logs. This might be the
    correct behavior (returning (null) for a filler noise); I'm not sure. I'm also
    getting a lot of reports of noises that are in the noisedict being recognized
    as words that are in the language model or grammar, so I've been wondering if
    this is the correct behavior (not ever seeing anything from the filler
    dictionary being recognized in the Pocketsphinx logging) or if something isn't
    working correctly with the filler dictionary as I've configured it. Is there
    any way I can verify for sure (besides coughing at it, which sometimes results
    in a null or sometimes results in a word being recognized in a large language
    model) that the filler dictionary is being used sometimes? Should I be
    expecting to see ++COUGH++ in the logging or in the hypotheses sometimes?

    As a follow-up question, are there more elaborate filler dictionaries that can
    be used with hub4wsj_sc_8k? Perhaps the entries in the filler dictionary are
    being heard and disregarded (returning null), but a larger filler dictionary
    would do a better job of absorbing more varied background noises. Are there
    instructions or examples for what kinds of noises can be added to a larger
    hub4wsj_sc_8k filler dictionary, or is it already optimal?

    Thanks,

    Halle

     
  • Nickolay V. Shmyrev

    Is there any way I can verify for sure (besides coughing at it, which
    sometimes results in a null or sometimes results in a word being recognized in
    a large language model) that the filler dictionary is being used sometimes?

    Run with "-fwdflat no -bestpath no -backtrace yes -fillprob 1.0" and see
    something like

    INFO:   ++BREATH++           254   257   1.000 -441       0          2  
    INFO:   ++SMACK++            258   275   1.000 -1742      0          2
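
    If you are driving the decoder through the C API instead of the command
    line, something like this should apply the same settings (a sketch; the
    cmd_ln_set_* and ps_reinit signatures are per the sphinxbase and
    pocketsphinx headers of this era, so check them against your version):

    /* Assumes config and ps from an earlier cmd_ln_init()/ps_init(). */
    cmd_ln_set_boolean_r(config, "-fwdflat", FALSE);
    cmd_ln_set_boolean_r(config, "-bestpath", FALSE);
    cmd_ln_set_boolean_r(config, "-backtrace", TRUE);
    cmd_ln_set_float_r(config, "-fillprob", 1.0);
    ps_reinit(ps, config);  /* pick up the new settings */

    Note that -fillprob 1.0 is only to make fillers easy to trigger for this
    test; it is not a value to keep for normal decoding.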
    

    so I've been wondering if this is the correct behavior (not ever seeing
    anything from the filler dictionary being recognized in the Pocketsphinx
    logging)

    This is expected behavior, though not correct behavior. It's expected because
    you aren't optimizing rejection of out-of-grammar input.

    As for fillers, they shouldn't be visible in the decoder output, since they
    are an internal matter for the decoder. An API user shouldn't have to care
    about fillers at all. The decoder should return NULL if only fillers are
    present in the utterance.
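
    From the API side that looks roughly like this (a sketch, assuming the
    ps_decoder_t *ps of an initialized decoder; the three-argument ps_get_hyp
    is per the headers of this era):

    char const *hyp, *uttid;
    int32 score;

    /* Fillers never surface in the hypothesis string, so a
       filler-only utterance comes back as NULL. */
    hyp = ps_get_hyp(ps, &score, &uttid);
    if (hyp == NULL) {
        /* Only silence/fillers were decoded; treat as no speech. */
    }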

    there more elaborate filler dictionaries that can be used with
    hub4wsj_sc_8k?

    The filler dictionary just describes the filler phones that exist in the
    acoustic model. You can't change the dictionary unless you change the
    acoustic model.
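
    To illustrate, a filler dictionary is just a plain-text mapping from
    filler "words" to filler phones that already exist in the acoustic model,
    roughly like this (the exact entries vary by model):

    <sil> SIL
    ++BREATH++ +BREATH+
    ++COUGH++ +COUGH+
    ++SMACK++ +SMACK+

    Adding a line here does nothing useful unless the model was actually
    trained with that filler phone.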

     
  • Halle

    Halle - 2011-06-16

    Hiya,

    Thanks for the verification method -- that showed that the filler dictionary
    is working as expected. A couple more questions.

    This is expected behavior, though not correct behavior. It's expected
    because you aren't optimizing rejection of out-of-grammar input.

    This is true. However, I don't know anything about what language models
    OpenEars users are running, so it isn't clear to me how I would offer more
    optimal OOV rejection as a library feature. Since the last version, they can
    programmatically create ARPA models on the fly during their app session and
    switch between models in the middle of the continuous loop. So it's difficult
    to even say in the docs "if you're using this kind of model, flip this switch
    for better OOV rejection; if you're using that kind of model, flip this one",
    since the odds are pretty good that they are going to have their app start
    with a more generalized model and then create contextually useful smaller
    ones and swap them in and out.

    I've looked into improving OOV rejection and I saw this FAQ entry:

    Q: Can pocketsphinx reject out-of-grammar words?

    There are a few ways to deal with OOV rejection; for more details see
    Rejecting Out-of-Grammar Utterances. The state of implementation of those
    approaches is:

    • Garbage models - requires you to train a special model. There is no
      public model with garbage phones which can reject OOV words now. There
      are models with fillers, but they reject only specific sounds (breath,
      laugh, um); they can't reject OOV words.
    • Generic word model - same as above, requires you to train a special
      model. There are no public models yet.
    • Confidence scores - the confidence score (ps_get_prob) can be reliably
      calculated only for a large vocabulary (> 100 words); it doesn't work
      with a small grammar. There are approaches with phone-based confidence,
      and one of them was implemented in Sphinx-2, but pocketsphinx doesn't
      support them. Confidence scoring also requires three-pass recognition
      (enable both fwdflat and bestpath).

    So for now the recommendation for rejection with a small grammar is to
    train your own model (and make it public). For a large language model
    (> 100 words), use the confidence score.

    So, regarding the options listed:

    1. I don't have a garbage model to offer, but there is the filler dictionary included with the HMM, which is apparently working as expected.
    2. This looks too specific to any particular language model to be applicable to OpenEars.
    3. I am returning confidence scores in the OpenEars hypothesis-received callback, and I've started to give the developers who use OpenEars some advice on the use of the scores (see the sketch after this list).
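
    Concretely, what I'm exposing is roughly this (a sketch, assuming an
    initialized ps_decoder_t *ps; the two-argument ps_get_prob matches the
    headers I'm building against, so check your version):

    char const *uttid;
    int32 prob;
    float64 confidence;

    /* Utterance-level posterior, converted out of the log domain.
       Per the FAQ above, this is only meaningful for larger
       vocabularies, with fwdflat and bestpath both enabled. */
    prob = ps_get_prob(ps, &uttid);
    confidence = logmath_exp(ps_get_logmath(ps), prob);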

    Is there another opportunity to improve OOV rejection that is generalized
    enough for a framework that I'm missing? Can you make any suggestions here?

    My last question is about your argument -fillprob 1.0. I see in the help that
    this is "Filler word transition probability, defaults to 1e-8". Looking at my
    Pocketsphinx logging, I see that I'm not overriding this tiny default anywhere.
    For the users who are reporting that noises which are present in the filler
    dictionary are being recognized as words present in their language model, will
    it be beneficial for them to increase -fillprob, and if so can you recommend a
    number for them to start with (with the understanding that they will probably
    have to tweak the particular value in accordance with their needs and test
    results)?

    Thanks again,

    Halle

     
  • Nickolay V. Shmyrev

    Is there another opportunity to improve OOV rejection that is generalized
    enough for a framework that I'm missing? Can you make any suggestions here?

    I wrote this answer on a wiki page so I don't think I have anything to add to
    it ;) For your type of application you need to implement a specific algorithm
    to calculate a proper confidence score. It's serious work.

    For the users who are reporting that noises which are present in the filler
    dictionary are being recognized as words present in their language model, will
    it be beneficial for them to increase -fillprob, and if so can you recommend a
    number for them to start with (with the understanding that they will probably
    have to tweak the particular value in accordance with their needs and test
    results)?

    I don't think users will be able to tweak this probability properly. Without
    a speech database it's hard to find the best value for any of the decoder
    parameters.

     
  • Halle

    Halle - 2011-06-16

    I wrote this answer on a wiki page so I don't think I have anything to add
    to it ;)

    This page:

    http://cmusphinx.sourceforge.net/wiki/sphinx4:rejectionhandling

    ?

    When I read that I assumed it was just about Sphinx 4 since it is in the
    Sphinx 4 section of the wiki. Are we talking about this part:

    "Use confidence scores which are calculated post-recognition. This usually
    makes use of the word lattice from the decoding. For each word in the best
    hypothesis, we form a set of feature vectors by concatenating one or more
    basic features related to word confidence. Examples of such features include
    (but are not limited to): average acoustic score, average language score, word
    length in frames, word length in phones, the number of occurrence of the same
    word at the same location of the 10-best results, etc.. Such a feature vector
    is then scored against a trained vector to determine whether the word is out-
    of-vocabulary."

    Would such an algorithm actually be language-model independent (i.e. is that
    something that can be developed as a framework feature without any idea of
    what kind of language model the user is going to generate or add)?
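
    For reference, pulling the raw per-word material for such features out of
    pocketsphinx would look roughly like this (a sketch, assuming an
    initialized ps_decoder_t *ps and <stdio.h>; ps_seg_* signatures are per
    the headers of this era, and the trained scoring vector is exactly the
    part that doesn't exist yet):

    /* Walk the best-hypothesis segmentation and collect per-word
       acoustic score, language score, and duration in frames. */
    int32 best_score, ascr, lscr, lback;
    int sf, ef;
    ps_seg_t *seg;

    for (seg = ps_seg_iter(ps, &best_score); seg; seg = ps_seg_next(seg)) {
        ps_seg_frames(seg, &sf, &ef);
        ps_seg_prob(seg, &ascr, &lscr, &lback);
        printf("%s: frames=%d ascr=%d lscr=%d\n",
               ps_seg_word(seg), ef - sf + 1, ascr, lscr);
    }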

     
  • Nickolay V. Shmyrev

    Would such an algorithm actually be language-model independent (i.e. is that
    something that can be developed as a framework feature without any idea of
    what kind of language model the user is going to generate or add)?

    Yes.

     
