
PS multipass decoding

2011-01-21
2012-09-22
  • Vassil Panayotov

    Hi,

    In the "pocketsphinx performance problem on CD model" thread
    (https://sourceforge.net/projects/cmusphinx/forums/forum/5471/topic/4053314)
    Nickolay said that enabling "fwdflat" decoding after "fwdtree" helps with
    both accuracy and performance. I didn't find any documentation on these two
    decoding modes except for the descriptions of the command-line options in
    the PS executables and man pages. These descriptions say that "fwdtree"
    searches using a lextree and this is the first pass, and that "fwdflat"
    then searches through a word lattice (apparently constructed by fwdtree)
    and this is the second pass.
    So my question is: given that the time needed for fwdflat is non-negative,
    how is it possible for fwdtree + fwdflat to be faster than a fwdtree-only
    search? What am I missing?
    By the way, what does the 'fwd' prefix stand for - 'forward'?

    The description of the 'bestpath' option says that it is used for "Dijkstra
    search over word lattice (3rd pass)". How does this fit into the overall
    picture?

    Thank you!

  • Nickolay V. Shmyrev

    These descriptions say that "fwdtree" searches using a lextree and this is
    the first pass, and that "fwdflat" then searches through a word lattice
    (apparently constructed by fwdtree) and this is the second pass.

    Not exactly. Fwdtree is the lextree search (all the words are compressed
    into a tree, and LM probabilities are applied at word ends). Fwdflat is a
    search over the vocabulary generated by fwdtree, using a flat structure (no
    compression of words, LM probabilities applied at word starts). Then there
    is the bestpath search on the lattice constructed by fwdflat. There was an
    intention to replace fwdflat with a lattice-based search, but it was never
    implemented. The details of the search algorithms are described in various
    papers, for example in the Ph.D. thesis of Mosur Ravishankar, "Efficient
    Algorithms for Speech Recognition".
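    In case it helps, here is a minimal sketch of how the three passes map onto
    decoder configuration flags. The model paths are placeholders, and exact
    API details can differ between pocketsphinx versions:

        #include <pocketsphinx.h>

        int main(void)
        {
            /* Each pass has its own yes/no flag; all three are on by default. */
            cmd_ln_t *config = cmd_ln_init(NULL, ps_args(), TRUE,
                "-hmm",  "/path/to/acoustic/model",     /* placeholder paths */
                "-lm",   "/path/to/language_model.lm",
                "-dict", "/path/to/dictionary.dic",
                "-fwdtree",  "yes",   /* 1st pass: lexicon-tree Viterbi search */
                "-fwdflat",  "yes",   /* 2nd pass: flat search over fwdtree's words */
                "-bestpath", "yes",   /* 3rd pass: best path over the word lattice */
                NULL);
            ps_decoder_t *ps = ps_init(config);
            if (ps == NULL)
                return 1;
            /* ... feed audio via ps_start_utt()/ps_process_raw()/ps_end_utt() ... */
            ps_free(ps);
            return 0;
        }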

    So my question is: given that the time needed for fwdflat is non-negative,
    how is it possible for fwdtree + fwdflat to be faster than a fwdtree-only
    search? What am I missing?

    If we fix the speed at, say, 1xRT and try to reach the best accuracy, then
    the accuracy of 0.8xRT lextree + 0.2xRT flat search should be better than
    that of 1.0xRT lextree alone.
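    In practice that split is controlled mainly by the pruning beams. A rough
    sketch, extending the config above; the beam values here are only
    illustrative, not tuned:

        /* Tighter fwdtree beams buy back the time spent in the fwdflat pass. */
        cmd_ln_t *config = cmd_ln_init(NULL, ps_args(), TRUE,
            "-hmm",  "/path/to/acoustic/model",   /* placeholders, as before */
            "-lm",   "/path/to/language_model.lm",
            "-dict", "/path/to/dictionary.dic",
            "-beam",         "1e-40",   /* fwdtree HMM beam (narrower than default) */
            "-wbeam",        "1e-30",   /* fwdtree word-exit beam */
            "-fwdflatbeam",  "1e-64",   /* fwdflat HMM beam (left wide) */
            "-fwdflatwbeam", "7e-29",   /* fwdflat word-exit beam */
            NULL);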

    By the way, what does the 'fwd' prefix stand for - 'forward'?

    Yes

    The description of the 'bestpath' option says that it is used for "Dijkstra
    search over word lattice (3rd pass)". How does this fit into the overall
    picture?

    The lattice is constructed by fwdflat, and bestpath searches it.
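    With -bestpath enabled the decoder runs this pass automatically and
    ps_get_hyp() returns its result, but the lattice can also be fetched and
    searched explicitly. A sketch, assuming the pocketsphinx lattice API
    (ps_get_lattice / ps_lattice_bestpath / ps_lattice_hyp) and purely
    illustrative weight values:

        #include <pocketsphinx.h>

        /* Run the third (bestpath) pass explicitly over the word lattice
         * left behind by fwdflat and return its hypothesis string. */
        const char *bestpath_hyp(ps_decoder_t *ps, ngram_model_t *lm)
        {
            ps_lattice_t *dag = ps_get_lattice(ps);
            if (dag == NULL)
                return NULL;                       /* no lattice for this utterance */
            /* Dijkstra-style search: args are the LM set, a language weight
             * factor and an acoustic scale (values here are illustrative). */
            ps_latlink_t *link = ps_lattice_bestpath(dag, lm, 1.0, 1.0 / 15);
            return link ? ps_lattice_hyp(dag, link) : NULL;
        }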

  • Vassil Panayotov

    Not exactly. Fwdtree is the lextree search (all the words are compressed
    into a tree, and LM probabilities are applied at word ends). Fwdflat is a
    search over the vocabulary generated by fwdtree, using a flat structure (no
    compression of words, LM probabilities applied at word starts). Then there
    is the bestpath search on the lattice constructed by fwdflat. There was an
    intention to replace fwdflat with a lattice-based search, but it was never
    implemented. The details of the search algorithms are described in various
    papers, for example in the Ph.D. thesis of Mosur Ravishankar, "Efficient
    Algorithms for Speech Recognition".

    Thank you very much for the reference!
    I will definitely read at least the parts of the thesis that are of
    interest to me. Can you please point out whether there are points where PS
    significantly diverges from the design outlined in the thesis?
    In "Spoken Language Processing" the authors talk about something called
    factored LM lextrees (I don't recall the exact terminology), where the most
    optimistic of the child nodes' LM probabilities are attached to the subtree
    roots. The intention is to weed out the most unpromising hypotheses as
    early as possible, without sacrificing accuracy. Is there something like
    this in PS, or are LM probs always attached to the leaves?

    If we fix the speed at, say, 1xRT and try to reach the best accuracy, then
    the accuracy of 0.8xRT lextree + 0.2xRT flat search should be better than
    that of 1.0xRT lextree alone.

    Got it. Figure 4.15 from the paper seems to confirm this too :)

  • Nickolay V. Shmyrev

    Can you please point out whether there are points where PS significantly
    diverges from the design outlined in the thesis?

    Pocketsphinx has algorithmic optimizations to reduce the memory footprint
    and computation. For example, only CI phones are used for single-phone
    words. An effort to port the sphinx3 search to pocketsphinx was started
    (see lextree.c in trunk, for example), but it's not finished yet. It's
    rather simplistic actually, but unfortunately I'm not competent enough to
    talk about the details :)

    It's better to wait for David's PhD thesis to read everything :)

    In "Spoken Language Processing" the authors talk about something called
    factored LM lextrees (I don't recall the exact terminology), where the most
    optimistic of the child nodes' LM probabilities are attached to the subtree
    roots. The intention is to weed out the most unpromising hypotheses as
    early as possible, without sacrificing accuracy. Is there something like
    this in PS, or are LM probs always attached to the leaves?

    PS doesn't use such a thing, as far as I understand. It is used in sphinx3
    and sphinx4. In sphinx4 it's called "smear", and it's actually quite
    important for accuracy.
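    The idea itself is simple, though. A purely illustrative sketch (not
    sphinx3/sphinx4 code): every lextree node is assigned the most optimistic
    (maximum) LM log-probability over the words reachable in its subtree, so
    pruning can apply an LM estimate before the word identity is known:

        /* Hypothetical lextree node; leaves correspond to word ends. */
        typedef struct node_s {
            struct node_s **kids;
            int n_kids;
            float lm_logprob;   /* leaf: LM log-prob of the word; inner: filled below */
        } node_t;

        /* Propagate the best LM score in each subtree up to its root ("smearing"). */
        static float smear(node_t *n)
        {
            if (n->n_kids == 0)
                return n->lm_logprob;
            float best = smear(n->kids[0]);
            for (int i = 1; i < n->n_kids; i++) {
                float s = smear(n->kids[i]);
                if (s > best)
                    best = s;
            }
            n->lm_logprob = best;   /* optimistic bound used when scoring this node */
            return best;
        }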

