in "pocketsphinx performance problem on CD model" thread (https://sourceforge
.net/projects/cmusphinx/forums/forum/5471/topic/4053314) Nickolay said that
enabling "fwdflat" decoding after "fwdtree" helps with both accuracy and
performance. I didn't find documentation on these two decoding modes except
for the description of the command line options in PS executables and man
pages. These descriptions say that "fwdtree" searches using lextree and this
is the first pass and "fwdflat" then searches through a word lattice
(apparently constructed by fwdtree) and this is the second pass.
So my question is: given the time needed for fwdflat is non-negative how it is
possible fwdtree + fwdflat to be faster than fwdtree-only search? What am I
missing?
By the way what 'fwd' prefix stands for - 'forward'?
The description of 'bestpath' option says that it is used for "Dijkstra search
over word lattice(3rd pass)". How this fits in the overall picture?
Thank you!
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
These descriptions say that "fwdtree" searches using lextree and this is the
first pass and "fwdflat" then searches through a word lattice (apparently
constructed by fwdtree) and this is the second pass.
Not exactly like that. Fwdtree is lextree search (where all the words are
compressed into the tree, lm probs applied on the word end). Fwdflat is the
search using vocabulary generated by fwdtree with flat structure (no
compression of words, lm prob applied on start). Then there is bestpath search
on the lattice constructed by fwdflat. There was an intention to replace
fwdflat with lattice-based search but it was never implemented. The details of
the search algorithms are described in various papers, for example Ph.D thesis
of Ravi "Efficient algorithms for speech recognition".
So my question is: given the time needed for fwdflat is non-negative how it
is possible fwdtree + fwdflat to be faster than fwdtree-only search? What am I
missing?
If we fix the speed say 1xRT and try to reach best accuracy then accuracy from
0.8RT Lextree + 0.2 Flat should be better than just 1.0 lextree.
By the way what 'fwd' prefix stands for - 'forward'?
Yes
The description of 'bestpath' option says that it is used for "Dijkstra
search over word lattice(3rd pass)". How this fits in the overall picture?
Lattice is constructed by fwdflat and bestpath searches it.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Not exactly like that. Fwdtree is lextree search (where all the words are
compressed into the tree, lm probs applied on the word end). Fwdflat is the
search using vocabulary generated by fwdtree with flat structure (no
compression of words, lm prob applied on start). Then there is bestpath search
on the lattice constructed by fwdflat. There was an intention to replace
fwdflat with lattice-based search but it was never implemented. The details of
the search algorithms are described in various papers, for example Ph.D thesis
of Ravi "Efficient algorithms for speech recognition".
Thank you very much for the reference!
I will definitely read at least the parts of the thesis that are of interest
for me. Can you please point out whether there are point where PS
significantly diverges from the design outlined in the paper?
In "Spoken Language Processing" the authors talk about something called
factored LM lextrees (I don't recall the exact terminology) where the most
optimistic of the children nodes' LM probabilities are attached to the subtree
roots. The intention was to weed out the most unpromising hypothesis as early
as possible, while not sacrificing accuracy. Is there something like this in
PS or LM probs are always attached to the leafs?
If we fix the speed say 1xRT and try to reach best accuracy then accuracy
from 0.8RT Lextree + 0.2 Flat should be better than just 1.0 lextree.
Got it. Figure 4.15 from the paper seems to confirm this too :)
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Can you please point out whether there are point where PS significantly
diverges from the design outlined in the paper?
Pocketsphinx has algorithmic optimizations to reduce memory footprint and
computation. For example, only CI phones are used for single-phone words. The
effort to port sphinx3 search to pocketsphinx was started (see lextree.c in
trunk for example), but it's not finished yet. It's rather simplistic actually
but unfortunately I'm not competent enough to talk about details :)
It's better to wait for David's PhD thesis to read everything :)
In "Spoken Language Processing" the authors talk about something called
factored LM lextrees (I don't recall the exact terminology) where the most
optimistic of the children nodes' LM probabilities are attached to the subtree
roots. The intention was to weed out the most unpromising hypothesis as early
as possible, while not sacrificing accuracy. Is there something like this in
PS or LM probs are always attached to the leafs?
PS doesn't use such thing as far as I understand. In sphinx3 and sphinx4 it's
used. In sphinx4 it's called "smear", it's actually quite important for
accuracy.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Hi,
in "pocketsphinx performance problem on CD model" thread (https://sourceforge
.net/projects/cmusphinx/forums/forum/5471/topic/4053314) Nickolay said that
enabling "fwdflat" decoding after "fwdtree" helps with both accuracy and
performance. I didn't find documentation on these two decoding modes except
for the description of the command line options in PS executables and man
pages. These descriptions say that "fwdtree" searches using lextree and this
is the first pass and "fwdflat" then searches through a word lattice
(apparently constructed by fwdtree) and this is the second pass.
So my question is: given the time needed for fwdflat is non-negative how it is
possible fwdtree + fwdflat to be faster than fwdtree-only search? What am I
missing?
By the way what 'fwd' prefix stands for - 'forward'?
The description of 'bestpath' option says that it is used for "Dijkstra search
over word lattice(3rd pass)". How this fits in the overall picture?
Thank you!
Not exactly like that. Fwdtree is lextree search (where all the words are
compressed into the tree, lm probs applied on the word end). Fwdflat is the
search using vocabulary generated by fwdtree with flat structure (no
compression of words, lm prob applied on start). Then there is bestpath search
on the lattice constructed by fwdflat. There was an intention to replace
fwdflat with lattice-based search but it was never implemented. The details of
the search algorithms are described in various papers, for example Ph.D thesis
of Ravi "Efficient algorithms for speech recognition".
If we fix the speed say 1xRT and try to reach best accuracy then accuracy from
0.8RT Lextree + 0.2 Flat should be better than just 1.0 lextree.
Yes
Lattice is constructed by fwdflat and bestpath searches it.
Thank you very much for the reference!
I will definitely read at least the parts of the thesis that are of interest
for me. Can you please point out whether there are point where PS
significantly diverges from the design outlined in the paper?
In "Spoken Language Processing" the authors talk about something called
factored LM lextrees (I don't recall the exact terminology) where the most
optimistic of the children nodes' LM probabilities are attached to the subtree
roots. The intention was to weed out the most unpromising hypothesis as early
as possible, while not sacrificing accuracy. Is there something like this in
PS or LM probs are always attached to the leafs?
Got it. Figure 4.15 from the paper seems to confirm this too :)
Pocketsphinx has algorithmic optimizations to reduce memory footprint and
computation. For example, only CI phones are used for single-phone words. The
effort to port sphinx3 search to pocketsphinx was started (see lextree.c in
trunk for example), but it's not finished yet. It's rather simplistic actually
but unfortunately I'm not competent enough to talk about details :)
It's better to wait for David's PhD thesis to read everything :)
PS doesn't use such thing as far as I understand. In sphinx3 and sphinx4 it's
used. In sphinx4 it's called "smear", it's actually quite important for
accuracy.