Hello PocketSphinx community,
As part of our internship at a Dutch company, we will be trying to optimize
PocketSphinx for the Blackfin architecture. We made a few flow charts to get a
basic understanding of how PocketSphinx works internally. Any comments or
advice on these diagrams would be extremely useful and much appreciated. Our
company (TASS software professionals) has kindly agreed to release our
documentation into the public domain.
Attached are flow charts of the ps_process_raw and ps_get_hyp functions and
their subfunctions.
http://img28.imageshack.us/img28/6595/flowchart1131pocketsphi.png
http://img295.imageshack.us/img295/4413/flowchart11311pocketsph.png
http://img21.imageshack.us/img21/7293/flowchart11312pocketsph.png
http://img21.imageshack.us/img21/9794/flowchart1141pocketsphi.png
http://img404.imageshack.us/img404/7835/flowchart11411pocketsph.png
Hello
Nice diagrams. Are they automatically generated from the source or from a
runtime execution trace? It would indeed be nice to put something like this
into the documentation, probably in a more compressed form without minor
branches such as error branches.
Thanks for your suggestion. Removing error branches might be a good idea, as
it simplifies the diagrams a great deal. The diagrams were drawn in Dia (a
charting application which I wouldn't recommend because it's quite buggy).
Would you (or anyone else familiar with PocketSphinx) care to look through the
comments we placed on the right in the diagrams linked to in my previous post?
I'm especially interested in conceptual errors I might have made.
Well, these are not conceptual errors, but it would be nice to keep a few
things in mind:
Don't mix important things (core functions) with unimportant ones (counter names; does it really matter which counter you increment?).
Follow the target domain. ASR in PocketSphinx actually follows this scheme:
Audio Input -> Endpointer Calibration
Audio Input -> Endpointing -> Frame Sequencing -> DFT -> Mel Filters -> Cepstra
-> Normalization -> Feature Calculation -> Feature Set
Feature Set -> Tree Search -> Flat Search -> Bestpath Search
Tree Search = Init -> Find Best Tokens -> Grow -> Prune
Flat Search = Same, but using the tree result from the first pass
Bestpath Search = Lattice Viterbi Forward + Backward
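To make the front-end chain above more concrete (Frame Sequencing -> DFT -> Mel Filters -> Cepstra -> Normalization -> Feature Calculation), here is a rough sketch in plain Python. It is not PocketSphinx code. The function names, frame sizes, filter counts and the simple delta features are all assumptions chosen for illustration, and endpointing is left out entirely.

```python
import cmath
import math

def frames(samples, size, step):
    """Frame sequencing: split the signal into overlapping frames."""
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, step)]

def power_spectrum(frame):
    """Hamming window followed by a naive DFT; keeps bins 0..N/2."""
    n = len(frame)
    win = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
           for i, x in enumerate(frame)]
    return [abs(sum(win[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) ** 2
            for k in range(n // 2 + 1)]

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)  # a common mel formula

def mel_to_hz(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_bins, rate):
    """Triangular filters spaced evenly on the mel scale up to rate / 2."""
    top = hz_to_mel(rate / 2)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = (rate / 2) / (n_bins - 1)
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for b in range(n_bins):
            f = b * bin_hz
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def cepstra(log_mel, n_ceps):
    """DCT-II of the log mel energies gives the cepstral coefficients."""
    m = len(log_mel)
    return [sum(log_mel[j] * math.cos(math.pi * i * (j + 0.5) / m)
                for j in range(m))
            for i in range(n_ceps)]

def mean_normalize(ceps):
    """Normalization: subtract the per-coefficient mean over all frames."""
    if not ceps:
        return []
    means = [sum(c[i] for c in ceps) / len(ceps) for i in range(len(ceps[0]))]
    return [[v - m for v, m in zip(c, means)] for c in ceps]

def add_deltas(ceps):
    """Feature calculation (simplified): append next-minus-previous deltas."""
    out = []
    for t, c in enumerate(ceps):
        prev = ceps[max(t - 1, 0)]
        nxt = ceps[min(t + 1, len(ceps) - 1)]
        out.append(c + [b - a for a, b in zip(prev, nxt)])
    return out

def extract_features(samples, rate, size=64, step=32, n_filters=8, n_ceps=5):
    """Run the whole chain and return one feature vector per frame."""
    bank = mel_filterbank(n_filters, size // 2 + 1, rate)
    ceps = []
    for frame in frames(samples, size, step):
        spec = power_spectrum(frame)
        # Floor the filter energies so log() never sees zero.
        log_mel = [math.log(max(sum(w * p for w, p in zip(row, spec)), 1e-10))
                   for row in bank]
        ceps.append(cepstra(log_mel, n_ceps))
    return add_deltas(mean_normalize(ceps))
```

Calling `extract_features(samples, 8000)` on a list of float samples returns one list per frame holding `n_ceps` normalized cepstra followed by `n_ceps` deltas. The naive DFT is only there to keep the sketch dependency-free; a real front end would use an FFT.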
The diagrams should show these concepts, not counters and strange-looking loops.
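As an illustration of the Tree Search steps named above (Init -> Find Best Tokens -> Grow -> Prune), here is a toy token-passing beam search in Python. It is only a sketch under stated assumptions: the flat state graph, the `beam_search` name and its arguments are hypothetical and do not mirror PocketSphinx's lexicon tree or its API, and within a frame it grows tokens before finding the best score, which is a simplification of the order listed above.

```python
import math

def beam_search(frame_scores, transitions, start_state, beam):
    """Toy token-passing search (hypothetical helper, not PocketSphinx API).

    frame_scores: one dict per frame, mapping state -> acoustic log-likelihood.
    transitions:  dict mapping state -> list of (next_state, log_probability).
    Returns (score, path) of the best surviving token, or None if none survive.
    """
    # Init: a single token sitting in the start state with score 0.
    tokens = {start_state: (0.0, [start_state])}

    for frame in frame_scores:
        # Grow: extend every live token along its outgoing transitions,
        # keeping only the best-scoring token per destination state.
        grown = {}
        for state, (score, path) in tokens.items():
            for nxt, logp in transitions.get(state, []):
                if nxt not in frame:
                    continue
                cand = score + logp + frame[nxt]
                if nxt not in grown or cand > grown[nxt][0]:
                    grown[nxt] = (cand, path + [nxt])
        if not grown:
            return None

        # Find best tokens: the best score in this frame sets the threshold.
        best = max(score for score, _ in grown.values())

        # Prune: drop every token that falls outside the beam.
        tokens = {state: tok for state, tok in grown.items()
                  if tok[0] >= best - beam}

    return max(tokens.values(), key=lambda tok: tok[0])


# Tiny made-up example: start state "S", then phone-like states "A" and "B".
half = math.log(0.5)
transitions = {"S": [("A", 0.0)],
               "A": [("A", half), ("B", half)],
               "B": [("B", 0.0)]}
frame_scores = [{"A": -1.0, "B": -5.0},
                {"A": -4.0, "B": -1.0},
                {"A": -5.0, "B": -1.0}]
result = beam_search(frame_scores, transitions, "S", beam=10.0)
```

A narrower `beam` discards more tokens per frame, which is faster but risks pruning the path that would have won later. The Flat Search and Bestpath Search passes are not covered by this sketch.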
Thanks for your feedback, Nickolay. You are right in saying that these diagrams
capture the design of PocketSphinx at a lower level than is desirable.
The scheme you posted (augmented by explanation) would be very valuable for
people who are trying to get a grip on how ASR works in the case of
PocketSphinx. While I've got a basic understanding of what most of these steps
are, it would take some time for me to get familiar enough with PocketSphinx
to be able to draw them in a high level diagram. I would love to do this
(speech recognition is awesome!) but it would be too far out of the scope of
my project and internship.
Thanks again for your input. We will post an update on this forum when we
get some results from our project.