CMU Sphinx / Forums / Help: Pocketsphinx general questions

Tikiboy - 2012-01-31

Hey guys, I recently embarked on a journey to get pocketsphinx up and running,
and I finally managed to get an example working on Android:
http://cmusphinx.sourceforge.net/2011/05/building-pocketsphinx-on-android/

However, after inspecting the whole process, I am confused about a few things
and have some questions.

My first question: in terms of the overall framework and how Sphinx works, is
pocketsphinx the same as Sphinx4? I ask because I have tried to find
documentation on the general concept/big picture of how pocketsphinx works,
and all I found was API documentation. Sphinx4, on the other hand, has
documentation on the whole CMUSphinx concept. There is a really good diagram
in the Sphinx4 documentation, as well as a PowerPoint presentation
(http://cmusphinx.sourceforge.net/doc/speech.ppt) describing these concepts.
Do these concepts (especially the diagram of the
configuration-decoder-linguist relationship) apply to pocketsphinx as well?

My second question comes from the Android example above. After compiling, I
realized that the method calls are all in Java, whereas pocketsphinx and
sphinxbase are in C/C++. I understand that SWIG was used to link the two
and that a Java wrapper class was generated. However, I am unclear how a
developer knows which SWIG-generated Java wrapper method corresponds to which
C/C++ function in pocketsphinx. I tried to compare them, and it baffles me
that the Java wrapper has a boolean argument that is mapped to an int
argument in C/C++.

Lastly, is there any other documentation for pocketsphinx besides
http://cmusphinx.sourceforge.net/api/pocketsphinx/ ? I ask because, given only
an API reference, I have no idea how to use pocketsphinx properly. There is no
guideline on the basic initialization steps to take, for example that I should
initialize configurationmanager before calling a recognizer and then get the
result. So far, my only option is to learn from the example I followed and
copy every line of code it used for my own audio recognition.

Sorry for the long post. I hope it is not confusing.

Thanks a lot!

Nickolay V. Shmyrev - 2012-01-31

My first question: in terms of the overall framework and how Sphinx works, is
pocketsphinx the same as Sphinx4? I ask because I have tried to find
documentation on the general concept/big picture of how pocketsphinx works,
and all I found was API documentation. Sphinx4, on the other hand, has
documentation on the whole CMUSphinx concept. There is a really good diagram
in the Sphinx4 documentation, as well as a PowerPoint presentation
(http://cmusphinx.sourceforge.net/doc/speech.ppt) describing these concepts.
Do these concepts (especially the diagram of the
configuration-decoder-linguist relationship) apply to pocketsphinx as well?

Pocketsphinx works the same way from a high-level point of view. However, the
implementation details might differ. For example, there is no separation
between SearchManager and Linguist in pocketsphinx; instead, a single search
object performs the search. The acoustic model is the same, and there are
multiple scorers.

However, I am unclear how a developer knows which SWIG-generated Java wrapper
method corresponds to which C/C++ function in pocketsphinx. I tried to compare
them, and it baffles me that the Java wrapper has a boolean argument that is
mapped to an int argument in C/C++.

Yes, sometimes the mapping is not straightforward. Ideally there would be a
separate document on the Java API generated with SWIG. Unfortunately we
currently lack this part of the documentation, so you need to guess which
function is which. Usually it's easy to guess. For example, a boolean is
naturally mapped to int in most C programs: C traditionally had no separate
boolean type (one was only added in C99), so everyone does that. You can find
more details in the SWIG documentation.
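
To make the boolean/int point concrete, here is a minimal sketch in plain
Java. It is only an illustration of the convention: the names
BoolMappingSketch, cStyleSetFlag and setFlag are hypothetical, there is no
real native call involved, and none of this is the actual pocketsphinx or
SWIG-generated API.

```java
// Illustrative only: every name here is hypothetical and is NOT part of the
// real pocketsphinx API or of any SWIG-generated wrapper.
class BoolMappingSketch {

    // Stand-in for a C function declared as: int set_flag(int enabled);
    // C code conventionally treats 0 as false and any non-zero value as true.
    static int cStyleSetFlag(int enabled) {
        return enabled != 0 ? 1 : 0;
    }

    // What a Java-facing wrapper might look like: it exposes a boolean and
    // converts to and from the int that the C side works with.
    static boolean setFlag(boolean enabled) {
        int result = cStyleSetFlag(enabled ? 1 : 0);
        return result != 0;
    }

    public static void main(String[] args) {
        System.out.println("setFlag(true)  -> " + setFlag(true));
        System.out.println("setFlag(false) -> " + setFlag(false));
    }
}
```

So when a Java method takes a boolean where the C header shows an int, the
int is most likely being used as a flag, and the two signatures probably refer
to the same function.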

Lastly, is there any other documentation for pocketsphinx besides
http://cmusphinx.sourceforge.net/api/pocketsphinx/ ? I ask because, given only
an API reference, I have no idea how to use pocketsphinx properly. There is no
guideline on the basic initialization steps to take, for example that I should
initialize configurationmanager before calling a recognizer and then get the
result. So far, my only option is to learn from the example I followed and
copy every line of code it used for my own audio recognition.

There is a tutorial which you must read before you start development:

http://cmusphinx.sourceforge.net/wiki/tutorial

It has a part on pocketsphinx:

http://cmusphinx.sourceforge.net/wiki/tutorialpocketsphinx

Tikiboy - 2012-02-01

Thanks once again for the clarification. It tied up the loose ends that were
instilling doubt in the learning process (never a good thing in my case).
