Hello, I've successfully run pocketsphinx_tidigits
under Nanodesktop.
Now I need to create my own models to recognize other
words. I have some questions:
a) Can SphinxTrain also be used for PocketSphinx?
b) Is there an official how-to that explains, step by step,
how to use SphinxTrain and the other utilities to
create my models?
c) Where is the SphinxTrain source code?
Thanks in advance.
(If you think it would be useful, I have some advice that
could make porting your software to other platforms easier;
for example, in our Nanodesktop port we had quite a few
problems...)
Yes, please do post about the issues you had in porting...
Here is the video that shows your software
working under nd.
http://www.youtube.com/watch?v=Y0cqbzB6CV8
Some notes:
a) I have seen that you use double for
computations. Some platforms use a 32-bit
double (the same as float) for
performance reasons. So, if the source code
is recompiled for such a platform,
variables such as -wbeam and similar are truncated
by the compiler to 0. This error
causes a secondary error in the value returned
by logmath and, finally, the failure of the
decoder.
When we saw in a previous video
that 0 words were recognized, the main
problem was that the wbeam parameter was wrong.
Under nd, I've modified your code, redirecting
all calls to mathematical functions, as well as to
printf and scanf, to dedicated routines
compiled to work at 64-bit depth and kept
separate from the normal mathematical routines
included in the NanoM library.
b) Some scanf format sequences aren't
recognized by some versions of libc scanf.
For example, the Avr-Libc scanf doesn't handle
your scanf calls. It is advisable to include
a dedicated version of scanf in your source
code (for example, I've added a copy of the Minix
scanf called psphinx_scanf, and I've modified
your code to use it instead of the normal
NanoC scanf). (In any case, perhaps this problem
can be considered a bug of the Avr-Libc scanf
and not a bug of Sphinx: I've also replaced this
routine in the NanoC library, so nd will
have a new implementation of scanf in its
next release.)
c) PocketSphinx_digits expects microphone input
sampled at 8000 Hz. Some platforms don't have
a microphone driver that can acquire at this
rate. I've created a software layer that
downsamples the audio from
44100 Hz (the actual acquisition rate of the
Nanodesktop ndHAL_Mic API) to 8000 Hz
before passing the data to the decoder. It would be
advisable for the decoder to include this
downsampling layer internally and to
run it automatically.
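To illustrate note a), here is a minimal sketch in Python (not the decoder's C code) of what can happen when a very small threshold is stored in a 32-bit float. The helper names to_float32 and safe_log are hypothetical, and 1e-48 is only an illustrative value, chosen because it lies below the smallest value a 32-bit float can hold; it is not taken from the Nanodesktop port.

```python
import math
import struct


def to_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def safe_log(x: float) -> float:
    """Natural log that returns -inf for zero instead of raising."""
    return math.log(x) if x > 0.0 else float("-inf")


# Illustrative threshold only (assumption, not a value from the port).
beam64 = 1e-48
beam32 = to_float32(beam64)  # underflows: too small for a 32-bit float

if __name__ == "__main__":
    print("64-bit value:", beam64, "log:", safe_log(beam64))
    print("32-bit value:", beam32, "log:", safe_log(beam32))
```

The 64-bit value keeps a finite logarithm, while the 32-bit copy collapses to zero and its logarithm becomes minus infinity. That would be consistent with the kind of secondary error described for logmath.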
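For note c), the following is a rough sketch, in Python, of one simple way to go from 44100 Hz to 8000 Hz: each output sample is the average of the input samples that fall into its time window. This is not the layer used in the Nanodesktop port; the function name downsample and its parameters are assumptions made for illustration. The averaging is only a crude anti-aliasing filter, and a real resampler would normally use a proper low-pass filter.

```python
def downsample(samples, src_rate=44100, dst_rate=8000):
    """Downsample by averaging the input samples in each output window.

    samples: sequence of numbers (for example 16-bit PCM values).
    Returns a list of floats at dst_rate.
    """
    if dst_rate <= 0 or src_rate < dst_rate:
        raise ValueError("expected 0 < dst_rate <= src_rate")

    # Number of whole output samples covered by the input.
    n_out = len(samples) * dst_rate // src_rate
    out = []
    for i in range(n_out):
        # Input index range that maps onto output sample i.
        start = i * src_rate // dst_rate
        end = max(start + 1, (i + 1) * src_rate // dst_rate)
        window = samples[start:end]
        out.append(sum(window) / len(window))
    return out


if __name__ == "__main__":
    one_second = [1000] * 44100  # one second of constant input
    print(len(downsample(one_second)))
```

With the default rates, each output sample averages 5 or 6 input samples, because 44100 / 8000 is about 5.5.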
I hope that there will be no problems with
SphinxTrain. I'll let you know what
happens.
Thanks for your collaboration
Filippo Battaglia