CMU Sphinx / Forums / Help: adding processing techniques to sphinxbase

luciano - 2011-05-06

Hello,
I'd like to include some pre/post processing techniques in a pocketsphinx
application. What I am doing now is using external programs to apply the
techniques before training/decoding.
Where would be the proper place to include them within the sphinxbase
source code?
Here are some examples I'd like to use:

Spectral subtraction, from: Yang Lu, Philipos C. Loizou. A geometric approach
to spectral subtraction. Speech Communication 50 (2008) 453-466.
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2516309/

Should it go somewhere inside cont_ad_base.c?
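
To make the idea concrete, here is a minimal sketch of plain magnitude spectral subtraction with a spectral floor. It is NOT the geometric approach from the paper cited above, only the basic technique that the paper builds on. The function names, the `alpha` over-subtraction factor and the `floor` parameter are hypothetical and are not part of sphinxbase; the input is assumed to be magnitude spectra that have already been computed per frame.

```python
def estimate_noise(noise_frames):
    """Average the magnitude spectra of noise-only frames, bin by bin.

    noise_frames: list of frames, each a list of magnitudes per frequency bin.
    """
    count = len(noise_frames)
    return [sum(column) / count for column in zip(*noise_frames)]


def spectral_subtract(frames, noise_mag, alpha=1.0, floor=0.01):
    """Subtract a noise magnitude estimate from every frame.

    alpha: over-subtraction factor (assumed default, tune for your data).
    floor: fraction of the noisy magnitude kept as a lower bound, so that
           the result never goes negative.
    """
    cleaned = []
    for frame in frames:
        cleaned.append([
            max(mag - alpha * noise, floor * mag)
            for mag, noise in zip(frame, noise_mag)
        ])
    return cleaned
```

In this sketch the noise estimate would come from frames known to contain no speech, for example the first few frames of a recording.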

Spectral normalization with histogram equalization: Ángel de la Torre, et al.
Histogram Equalization of Speech Representation for Robust Speech Recognition.
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 13, NO. 3, MAY 2005
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1420370&isnumber=30690

cmn.c?
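
As an illustration of what histogram equalization does to a feature stream, the sketch below maps each coefficient's empirical distribution over an utterance onto a standard Gaussian reference. This is a simplified rank-based version under my own assumptions, not the exact procedure of the cited paper, and the function names are hypothetical rather than anything in cmn.c.

```python
from statistics import NormalDist


def histogram_equalize(values, reference=NormalDist(0.0, 1.0)):
    """Map one coefficient track onto the reference distribution.

    Each value is replaced by the reference quantile of its empirical CDF,
    estimated from its rank within the utterance.
    """
    count = len(values)
    order = sorted(range(count), key=lambda i: values[i])
    equalized = [0.0] * count
    for rank, index in enumerate(order):
        cdf = (rank + 0.5) / count  # stays strictly inside (0, 1)
        equalized[index] = reference.inv_cdf(cdf)
    return equalized


def equalize_features(frames):
    """Equalize every feature dimension independently across all frames."""
    columns = [histogram_equalize(list(column)) for column in zip(*frames)]
    return [list(row) for row in zip(*columns)]
```

Like cepstral mean normalization, this needs statistics over a whole utterance (or a sliding window), which is why it sits naturally next to the CMN code.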

Filtering in the cepstral domain: Chia-Ping Chen and Jeff A. Bilmes. MVA
Processing of Speech Features. IEEE TRANSACTIONS ON AUDIO, SPEECH, AND
LANGUAGE PROCESSING, VOL. 15, NO. 1, JANUARY 2007
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4032763&isnumber=4032760

This technique uses several frames. feat.c?
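
For reference, here is a rough sketch of MVA (mean subtraction, variance normalization, ARMA filtering) applied to a single cepstral coefficient over time. The ARMA form below, averaging the previous `order` outputs with the current and next `order` inputs, is my reading of the method and should be checked against the paper. Leaving the first and last `order` frames unfiltered is an assumption of this sketch, and the function name is hypothetical.

```python
from statistics import fmean, pstdev


def mva(track, order=2):
    """Apply MVA to one coefficient track (a list of floats over frames)."""
    mean = fmean(track)
    std = pstdev(track) or 1.0  # avoid dividing by zero on a constant track
    normalized = [(value - mean) / std for value in track]

    filtered = list(normalized)
    for t in range(order, len(normalized) - order):
        past_outputs = sum(filtered[t - order:t])
        current_and_future_inputs = sum(normalized[t:t + order + 1])
        filtered[t] = (past_outputs + current_and_future_inputs) / (2 * order + 1)
    return filtered
```

The filter looks `order` frames ahead, so it needs a small buffer of future frames, which is the reason this kind of processing belongs with the multi-frame feature code.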

In order to include these new options, should I modify any of the fe_parse_XX
functions in fe_interface.c or should I do something else?

Thank you very much in advance
Luciano

Nickolay V. Shmyrev - 2011-05-07

Hello Luciano

I'd like to include some pre/post processing techniques in a pocketsphinx
application

That would be a great advancement. The locations you propose are mostly fine,
but I suggest creating new files for some of the methods you proposed. Just as
cmn.c is a separate file, spectral subtraction could be a separate one.

Honestly, I think that if we approach this seriously we need to redesign
the whole sphinxbase frontend. It should be much more flexible in terms of
which processing stages to include and how to perform them. For example, some
of the approaches will require us to apply VAD not only in the first stage in
cont_ad but also in later stages, after the FFT or even after the cepstrum is
computed. We will also need a feedback loop.

I suggest you do a few design sessions first to work it out. For that we need
to review the frontends of other toolkits: RWTH-ASR and Julius. We need to
create an advanced, state-of-the-art frontend with a pluggable and efficient
design.
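
To give the design discussion a starting point, here is a very small sketch of what a pluggable frontend could look like: an ordered list of stages, each a function from frames to frames, so a VAD-like stage can be placed at any point in the chain. Every name here (`Frontend`, `add_stage`, `drop_low_energy`, `subtract_mean`) is hypothetical and does not exist in sphinxbase, and the feedback loop mentioned above is not covered.

```python
class Frontend:
    """An ordered chain of processing stages applied to a list of frames."""

    def __init__(self):
        self.stages = []

    def add_stage(self, name, func):
        """Register a stage; returns self so calls can be chained."""
        self.stages.append((name, func))
        return self

    def process(self, frames):
        for _name, func in self.stages:
            frames = func(frames)
        return frames


def drop_low_energy(threshold):
    """Build a crude VAD-like stage that discards low-energy frames."""
    def stage(frames):
        return [f for f in frames if sum(v * v for v in f) >= threshold]
    return stage


def subtract_mean(frames):
    """Subtract the per-dimension mean over the remaining frames."""
    if not frames:
        return frames
    count = len(frames)
    means = [sum(column) / count for column in zip(*frames)]
    return [[v - m for v, m in zip(frame, means)] for frame in frames]
```

A chain could then be assembled as `Frontend().add_stage("vad", drop_low_energy(1.0)).add_stage("cmn", subtract_mean)`, and reordering or inserting stages would not require touching the other stages.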

In order to include these new options, should I modify any of the fe_parse_XX
functions in fe_interface.c or should I do something else?

There is no problem with creating a new function in the same file.

luciano - 2011-05-09

Hello Nickolay, thank you very much for your reply.
I will take your advice and think over how best to implement and
include the algorithms.
I'll keep in touch about this.
Thanks again,
Luciano
