CMU Sphinx / Forums / Help: Pocketsphinx feature extraction

Yes, it does. The sphinx_fe program takes the WAV input file (the speech signal) and converts it to acoustic features (feature vectors). Take a look at man sphinx_fe, which gives the following:
NAME
sphinx_fe - Convert audio files to acoustic feature files

SYNOPSIS
sphinx_fe [ options ]...

DESCRIPTION
This program converts audio files (in either Microsoft WAV, NIST Sphere, or raw format) to acoustic feature files for input
to batch-mode speech recognition. The resulting files are also useful for various other things. A list of options follows:

   -alpha Preemphasis parameter

   -argfile
          file (e.g. feat.params from an acoustic model) to read parameters from.  This will override  anything  set  in  other
          command line arguments.

   -blocksize
          Number of samples to read at a time.

   -build_outdirs
          Create missing subdirectories in output directory

   -c     file for batch processing

   -cep2spec
          Input is cepstral files, output is log spectral files

   -di    directory, input file names are relative to this, if defined

   -dither
          Add 1/2-bit noise

   -do    directory, output files are relative to this

   -doublebw
          Use double bandwidth filters (same center freq)

   -ei    extension to be applied to all input files

   -eo    extension to be applied to all output files

   -example
          Shows example of how to use the tool

   -frate Frame rate

   -help  Shows the usage of the tool

   -i     audio input file

   -input_endian
          Endianness of input data, big or little, ignored if NIST or MS Wav

   -lifter
          Length of sin-curve for liftering, or 0 for no liftering.

   -logspec
          Write out logspectral files instead of cepstra

   -lowerf
          Lower edge of filters

   -mach_endian
          Endianness of machine, big or little

   -mswav Defines input format as Microsoft Wav (RIFF)

   -ncep  Number of cep coefficients

   -nchans
          Number of channels of data (interlaced samples assumed)

   -nfft  Size of FFT

   -nfilt Number of filter banks

   -nist  Defines input format as NIST sphere

   -npart Number of parts to run in (supersedes -nskip and -runlen if non-zero)

   -nskip If a control file was specified, the number of utterances to skip at the head of the file

   -o     cepstral output file

   -ofmt  Format of output files - one of sphinx, htk, text.

   -part  Index of the part to run (supersedes -nskip and -runlen if non-zero)

   -raw   Defines input format as raw binary data

   -remove_dc
          Remove DC offset from each frame

   -remove_noise
          Remove noise with spectral subtraction in mel-energies

   -remove_silence
          Enables VAD, removes silence frames from processing

   -round_filters
          Round mel filter frequencies to DFT points

   -runlen
          If a control file was specified, the number of utterances to process, or -1 for all

   -samprate
          Sampling rate

   -seed  Seed for random number generator; if less than zero, pick our own

   -smoothspec
          Write out cepstral-smoothed logspectral files

   -spec2cep
          Input is log spectral files, output is cepstral files

   -sph2pipe
          Input is NIST sphere (possibly with Shorten), use sph2pipe to convert

   -transform
          Which type of transform to use to calculate cepstra (legacy, dct, or htk)

   -unit_area
          Normalize mel filters to unit area

   -upperf
          Upper edge of filters

   -vad_postspeech
          Num of silence frames to keep after from speech to silence.

   -vad_prespeech
          Num of speech frames to keep before silence to speech.

   -vad_startspeech
          Num of speech frames to trigger vad from silence to speech.

   -vad_threshold
          Threshold for decision between noise and silence frames. Log-ratio between signal level and noise level.

   -verbose
          Show input filenames

   -warp_params
          defining the warping function

   -warp_type
          Warping function type (or shape)

   -whichchan
          Channel to process (numbered from 1), or 0 to mix all channels

   -wlen  Hamming window length

   Currently  the  only kind of features supported are MFCCs (mel-frequency cepstral coefficients).  There are numerous options
   which control the properties of the output features.  It is VERY important that you document the specific set of flags  used
   to  create  any given set of feature files, since this information is NOT recorded in the files themselves, and any mismatch
   between the parameters used to extract features for recognition and those used to extract features for training  will  cause
   recognition to fail.
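
Since the man page stresses that the flags are not stored in the feature files, here is a minimal sketch of one way to build a sphinx_fe command line and keep a record of it next to the output. The helper names (build_sphinx_fe_command, record_flags), the ".flags.json" sidecar file, and the parameter values are my own assumptions for illustration, not part of sphinx_fe; take the real values from the feat.params of your acoustic model.

```python
import json
import shlex
from pathlib import Path


def build_sphinx_fe_command(wav_path, mfc_path, params):
    """Return an argv list for sphinx_fe.

    params maps option names from the man page (without the leading '-')
    to their values. Options are sorted so the command is reproducible.
    """
    cmd = ["sphinx_fe", "-i", str(wav_path), "-o", str(mfc_path)]
    for name in sorted(params):
        cmd += ["-" + name, str(params[name])]
    return cmd


def record_flags(cmd, mfc_path):
    """Save the exact command beside the feature file (hypothetical sidecar),
    because the feature file itself does not record how it was made."""
    sidecar = Path(str(mfc_path) + ".flags.json")
    sidecar.write_text(json.dumps({"argv": cmd}, indent=2))
    return sidecar


# Illustrative values only; they must match what the acoustic model was trained with.
params = {"mswav": "yes", "samprate": 16000, "ncep": 13, "nfilt": 40}
cmd = build_sphinx_fe_command("utt.wav", "utt.mfc", params)
print(shlex.join(cmd))

# To actually extract features (requires sphinx_fe on PATH and utt.wav to exist):
# import subprocess
# subprocess.run(cmd, check=True)
# record_flags(cmd, "utt.mfc")
```

If you already have an acoustic model, passing its feat.params through -argfile instead of listing the options by hand is probably the safer route, since it keeps the extraction parameters identical to those used for training.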
