CMU Sphinx / Forums / Help: not sure what i'm doing

Anonymous - 2010-01-21

Maybe I'm wrong about Sphinx, but I want something speaker-independent that
understands certain keywords people say and then runs a Linux command based on
them.

I would be doing this on a server, not a desktop, so I need something with
command-line capability, and maybe a PHP web front end.

I played around with pocketsphinx last night, but couldn't get anything
exciting to happen...

Nickolay V. Shmyrev - 2010-01-21

> not sure what i'm doing

"I'm all such a mysterious person such contradictory one". Come on, if you
don't know what to do who else will help you :)

> I played around with pocketsphinx last night, but couldn't get anything
> exciting to happen...

Well, you probably did something odd. This is just a simple engineering task:
when you have a complex task, you split it into parts until the solution is
clear, then solve each part, like this:

Transfer speech from client to server (Flash, a Java applet, or a file upload could be used here)

Recognize speech (you can build your own pocketsphinx solution or, better, use a web service like http://www.speechapi.com, which is easier to start with)

Run commands (on an HTTP request you can do everything you need with PHP)

Once you have such a decomposition, try to tackle at least one of the points
above. Try to solve it, decompose it further if needed, or just ask about what
you don't understand. But please be more precise; such questions are sometimes
funny to read, but not so funny to answer.
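
The last step of the decomposition above (running a command for a recognized keyword) might look roughly like this. It is only a minimal Python sketch: the keyword table and the commands in it are made up for illustration, and the recognizer is assumed to hand back a plain text hypothesis.

```python
import subprocess

# Hypothetical keyword-to-command table; both the keywords and the
# commands are placeholders, not something from pocketsphinx itself.
COMMANDS = {
    "uptime": ["uptime"],
    "disk": ["df", "-h"],
    "date": ["date"],
}


def commands_for(hypothesis):
    """Return the argv list of every known keyword found in the recognized text."""
    words = hypothesis.lower().split()
    return [COMMANDS[word] for word in words if word in COMMANDS]


def run_all(hypothesis):
    """Run the command of each recognized keyword, in the order spoken."""
    for argv in commands_for(hypothesis):
        # argv lists (no shell) keep recognized text out of the command line.
        subprocess.run(argv, check=False)
```

Keeping a fixed table means that misrecognized words simply match nothing, instead of ending up in a shell command.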

Here is what I got:

DiskStation> pocketsphinx_continuous -adcdev /dev/dsp -hmm /usr/local/share/pocketsphinx/model/hmm/wsj1 -dict /opt/mh/data/pocketsphinx/current.dic -lm /opt/mh/data/pocketsphinx/current.lm 
INFO: cmd_ln.c(506): Parsing command line:
pocketsphinx_continuous \
    -adcdev /dev/dsp \
    -hmm /usr/local/share/pocketsphinx/model/hmm/wsj1 \
    -dict /opt/mh/data/pocketsphinx/current.dic \
    -lm /opt/mh/data/pocketsphinx/current.lm

Current configuration:
[NAME]      [DEFLT]     [VALUE]
-adcdev             /dev/dsp
-agc        none        none
-agcthresh  2.0     2.000000e+00
-alpha      0.97        9.700000e-01
-argfile            
-ascale     20.0        2.000000e+01
-backtrace  no      no
-beam       1e-48       1.000000e-48
-bestpath   yes     yes
-bestpathlw 9.5     9.500000e+00
-cep2spec   no      no
-ceplen     13      13
-cmn        current     current
-cmninit    8.0     8.0
-compallsen no      no
-dict               /opt/mh/data/pocketsphinx/current.dic
-dictcase   no      no
-dither     no      no
-doublebw   no      no
-ds     1       1
-fdict              
-feat       1s_c_d_dd   1s_c_d_dd
-featparams         
-fillprob   1e-8        1.000000e-08
-frate      100     100
-fsg                
-fsgusealtpron  yes     yes
-fsgusefiller   yes     yes
-fwdflat    yes     yes
-fwdflatbeam    1e-64       1.000000e-64
-fwdflatefwid   4       4
-fwdflatlw  8.5     8.500000e+00
-fwdflatsfwin   25      25
-fwdflatwbeam   7e-29       7.000000e-29
-fwdtree    yes     yes
-hmm                /usr/local/share/pocketsphinx/model/hmm/wsj1
-input_endian   big     big
-jsgf               
-kdmaxbbi   -1      -1
-kdmaxdepth 0       0
-kdtree             
-latsize    5000        5000
-lda                
-ldadim     0       0
-lifter     0       0
-lm             /opt/mh/data/pocketsphinx/current.lm
-lmctl              
-lmname     default     default
-logbase    1.0001      1.000100e+00
-logfn              
-logspec    no      no
-lowerf     133.33334   1.333333e+02
-lpbeam     1e-40       1.000000e-40
-lponlybeam 7e-29       7.000000e-29
-lw     6.5     6.500000e+00
-maxhistpf  100     100
-maxhmmpf   -1      -1
-maxnewoov  20      20
-maxwpf     -1      -1
-mdef               
-mean               
-mfclogdir          
-mixw               
-mixwfloor  0.0000001   1.000000e-07
-mmap       yes     yes
-ncep       13      13
-nfft       512     512
-nfilt      40      40
-nwpen      1.0     1.000000e+00
-pbeam      1e-48       1.000000e-48
-pip        1.0     1.000000e+00
-rawlogdir          
-remove_dc  no      no
-round_filters  yes     yes
-samprate   16000       1.600000e+04
-sdmap              
-seed       -1      -1
-sendump            
-silprob    0.005       5.000000e-03
-smoothspec no      no
-spec2cep   no      no
-svspec             
-tmat               
-tmatfloor  0.0001      1.000000e-04
-topn       4       4
-toprule            
-transform  legacy      legacy
-unit_area  yes     yes
-upperf     6855.4976   6.855498e+03
-usewdphones    no      no
-uw     1.0     1.000000e+00
-var                
-varfloor   0.0001      1.000000e-04
-varnorm    no      no
-verbose    no      no
-warp_params            
-warp_type  inverse_linear  inverse_linear
-wbeam      7e-29       7.000000e-29
-wip        0.65        6.500000e-01
-wlen       0.025625    2.562500e-02

INFO: cmd_ln.c(506): Parsing command line:
\
    -lowerf 1 \
    -upperf 4000 \
    -nfilt 20 \
    -transform dct \
    -round_filters no \
    -remove_dc yes \
    -feat s2_4x

Current configuration:
[NAME]      [DEFLT]     [VALUE]
-agc        none        none
-agcthresh  2.0     2.000000e+00
-alpha      0.97        9.700000e-01
-cep2spec   no      no
-ceplen     13      13
-cmn        current     current
-cmninit    8.0     8.0
-dither     no      no
-doublebw   no      no
-feat       1s_c_d_dd   s2_4x
-frate      100     100
-input_endian   big     big
-lda                
-ldadim     0       0
-lifter     0       0
-logfn              
-logspec    no      no
-lowerf     133.33334   1.000000e+00
-mfclogdir          
-ncep       13      13
-nfft       512     512
-nfilt      40      20
-rawlogdir          
-remove_dc  no      yes
-round_filters  yes     no
-samprate   16000       1.600000e+04
-seed       -1      -1
-smoothspec no      no
-spec2cep   no      no
-svspec             
-transform  legacy      dct
-unit_area  yes     yes
-upperf     6855.4976   4.000000e+03
-varnorm    no      no
-verbose    no      no
-warp_params            
-warp_type  inverse_linear  inverse_linear
-wlen       0.025625    2.562500e-02

INFO: acmod.c(82): Parsed model-specific feature parameters from /usr/local/share/pocketsphinx/model/hmm/wsj1/feat.params
INFO: mdef.c(520): Reading model definition: /usr/local/share/pocketsphinx/model/hmm/wsj1/mdef
INFO: mdef.c(531): Found byte-order mark BMDF, assuming this is a binary mdef file
INFO: bin_mdef.c(301): Reading binary model definition: /usr/local/share/pocketsphinx/model/hmm/wsj1/mdef
INFO: bin_mdef.c(311): Must byte-swap /usr/local/share/pocketsphinx/model/hmm/wsj1/mdef
WARNING: "bin_mdef.c", line 371: -mmap specified, but mdef is other-endian.  Will not memory-map.
INFO: bin_mdef.c(480): 44 CI-phone, 66516 CD-phone, 5 emitstate/phone, 220 CI-sen, 5220 Sen, 18660 Sen-Seq
INFO: tmat.c(204): Reading HMM transition probability matrices: /usr/local/share/pocketsphinx/model/hmm/wsj1/transition_matrices
INFO: acmod.c(114): Attempting to use SCGMM computation module
INFO: s2_semi_mgau.c(981): Reading S3 mixture gaussian file '/usr/local/share/pocketsphinx/model/hmm/wsj1/means'
INFO: s2_semi_mgau.c(1080): 1 mixture Gaussians, 256 components, 4 feature streams, veclen 51
INFO: s2_semi_mgau.c(981): Reading S3 mixture gaussian file '/usr/local/share/pocketsphinx/model/hmm/wsj1/variances'
INFO: s2_semi_mgau.c(1080): 1 mixture Gaussians, 256 components, 4 feature streams, veclen 51
INFO: s2_semi_mgau.c(748): Loading senones from dump file /usr/local/share/pocketsphinx/model/hmm/wsj1/sendump
INFO: s2_semi_mgau.c(764): BEGIN FILE FORMAT DESCRIPTION
INFO: s2_semi_mgau.c(793): Rows: 256, Columns: 5220
INFO: s2_semi_mgau.c(801): Using memory-mapped I/O for senones
INFO: kdtree.c(231): Reading tree for feature 0
INFO: kdtree.c(249): n_density 256 n_comp 12 n_level 8 threshold 0.200000
INFO: kdtree.c(186): Read 255 nodes
INFO: kdtree.c(231): Reading tree for feature 1
INFO: kdtree.c(249): n_density 256 n_comp 24 n_level 8 threshold 0.200000
INFO: kdtree.c(186): Read 255 nodes
INFO: kdtree.c(231): Reading tree for feature 2
INFO: kdtree.c(249): n_density 256 n_comp 3 n_level 8 threshold 0.200000
INFO: kdtree.c(186): Read 255 nodes
INFO: kdtree.c(231): Reading tree for feature 3
INFO: kdtree.c(249): n_density 256 n_comp 12 n_level 8 threshold 0.200000
INFO: kdtree.c(186): Read 255 nodes
INFO: feat.c(849): Initializing feature stream to type: 's2_4x', ceplen=13, CMN='current', VARNORM='no', AGC='none'
INFO: cmn.c(142): mean[0]= 12.00, mean[1..12]= 0.0
INFO: dict.c(232): Allocating 20 placeholders for new OOVs
INFO: dict.c(494):    143 = words in file [/opt/mh/data/pocketsphinx/current.dic]
WARNING: "dict.c", line 435: Skipping duplicate definition of <s>
WARNING: "dict.c", line 435: Skipping duplicate definition of </s>
WARNING: "dict.c", line 435: Skipping duplicate definition of <sil>
INFO: dict.c(494):      3 = words in file [/usr/local/share/pocketsphinx/model/hmm/wsj1/noisedict]
INFO: dict.c(349): LEFT CONTEXT TABLES
INFO: dict.c(1013): Entry Context table contains
       107 entries
INFO: dict.c(1014):       4708 possible cross word triphones.
INFO: dict.c(1052):       4240 triphones
       424 pseudo diphones
        44 uniphones
INFO: dict.c(1099): Exit Context table contains
       107 entries
INFO: dict.c(1100):       4708 possible cross word triphones.
INFO: dict.c(1166):       4240 triphones
       424 pseudo diphones
        44 uniphones
INFO: dict.c(1168):       1949 right context entries
INFO: dict.c(1169):         18 ave entries per exit context
INFO: dict.c(355): RIGHT CONTEXT TABLES
INFO: dict.c(1013): Entry Context table contains
        93 entries
INFO: dict.c(1014):       4092 possible cross word triphones.
INFO: dict.c(1052):       3822 triphones
       182 pseudo diphones
        88 uniphones
INFO: dict.c(1099): Exit Context table contains
        93 entries
INFO: dict.c(1100):       4092 possible cross word triphones.
INFO: dict.c(1166):       3822 triphones
       182 pseudo diphones
        88 uniphones
INFO: dict.c(1168):       2082 right context entries
INFO: dict.c(1169):         22 ave entries per exit context
INFO: ngram_model_arpa.c(539): ngrams 1=135, 2=260, 3=262
INFO: ngram_model_arpa.c(204): Reading unigrams
INFO: ngram_model_arpa.c(578):      135 = #unigrams created
INFO: ngram_model_arpa.c(260): Reading bigrams
INFO: ngram_model_arpa.c(594):      260 = #bigrams created
INFO: ngram_model_arpa.c(595):       32 = #prob2 entries
INFO: ngram_model_arpa.c(602):       23 = #bo_wt2 entries
INFO: ngram_model_arpa.c(358): Reading trigrams
INFO: ngram_model_arpa.c(615):      262 = #trigrams created
INFO: ngram_model_arpa.c(616):       17 = #prob3 entries
INFO: ngram_search_fwdtree.c(156): 0 root, 0 non-root channels, 29 single-phone words
INFO: ngram_search_fwdtree.c(195): Creating search tree
INFO: ngram_search_fwdtree.c(203): 0 root, 0 non-root channels, 29 single-phone words
INFO: ngram_search_fwdtree.c(325): max nonroot chan increased to 408
INFO: ngram_search_fwdtree.c(334): 103 root, 280 non-root channels, 9 single-phone words
INFO: ngram_search_fwdflat.c(95): fwdflat: min_ef_width = 4, max_sf_win = 25
ad_oss.c 258: can't set input gain/recording level for this device.
INFO: continuous.c(261): pocketsphinx_continuous COMPILED ON: Jan 20 2010, AT: 18:37:36

FATAL_ERROR: "continuous.c", line 135: cont_ad_calib failed

Nickolay V. Shmyrev - 2010-01-29

It tells you it can't record from your device. The input might be muted, or it
might be a permission problem, for example.
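
To spot this kind of failure without reading the whole log, the WARNING and FATAL_ERROR lines can be picked out automatically. This is a rough Python sketch that assumes the lines keep the shape seen in the log above (level, quoted source file, line number, message); `find_problems` is a name invented here.

```python
import re

# Matches lines shaped like the ones in the log above, for example:
#   FATAL_ERROR: "continuous.c", line 135: cont_ad_calib failed
#   WARNING: "dict.c", line 435: Skipping duplicate definition of <s>
PROBLEM = re.compile(r'^(FATAL_ERROR|WARNING): "([^"]+)", line (\d+): (.*)$')


def find_problems(log_text):
    """Return (level, source_file, line_number, message) for each problem line."""
    problems = []
    for line in log_text.splitlines():
        match = PROBLEM.match(line.strip())
        if match:
            level, source, lineno, message = match.groups()
            problems.append((level, source, int(lineno), message))
    return problems
```

Lines in other shapes, such as the `ad_oss.c 258:` message, are not matched by this pattern and would need their own rule.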

Anonymous - 2010-01-29

/dev/dsp was:
crw-r--r-- 1 root root 14, 3 Mar 20 2007 /dev/dsp
I ran chmod 664 on it and changed the group to the users group (not that it
should matter):
crw-rw-r-- 1 root users 14, 3 Mar 20 2007 /dev/dsp

Still the same error.
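
The same permission check can be scripted, which helps when the recognizer runs under a different user (a web server account, for instance). This is a small Python sketch; `describe_device` is an invented helper name, and it only reports what the current user is allowed to do, not whether the driver can actually record.

```python
import os
import stat


def describe_device(path):
    """Report whether path exists, is a character device, and is readable by this user."""
    try:
        st = os.stat(path)
    except OSError as exc:
        return {"exists": False, "error": str(exc)}
    return {
        "exists": True,
        "char_device": stat.S_ISCHR(st.st_mode),
        "mode": stat.filemode(st.st_mode),      # e.g. 'crw-rw-r--'
        "readable": os.access(path, os.R_OK),   # checked for the current user
    }


if __name__ == "__main__":
    print(describe_device("/dev/dsp"))
```

As the post above shows, a readable device can still fail calibration, so a passing check only rules out one cause.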

Here is the USB headset/mic in /proc/bus/usb/devices:

T: Bus=02 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 2 Spd=12 MxCh= 0
D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS= 8 #Cfgs= 1
P: Vendor=046d ProdID=0a0c Rev=10.13
S: Manufacturer=Logitech
S: Product=Logitech USB Headset
S: SerialNumber=c0a5d6b2d3b6fede9ccf9abad2b1d4a0
C: #Ifs= 4 Cfg#= 1 Atr=80 MxPwr=100mA
I: If#= 0 Alt= 0 #EPs= 0 Cls=01(audio) Sub=01 Prot=00 Driver=snd-usb-audio
I: If#= 1 Alt= 0 #EPs= 0 Cls=01(audio) Sub=02 Prot=00 Driver=snd-usb-audio
I: If#= 1 Alt= 1 #EPs= 1 Cls=01(audio) Sub=02 Prot=00 Driver=snd-usb-audio
E: Ad=01(O) Atr=0d(Isoc) MxPS= 192 Ivl=1ms
I: If#= 1 Alt= 2 #EPs= 1 Cls=01(audio) Sub=02 Prot=00 Driver=snd-usb-audio
E: Ad=01(O) Atr=0d(Isoc) MxPS= 96 Ivl=1ms
I: If#= 2 Alt= 0 #EPs= 0 Cls=01(audio) Sub=02 Prot=00 Driver=snd-usb-audio
I: If#= 2 Alt= 1 #EPs= 1 Cls=01(audio) Sub=02 Prot=00 Driver=snd-usb-audio
E: Ad=84(I) Atr=0d(Isoc) MxPS= 96 Ivl=1ms
I:* If#= 3 Alt= 0 #EPs= 1 Cls=03(HID ) Sub=00 Prot=00 Driver=usbhid
E: Ad=83(I) Atr=03(Int.) MxPS= 2 Ivl=8ms
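
A listing like this can be scanned for the capture side of the device: on USB, an endpoint marked `(I)` carries data from the device to the host, so an audio-class interface with an `(I)` endpoint is where the microphone samples come from. The following Python sketch assumes the `I:` and `E:` line layout shown above; the function name is made up here.

```python
import re

# 'I:' lines describe an interface/alternate setting, 'E:' lines its endpoints.
IFACE = re.compile(r'^I:\*?\s+If#=\s*(\d+)\s+Alt=\s*(\d+).*?Cls=\w+\((.*?)\)')
ENDPT = re.compile(r'^E:\s+Ad=(\w+)\((I|O)\)')


def audio_input_endpoints(listing):
    """Return (interface, alt_setting, endpoint_address) for audio IN endpoints."""
    found = []
    current = None  # (interface number, alt setting, class name) of the last 'I:' line
    for line in listing.splitlines():
        line = line.strip()
        match = IFACE.match(line)
        if match:
            current = (int(match.group(1)), int(match.group(2)), match.group(3).strip())
            continue
        match = ENDPT.match(line)
        if match and current and current[2] == "audio" and match.group(2) == "I":
            found.append((current[0], current[1], match.group(1)))
    return found
```

This only shows that the headset exposes a capture endpoint; it does not say which /dev/dsp node the driver maps it to.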

Nickolay V. Shmyrev - 2010-01-29

A USB headset microphone is most likely something like /dev/dsp1. Try to
record audio with any recording application, such as Audacity, and you'll
easily find the correct device.

Anonymous - 2010-01-30

Nope, that's not it: my distro only comes with /dev/dsp. Even when I add
/dev/dsp1 manually, the system only uses it if I have two USB audio devices
plugged in (I have no PCI audio devices). I've tried cvoicecontrol; it's
buggy, but it was able to pick up the mic input from my headset on /dev/dsp.
I can't install Audacity because this server has no monitor, and I don't have
ALSA.

I was able to get it to work with a plain USB mic, but not with the headset:

/usr/local/bin/pocketsphinx_continuous \
    -lm ./../data/pocketsphinx/current.lm \
    -dict ./../data/pocketsphinx/current.dic \
    -hmm /usr/local/share/pocketsphinx/model/hmm/wsj1 \
    -samprate 8000 \
    -adcdev /dev/dsp

Nickolay V. Shmyrev - 2010-01-31

You probably shouldn't bother much with OSS bugs; Unix audio still sucks after
all these years. Try recording to a file and decoding it with
pocketsphinx_batch. I think that will be enough for experiments.
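
Before decoding a recorded file, it can help to confirm that its format matches what the decoder is configured for. The Python sketch below assumes the recording is a WAV file and that the decoder wants 16-bit mono audio at the -samprate value (16000 by default in the log above, 8000 in the command that worked); `check_wav` is an invented helper name.

```python
import wave


def check_wav(path, expected_rate=16000):
    """Return a list of mismatches between the file and the expected format."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != expected_rate:
            problems.append(
                f"sample rate is {wav.getframerate()} Hz, expected {expected_rate}"
            )
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels, expected mono")
        if wav.getsampwidth() != 2:
            # getsampwidth() is in bytes, so 2 means 16-bit samples
            problems.append(f"{8 * wav.getsampwidth()}-bit samples, expected 16-bit")
    return problems
```

An empty list means the file matches; anything else suggests either re-recording or passing a different -samprate.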

Sandeep Verma - 2012-03-31

I want to develop an application using "cmusphinx" to convert audio (WAV, MP3,
video audio) to text and vice versa using PHP.

Any kind of help is appreciated.

Thanks

Nickolay V. Shmyrev - 2012-04-07

> I want to develop an application using "cmusphinx" to convert audio (WAV,
> MP3, video audio) to text and vice versa using PHP.

You can invoke the pocketsphinx_continuous binary from PHP and store the text
results. It's relatively simple to implement.
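
The "invoke the binary and store the text" idea could be sketched as follows, shown here in Python rather than PHP. The helper name, the output file, and the timeout are all assumptions; the recognizer command line is passed in by the caller, since the right options depend on the installed pocketsphinx version and models.

```python
import subprocess


def run_and_store(argv, out_path, timeout=60):
    """Run a recognizer command, append its standard output to out_path, and return it."""
    result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    with open(out_path, "a", encoding="utf-8") as out:
        out.write(result.stdout)
    return result.returncode, result.stdout
```

A recognizer listening on a live microphone may never exit by itself, in which case `subprocess.run` raises `subprocess.TimeoutExpired` after the timeout. The captured output may also contain log lines alongside the recognized text and would then need filtering.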
