
not sure what i'm doing

Help
Anonymous
2010-01-21
2012-09-22
  • Anonymous

    Anonymous - 2010-01-21

    Maybe I'm wrong about Sphinx, but I want something that is voice
    independent, understands certain keywords people may say, and then runs a
    Linux command based on them.

    I would be doing this on a server, not a desktop, so I need something with
    command-line capability, and maybe PHP for the web.

    I played around with pocketsphinx last night, but couldn't get anything
    exciting to happen...

     
  • Nickolay V. Shmyrev

    > not sure what i'm doing

    "I'm all such a mysterious person such contradictory one". Come on, if you
    don't know what to do who else will help you :)

    > I played around with pocketsphinx last night, but couldn't get anything
    > exciting to happen...

    Well, you probably did some strange things. It's just a simple engineering
    task. When you have a complex task, you need to split it into parts until the
    solution is clear, and then solve each one, like:

    1. Transfer speech from client to server (Flash, a Java applet, or a file upload could be used here)
    2. Recognize speech (you can create your own pocketsphinx solution or, better, use a web service like http://www.speechapi.com, which is easier to start with)
    3. Do commands (on an HTTP request you can do everything you need with PHP)

    Once you have such a decomposition, try to approach at least one of the
    points above. Try to solve it, decompose it further if needed, or just ask
    about what you don't understand. But please be more precise; such questions
    are sometimes funny to read, but not so funny to answer.
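
    Step 3 above (and the original goal of running a Linux command from a
    spoken keyword) might look roughly like this in Python. This is only a
    sketch: the phrases and commands in COMMANDS and both function names are
    made up for illustration, and it assumes the recognizer has already
    produced a text hypothesis.

```python
import shlex
import subprocess

# Hypothetical keyword table: spoken phrase -> command line to run.
# Only commands listed here can ever be executed.
COMMANDS = {
    "uptime": "uptime",
    "disk space": "df -h",
    "who is logged in": "who",
}


def match_command(hypothesis):
    """Return the argv list for the first phrase found in the text, or None."""
    # Lower-case and collapse whitespace so matching is forgiving.
    text = " ".join(hypothesis.lower().split())
    for phrase, command in COMMANDS.items():
        if phrase in text:
            return shlex.split(command)
    return None


def run_for(hypothesis):
    """Run the matched command and return its output, or None if nothing matched."""
    argv = match_command(hypothesis)
    if argv is None:
        return None
    result = subprocess.run(argv, capture_output=True, text=True, check=False)
    return result.stdout
```

    Keeping a fixed table, rather than passing recognized text to a shell,
    means a misrecognized phrase can at worst run the wrong listed command.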

     
  • Anonymous

    Anonymous - 2010-01-28

    Here is what I got:

    DiskStation> pocketsphinx_continuous -adcdev /dev/dsp -hmm /usr/local/share/pocketsphinx/model/hmm/wsj1 -dict /opt/mh/data/pocketsphinx/current.dic -lm /opt/mh/data/pocketsphinx/current.lm 
    INFO: cmd_ln.c(506): Parsing command line:
    pocketsphinx_continuous \
        -adcdev /dev/dsp \
        -hmm /usr/local/share/pocketsphinx/model/hmm/wsj1 \
        -dict /opt/mh/data/pocketsphinx/current.dic \
        -lm /opt/mh/data/pocketsphinx/current.lm
    
    Current configuration:
    [NAME]      [DEFLT]     [VALUE]
    -adcdev             /dev/dsp
    -agc        none        none
    -agcthresh  2.0     2.000000e+00
    -alpha      0.97        9.700000e-01
    -argfile            
    -ascale     20.0        2.000000e+01
    -backtrace  no      no
    -beam       1e-48       1.000000e-48
    -bestpath   yes     yes
    -bestpathlw 9.5     9.500000e+00
    -cep2spec   no      no
    -ceplen     13      13
    -cmn        current     current
    -cmninit    8.0     8.0
    -compallsen no      no
    -dict               /opt/mh/data/pocketsphinx/current.dic
    -dictcase   no      no
    -dither     no      no
    -doublebw   no      no
    -ds     1       1
    -fdict              
    -feat       1s_c_d_dd   1s_c_d_dd
    -featparams         
    -fillprob   1e-8        1.000000e-08
    -frate      100     100
    -fsg                
    -fsgusealtpron  yes     yes
    -fsgusefiller   yes     yes
    -fwdflat    yes     yes
    -fwdflatbeam    1e-64       1.000000e-64
    -fwdflatefwid   4       4
    -fwdflatlw  8.5     8.500000e+00
    -fwdflatsfwin   25      25
    -fwdflatwbeam   7e-29       7.000000e-29
    -fwdtree    yes     yes
    -hmm                /usr/local/share/pocketsphinx/model/hmm/wsj1
    -input_endian   big     big
    -jsgf               
    -kdmaxbbi   -1      -1
    -kdmaxdepth 0       0
    -kdtree             
    -latsize    5000        5000
    -lda                
    -ldadim     0       0
    -lifter     0       0
    -lm             /opt/mh/data/pocketsphinx/current.lm
    -lmctl              
    -lmname     default     default
    -logbase    1.0001      1.000100e+00
    -logfn              
    -logspec    no      no
    -lowerf     133.33334   1.333333e+02
    -lpbeam     1e-40       1.000000e-40
    -lponlybeam 7e-29       7.000000e-29
    -lw     6.5     6.500000e+00
    -maxhistpf  100     100
    -maxhmmpf   -1      -1
    -maxnewoov  20      20
    -maxwpf     -1      -1
    -mdef               
    -mean               
    -mfclogdir          
    -mixw               
    -mixwfloor  0.0000001   1.000000e-07
    -mmap       yes     yes
    -ncep       13      13
    -nfft       512     512
    -nfilt      40      40
    -nwpen      1.0     1.000000e+00
    -pbeam      1e-48       1.000000e-48
    -pip        1.0     1.000000e+00
    -rawlogdir          
    -remove_dc  no      no
    -round_filters  yes     yes
    -samprate   16000       1.600000e+04
    -sdmap              
    -seed       -1      -1
    -sendump            
    -silprob    0.005       5.000000e-03
    -smoothspec no      no
    -spec2cep   no      no
    -svspec             
    -tmat               
    -tmatfloor  0.0001      1.000000e-04
    -topn       4       4
    -toprule            
    -transform  legacy      legacy
    -unit_area  yes     yes
    -upperf     6855.4976   6.855498e+03
    -usewdphones    no      no
    -uw     1.0     1.000000e+00
    -var                
    -varfloor   0.0001      1.000000e-04
    -varnorm    no      no
    -verbose    no      no
    -warp_params            
    -warp_type  inverse_linear  inverse_linear
    -wbeam      7e-29       7.000000e-29
    -wip        0.65        6.500000e-01
    -wlen       0.025625    2.562500e-02
    
    INFO: cmd_ln.c(506): Parsing command line:
    \
        -lowerf 1 \
        -upperf 4000 \
        -nfilt 20 \
        -transform dct \
        -round_filters no \
        -remove_dc yes \
        -feat s2_4x
    
    Current configuration:
    [NAME]      [DEFLT]     [VALUE]
    -agc        none        none
    -agcthresh  2.0     2.000000e+00
    -alpha      0.97        9.700000e-01
    -cep2spec   no      no
    -ceplen     13      13
    -cmn        current     current
    -cmninit    8.0     8.0
    -dither     no      no
    -doublebw   no      no
    -feat       1s_c_d_dd   s2_4x
    -frate      100     100
    -input_endian   big     big
    -lda                
    -ldadim     0       0
    -lifter     0       0
    -logfn              
    -logspec    no      no
    -lowerf     133.33334   1.000000e+00
    -mfclogdir          
    -ncep       13      13
    -nfft       512     512
    -nfilt      40      20
    -rawlogdir          
    -remove_dc  no      yes
    -round_filters  yes     no
    -samprate   16000       1.600000e+04
    -seed       -1      -1
    -smoothspec no      no
    -spec2cep   no      no
    -svspec             
    -transform  legacy      dct
    -unit_area  yes     yes
    -upperf     6855.4976   4.000000e+03
    -varnorm    no      no
    -verbose    no      no
    -warp_params            
    -warp_type  inverse_linear  inverse_linear
    -wlen       0.025625    2.562500e-02
    
    INFO: acmod.c(82): Parsed model-specific feature parameters from /usr/local/share/pocketsphinx/model/hmm/wsj1/feat.params
    INFO: mdef.c(520): Reading model definition: /usr/local/share/pocketsphinx/model/hmm/wsj1/mdef
    INFO: mdef.c(531): Found byte-order mark BMDF, assuming this is a binary mdef file
    INFO: bin_mdef.c(301): Reading binary model definition: /usr/local/share/pocketsphinx/model/hmm/wsj1/mdef
    INFO: bin_mdef.c(311): Must byte-swap /usr/local/share/pocketsphinx/model/hmm/wsj1/mdef
    WARNING: "bin_mdef.c", line 371: -mmap specified, but mdef is other-endian.  Will not memory-map.
    INFO: bin_mdef.c(480): 44 CI-phone, 66516 CD-phone, 5 emitstate/phone, 220 CI-sen, 5220 Sen, 18660 Sen-Seq
    INFO: tmat.c(204): Reading HMM transition probability matrices: /usr/local/share/pocketsphinx/model/hmm/wsj1/transition_matrices
    INFO: acmod.c(114): Attempting to use SCGMM computation module
    INFO: s2_semi_mgau.c(981): Reading S3 mixture gaussian file '/usr/local/share/pocketsphinx/model/hmm/wsj1/means'
    INFO: s2_semi_mgau.c(1080): 1 mixture Gaussians, 256 components, 4 feature streams, veclen 51
    INFO: s2_semi_mgau.c(981): Reading S3 mixture gaussian file '/usr/local/share/pocketsphinx/model/hmm/wsj1/variances'
    INFO: s2_semi_mgau.c(1080): 1 mixture Gaussians, 256 components, 4 feature streams, veclen 51
    INFO: s2_semi_mgau.c(748): Loading senones from dump file /usr/local/share/pocketsphinx/model/hmm/wsj1/sendump
    INFO: s2_semi_mgau.c(764): BEGIN FILE FORMAT DESCRIPTION
    INFO: s2_semi_mgau.c(793): Rows: 256, Columns: 5220
    INFO: s2_semi_mgau.c(801): Using memory-mapped I/O for senones
    INFO: kdtree.c(231): Reading tree for feature 0
    INFO: kdtree.c(249): n_density 256 n_comp 12 n_level 8 threshold 0.200000
    INFO: kdtree.c(186): Read 255 nodes
    INFO: kdtree.c(231): Reading tree for feature 1
    INFO: kdtree.c(249): n_density 256 n_comp 24 n_level 8 threshold 0.200000
    INFO: kdtree.c(186): Read 255 nodes
    INFO: kdtree.c(231): Reading tree for feature 2
    INFO: kdtree.c(249): n_density 256 n_comp 3 n_level 8 threshold 0.200000
    INFO: kdtree.c(186): Read 255 nodes
    INFO: kdtree.c(231): Reading tree for feature 3
    INFO: kdtree.c(249): n_density 256 n_comp 12 n_level 8 threshold 0.200000
    INFO: kdtree.c(186): Read 255 nodes
    INFO: feat.c(849): Initializing feature stream to type: 's2_4x', ceplen=13, CMN='current', VARNORM='no', AGC='none'
    INFO: cmn.c(142): mean[0]= 12.00, mean[1..12]= 0.0
    INFO: dict.c(232): Allocating 20 placeholders for new OOVs
    INFO: dict.c(494):    143 = words in file [/opt/mh/data/pocketsphinx/current.dic]
    WARNING: "dict.c", line 435: Skipping duplicate definition of <s>
    WARNING: "dict.c", line 435: Skipping duplicate definition of </s>
    WARNING: "dict.c", line 435: Skipping duplicate definition of <sil>
    INFO: dict.c(494):      3 = words in file [/usr/local/share/pocketsphinx/model/hmm/wsj1/noisedict]
    INFO: dict.c(349): LEFT CONTEXT TABLES
    INFO: dict.c(1013): Entry Context table contains
           107 entries
    INFO: dict.c(1014):       4708 possible cross word triphones.
    INFO: dict.c(1052):       4240 triphones
           424 pseudo diphones
            44 uniphones
    INFO: dict.c(1099): Exit Context table contains
           107 entries
    INFO: dict.c(1100):       4708 possible cross word triphones.
    INFO: dict.c(1166):       4240 triphones
           424 pseudo diphones
            44 uniphones
    INFO: dict.c(1168):       1949 right context entries
    INFO: dict.c(1169):         18 ave entries per exit context
    INFO: dict.c(355): RIGHT CONTEXT TABLES
    INFO: dict.c(1013): Entry Context table contains
            93 entries
    INFO: dict.c(1014):       4092 possible cross word triphones.
    INFO: dict.c(1052):       3822 triphones
           182 pseudo diphones
            88 uniphones
    INFO: dict.c(1099): Exit Context table contains
            93 entries
    INFO: dict.c(1100):       4092 possible cross word triphones.
    INFO: dict.c(1166):       3822 triphones
           182 pseudo diphones
            88 uniphones
    INFO: dict.c(1168):       2082 right context entries
    INFO: dict.c(1169):         22 ave entries per exit context
    INFO: ngram_model_arpa.c(539): ngrams 1=135, 2=260, 3=262
    INFO: ngram_model_arpa.c(204): Reading unigrams
    INFO: ngram_model_arpa.c(578):      135 = #unigrams created
    INFO: ngram_model_arpa.c(260): Reading bigrams
    INFO: ngram_model_arpa.c(594):      260 = #bigrams created
    INFO: ngram_model_arpa.c(595):       32 = #prob2 entries
    INFO: ngram_model_arpa.c(602):       23 = #bo_wt2 entries
    INFO: ngram_model_arpa.c(358): Reading trigrams
    INFO: ngram_model_arpa.c(615):      262 = #trigrams created
    INFO: ngram_model_arpa.c(616):       17 = #prob3 entries
    INFO: ngram_search_fwdtree.c(156): 0 root, 0 non-root channels, 29 single-phone words
    INFO: ngram_search_fwdtree.c(195): Creating search tree
    INFO: ngram_search_fwdtree.c(203): 0 root, 0 non-root channels, 29 single-phone words
    INFO: ngram_search_fwdtree.c(325): max nonroot chan increased to 408
    INFO: ngram_search_fwdtree.c(334): 103 root, 280 non-root channels, 9 single-phone words
    INFO: ngram_search_fwdflat.c(95): fwdflat: min_ef_width = 4, max_sf_win = 25
    ad_oss.c 258: can't set input gain/recording level for this device.
    INFO: continuous.c(261): pocketsphinx_continuous COMPILED ON: Jan 20 2010, AT: 18:37:36
    
    FATAL_ERROR: "continuous.c", line 135: cont_ad_calib failed
    
     
  • Nickolay V. Shmyrev

    It tells you it can't record from your device. The input might be muted, or
    it might be a permission problem, for example.
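
    A quick way to rule out the permission half of that is a small check like
    the one below. It is a sketch that assumes a Linux system and an OSS-style
    device node; the function name is hypothetical, and it cannot tell whether
    the input is muted, only whether the node exists and can be opened.

```python
import os


def diagnose_device(path):
    """Return a short description of why a capture device may be unusable."""
    if not os.path.exists(path):
        return "missing"
    if not os.access(path, os.R_OK):
        return "no read permission"
    try:
        # Non-blocking open so a busy device does not hang the check.
        fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    except OSError as exc:
        return "open failed: " + os.strerror(exc.errno)
    os.close(fd)
    return "ok"
```

    Calling diagnose_device("/dev/dsp") as the same user that runs
    pocketsphinx_continuous shows whether that user can open the device.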

     
  • Anonymous

    Anonymous - 2010-01-29

    /dev/dsp was:
    crw-r--r-- 1 root root 14, 3 Mar 20 2007 /dev/dsp
    I changed the mode to 664 and the group to the users group (not that it
    should matter):
    crw-rw-r-- 1 root users 14, 3 Mar 20 2007 /dev/dsp

    Still the same error.

    Here is the USB headset/mic in /proc/bus/usb/devices:

    T: Bus=02 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 2 Spd=12 MxCh= 0
    D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS= 8 #Cfgs= 1
    P: Vendor=046d ProdID=0a0c Rev=10.13
    S: Manufacturer=Logitech
    S: Product=Logitech USB Headset
    S: SerialNumber=c0a5d6b2d3b6fede9ccf9abad2b1d4a0
    C: #Ifs= 4 Cfg#= 1 Atr=80 MxPwr=100mA
    I: If#= 0 Alt= 0 #EPs= 0 Cls=01(audio) Sub=01 Prot=00 Driver=snd-usb-audio
    I: If#= 1 Alt= 0 #EPs= 0 Cls=01(audio) Sub=02 Prot=00 Driver=snd-usb-audio
    I: If#= 1 Alt= 1 #EPs= 1 Cls=01(audio) Sub=02 Prot=00 Driver=snd-usb-audio
    E: Ad=01(O) Atr=0d(Isoc) MxPS= 192 Ivl=1ms
    I: If#= 1 Alt= 2 #EPs= 1 Cls=01(audio) Sub=02 Prot=00 Driver=snd-usb-audio
    E: Ad=01(O) Atr=0d(Isoc) MxPS= 96 Ivl=1ms
    I: If#= 2 Alt= 0 #EPs= 0 Cls=01(audio) Sub=02 Prot=00 Driver=snd-usb-audio
    I: If#= 2 Alt= 1 #EPs= 1 Cls=01(audio) Sub=02 Prot=00 Driver=snd-usb-audio
    E: Ad=84(I) Atr=0d(Isoc) MxPS= 96 Ivl=1ms
    I:* If#= 3 Alt= 0 #EPs= 1 Cls=03(HID ) Sub=00 Prot=00 Driver=usbhid
    E: Ad=83(I) Atr=03(Int.) MxPS= 2 Ivl=8ms

     
  • Nickolay V. Shmyrev

    The USB headset microphone is most likely something like /dev/dsp1. Try
    recording audio with any recording application, such as Audacity; you'll
    easily find the correct device.
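
    On a machine without a GUI, the candidate device nodes can also be listed
    from a script. The sketch below assumes OSS-style names (dsp, dsp1, ...)
    under /dev; the function name and the dev_dir parameter are only there for
    illustration.

```python
import glob
import os


def candidate_devices(dev_dir="/dev"):
    """Return OSS-style capture device nodes (dsp, dsp1, ...) that are readable."""
    paths = sorted(glob.glob(os.path.join(dev_dir, "dsp*")))
    return [p for p in paths if os.access(p, os.R_OK)]
```

    Each returned path can then be tried in turn as the -adcdev argument.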

     
  • Anonymous

    Anonymous - 2010-01-30

    Nope, that's not it; my distro only comes with /dev/dsp. Even when I add
    /dev/dsp1 manually, the system only uses it if I have two USB audio devices
    plugged in (I have no PCI audio devices). I've tried cvoicecontrol, and it's
    buggy, but it was able to pick up the mic input from my headset on /dev/dsp.
    I am unable to install Audacity because I don't have a monitor on this
    server, and I don't have ALSA.

    I was able to get it to work with a plain USB mic, but not with the headset:

    /usr/local/bin/pocketsphinx_continuous \
        -lm ./../data/pocketsphinx/current.lm \
        -dict ./../data/pocketsphinx/current.dic \
        -hmm /usr/local/share/pocketsphinx/model/hmm/wsj1 \
        -samprate 8000 \
        -adcdev /dev/dsp

     
  • Nickolay V. Shmyrev

    You probably shouldn't bother much with OSS bugs; Unix audio still sucks
    after all these years. Try recording to a file and decoding it with
    pocketsphinx_batch. I think that will be enough for experiments.
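
    As a rough sketch of the file-based route, the Python below writes a
    control file and builds a pocketsphinx_batch command line. The function
    names, the .raw extension, and the directory layout are assumptions, as is
    the audio format (headerless 16-bit mono PCM at the model's sample rate);
    check the flags against your installed version before relying on them.

```python
def write_control_file(ctl_path, utterance_ids):
    """Write one utterance id per line (file names without extension)."""
    with open(ctl_path, "w") as handle:
        for utt in utterance_ids:
            handle.write(utt + "\n")


def batch_command(audio_dir, ctl_path, hyp_path, hmm, lm, dic):
    """Build the argv list for a pocketsphinx_batch run over raw audio files."""
    return [
        "pocketsphinx_batch",
        "-adcin", "yes",        # input is audio, not precomputed features
        "-cepdir", audio_dir,   # directory holding the recordings
        "-cepext", ".raw",      # assumed extension of the recordings
        "-ctl", ctl_path,       # list of utterance ids to decode
        "-hyp", hyp_path,       # recognized text is written here
        "-hmm", hmm,
        "-lm", lm,
        "-dict", dic,
    ]
```

    The resulting list can be passed to subprocess.run, with the same -hmm,
    -lm and -dict paths that were used for pocketsphinx_continuous.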

     
  • Sandeep Verma

    Sandeep Verma - 2012-03-31

    I want to develop an application using "cmusphinx" to convert audio (mp3,
    video audio) to text and vice versa using PHP.

    any kind of help is appreciated.

    Thanks

     
  • Nickolay V. Shmyrev

    > I want to develop an application using "cmusphinx" to convert audio (mp3,
    > video audio) to text and vice versa using PHP

    You can invoke the pocketsphinx_continuous binary from PHP and store the
    text results. It's relatively simple to implement.
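
    The same idea, sketched in Python rather than PHP to show the shape of it:
    capture the program's output, keep only the hypothesis lines, and append
    them to a file. The output format assumed here (a nine-digit utterance id,
    a colon, then the text) and the function names are assumptions; adjust the
    pattern to whatever your build actually prints.

```python
import re

# Assumed output shape: a nine-digit utterance id, a colon, then the text.
HYP_LINE = re.compile(r"^(\d{9}): (.*)$")


def extract_hypotheses(output):
    """Return the recognized text of every non-empty hypothesis line."""
    results = []
    for line in output.splitlines():
        match = HYP_LINE.match(line.strip())
        if match and match.group(2):
            results.append(match.group(2))
    return results


def store_results(output, path):
    """Append each recognized utterance to a text file, one per line."""
    texts = extract_hypotheses(output)
    with open(path, "a") as handle:
        for text in texts:
            handle.write(text + "\n")
    return len(texts)
```

    Log lines (INFO:, WARNING: and so on) and prompts are dropped because they
    do not match the pattern.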

     
