Help with finding documentation on PocketSphinx-Python Methods please?

Help
2020-04-20
2020-04-25
  • James B Ross

    James B Ross - 2020-04-20

    I have PocketSphinx installed on a small SBC running Ubuntu 18.04. I have also installed PocketSphinx-Python and I have been able to run some example python code I found including one that allows me to use microphone input.

    However, I would like to write my own Python programs to access pocketsphinx, but I don't know where to find the methods used by pocketsphinx-python or how to use them.

    For example, there is an object called "decoder".

    In the example program they are using decoder.end_utt().

    I have no idea what this does other than it seems to print out a lot of information to the terminal and decode anything I say into the microphone. I might add that pocketsphinx has been decoding my speech at a very good accuracy so far.

    However, I would like to gain more control over how my Python program interacts with pocketsphinx.

    My Python IDE has IntelliSense, and when I type in "decoder." it offers a very long list of options to choose from.

    Where can I find information on what all these options do?

    Is there any documentation for the methods I can use via PocketSphinx-Python?

    I'm building a Linguistic AI system and I'm using pocketsphinx as the speech recognizer for the input. So I'd like to get the results of the decoding process into a variable as efficiently as possible. All I need to know is what was said into the microphone.

    Thank you.

     
    • Nickolay V. Shmyrev

      Try https://github.com/alphacep/vosk-api, it is much more accurate.

       
  • James B Ross

    James B Ross - 2020-04-20

    Other people have pointed me to vosk as well. The problem is that I haven't been able to find much information about vosk. I don't even know what it is, or how to use it?

    Does vosk use pocketsphinx?

    If not, what exactly is vosk? And where do I find detailed information on it beyond that github page, especially in terms of tutorials?

    I'm already having close to 100% accuracy with Pocket Sphinx. It's been decoding everything I throw at it with near perfection. Perhaps it likes the way I speak?

    Also, if I move over to vosk, it appears to be dependent on Kaldi?

    What's Kaldi?

    I need total independence from the Internet. Are Vosk and Kaldi Internet independent?

    Do they have dictionaries that I can modify like PocketSphinx has?

    And will my original question even be answered? Will I be able to find detailed information on the methods I can use in Python when using vosk-Kaldi?

    I don't want to have to install a whole new system only to discover that I'll have the same questions about how to access it efficiently from Python.

    Is CMUSphinx dead?

    If so why isn't this CMUSphinx group proclaimed to be obsolete or dead, and we aren't all just being pointed over to vosk-Kaldi?

    This is supposed to be a CMUSphinx forum. Is CMUSphinx garbage now?

     
    • Nickolay V. Shmyrev

      I don't even know what it is, or how to use it?

      You are welcome to ask.

      Does vosk use pocketsphinx?

      No

      If not, what exactly is vosk? And where do I find detailed information on it beyond that github page, especially in terms of tutorials?

      It is a software library to recognize speech just like pocketsphinx.

      I'm already having close to 100% accuracy with Pocket Sphinx. It's been decoding everything I throw at it with near perfection. Perhaps it likes the way I speak?

      If it is perfect already, what are you asking about then?

      Also, if I move over to vosk, it appears to be dependent on Kaldi? What's Kaldi?

      Kaldi is a speech recognition toolkit.

      I need total independence from the Internet. Are Vosk and Kaldi Internet independent?

      Yes, you do not need internet.

      Do they have dictionaries that I can modify like PocketSphinx has?

      Yes

      And will my original question even be answered? Will I be able to find detailed information on the methods I can use in Python when using vosk-Kaldi?

      Sure, it is all in the sources and demos.

      I don't want to have to install a whole new system only to discover that I'll have the same questions about how to access it efficiently from Python.

      Ok, it is up to you.

      Is CMUSphinx dead?

      Yes.

      If so why isn't this CMUSphinx group proclaimed to be obsolete or dead, and we aren't all just being pointed over to vosk-Kaldi?

      You are pointed over to vosk-kaldi.

       
  • James B Ross

    James B Ross - 2020-04-20

    "If it is perfect already, what are you asking about then?"

    I was asking for information on how to access and control pocketsphinx from Python. I wasn't complaining that it doesn't decode speech well.

    And I have a feeling that if I move over to vosk-kaldi I'll end up having the same questions. And I will have basically gained nothing.

    But hey, I'll give it a shot.

    The thing that is so disgusting is that I have already spent about 2 weeks learning all about pocketsphinx. Now I'll need to start all over from scratch learning about vosk-kaldi.

    And I've already done searches for tutorials on vosk and kaldi and I haven't found much. Do they have a tutorial page like CMUSphinx has? At least CMUSphinx provided quite a bit of documentation. https://cmusphinx.github.io/wiki/tutorial/

    This feels like going back to square one moving over to vosk-kaldi. I hope it's worth it.

     
    • Nickolay V. Shmyrev

      This feels like going back to square one moving over to vosk-kaldi. I hope it's worth it.

      Absolutely! Let me know if you have further questions.

       
  • James B Ross

    James B Ross - 2020-04-21

    Ok, I have a problem right off the bat:

    I'm running Ubuntu 18.04 on a Jetson Nano arm64.

    From the GitHub page I tried the following:

    james@james-desktop:~/vosk-kaldi$ pip3 --version
    pip 20.0.2 from /home/james/.local/lib/python3.6/site-packages/pip (python 3.6)
    (I have the correct version of pip)

    james@james-desktop:~/vosk-kaldi$ pip3 install vosk
    Defaulting to user installation because normal site-packages is not writeable
    ERROR: Could not find a version that satisfies the requirement vosk (from versions: none)
    ERROR: No matching distribution found for vosk

    (So then I tried the following:)

    james@james-desktop:~/vosk-kaldi$ python3 -m pip install vosk
    Defaulting to user installation because normal site-packages is not writeable
    ERROR: Could not find a version that satisfies the requirement vosk (from versions: none)
    ERROR: No matching distribution found for vosk

    (Same error)

    Just for the record I was able to install sphinxbase, pocketsphinx, and pocketsphinx-python on this machine with no problems. And it's all working.

     
    • Nickolay V. Shmyrev

      Try

       pip3 install https://github.com/alphacep/vosk-api/releases/download/0.3.3/vosk-0.3.3-cp36-cp36m-linux_aarch64.whl
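The plain `pip3 install vosk` probably failed because pip found no wheel on the package index that matched this interpreter and CPU, while the direct URL names a wheel built for CPython 3.6 on aarch64. As a rough illustration, the sketch below splits a wheel filename into its tags and compares them with the running interpreter. The helper names `parse_wheel_name` and `looks_compatible` are made up for this example, the check is a simplification of what pip actually does, and it assumes a filename without the optional build tag.

```python
import platform
import sys


def parse_wheel_name(filename):
    """Split 'name-version-pytag-abitag-plattag.whl' into its five fields."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: " + filename)
    stem = filename[:-len(".whl")]
    # Assumption: no optional build tag, so exactly five dash-separated fields.
    name, version, py_tag, abi_tag, plat_tag = stem.split("-")
    return {"name": name, "version": version, "python": py_tag,
            "abi": abi_tag, "platform": plat_tag}


def looks_compatible(info):
    """Rough check only: same CPython major/minor and same CPU architecture."""
    this_python = "cp{0}{1}".format(*sys.version_info[:2])
    same_cpu = info["platform"].endswith(platform.machine())
    return info["python"] == this_python and same_cpu


info = parse_wheel_name("vosk-0.3.3-cp36-cp36m-linux_aarch64.whl")
print(info["python"], info["abi"], info["platform"])
print("compatible with this interpreter:", looks_compatible(info))
```

Run on the machine in question, the second line gives a quick hint whether a given wheel file was built for that Python version and architecture.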
      
       
  • James B Ross

    James B Ross - 2020-04-21

    Ok, that seemed to work.

    james@james-desktop:~/vosk-kaldi$ pip3 install https://github.com/alphacep/vosk-api/releases/download/0.3.3/vosk-0.3.3-cp36-cp36m-linux_aarch64.whl
    Defaulting to user installation because normal site-packages is not writeable
    Collecting vosk==0.3.3
    Downloading https://github.com/alphacep/vosk-api/releases/download/0.3.3/vosk-0.3.3-cp36-cp36m-linux_aarch64.whl (2.5 MB)
    |████████████████████████████████| 2.5 MB 4.9 kB/s
    Installing collected packages: vosk
    Successfully installed vosk-0.3.3
    james@james-desktop:~/vosk-kaldi$

    I have another question. I found this page: http://kaldi-asr.org/doc/index.html

    That appears to have a lot of information on it about Kaldi. But there's no mention of vosk anywhere.

    What exactly is vosk, and why is it needed? Isn't Kaldi supposed to be the SRE?

     
    • Nickolay V. Shmyrev

      What exactly is vosk, and why is it needed? Isn't Kaldi supposed to be the SRE?

      If you simply want to use a speech recognizer from Python, you can use the prepackaged vosk wheels and models. Kaldi is more of a system for speech researchers, with a complex install, API, and usage.
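For the stated goal of getting the recognized words into a variable, it helps to know that the recognizer's `Result()` and `FinalResult()` calls in the vosk-api demos return a JSON string rather than plain text. Below is a minimal sketch of pulling the sentence out, assuming the string carries a `text` field as in those demos. The helper name and the sample string are invented for illustration.

```python
import json


def text_from_vosk_result(result_json):
    """Return the recognized sentence from a Vosk result string ('' if absent)."""
    return json.loads(result_json).get("text", "")


# Made-up sample in the shape the demos print; real output comes from rec.Result().
sample = '{"result": [{"conf": 1.0, "end": 0.6, "start": 0.3, "word": "hello"}], "text": "hello"}'
decoded_speech = text_from_vosk_result(sample)
print(decoded_speech)  # prints: hello
```

Partial results appear to use a `partial` field instead of `text`, so this helper returns an empty string for them.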

       
  • James B Ross

    James B Ross - 2020-04-23

    I'm not getting anywhere with vosk.
    I've installed vosk and Kaldi.

    I'm trying to run the following Python test code for vosk:

    #!/usr/bin/python3
    
    from vosk import Model, KaldiRecognizer
    import sys
    import os
    import wave
    
    if not os.path.exists("model-en"):
        print ("Please download the model from https://github.com/alphacep/kaldi-android-demo/releases and unpack as 'model-en' in the current folder.")
        exit (1)
    
    wf = wave.open(sys.argv[1], "rb")
    if wf.getnchannels() != 1 or wf.getsampwidth() != 2 or wf.getcomptype() != "NONE":
        print ("Audio file must be WAV format mono PCM.")
        exit (1)
    
    model = Model("model-en")
    rec = KaldiRecognizer(model, wf.getframerate())
    
    while True:
        data = wf.readframes(1000)
        if len(data) == 0:
            break
        if rec.AcceptWaveform(data):
            print(rec.Result())
        else:
            print(rec.PartialResult())
    
    print(rec.FinalResult())
    

    And I get the following errors:

    james@james-desktop:~/vosk-kaldi/kaldi$ /usr/bin/python3 /home/james/vosk-kaldi/kaldi/test_simple.py
    Traceback (most recent call last):
      File "/home/james/vosk-kaldi/kaldi/test_simple.py", line 3, in <module>
        from vosk import Model, KaldiRecognizer
      File "/home/james/.local/lib/python3.6/site-packages/vosk/__init__.py", line 1, in <module>
        from .vosk import KaldiRecognizer, Model, SpkModel
      File "/home/james/.local/lib/python3.6/site-packages/vosk/vosk.py", line 13, in <module>
        from . import _vosk
    **ImportError: libgfortran.so.3: cannot open shared object file: No such file or directory**
    

    Is there anyone still alive who can answer questions about Pocket Sphinx? I'd rather be using pocketsphinx to be honest. It was looking really promising for my specific project.

    In terms of not being able to find any help, I'm finding that vosk isn't any better. I can't find any information on vosk at all beyond the github page which has very limited information.

     
    • Nickolay V. Shmyrev

      ImportError: libgfortran.so.3: cannot open shared object file: No such file or directory

      You need to install libgfortran.so.3 with sudo apt-get install libgfortran3
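The ImportError means the dynamic loader could not find `libgfortran.so.3` when the compiled vosk module was imported. A quick way to check from Python whether a given shared library can be opened, before and after installing the package, is sketched below. It uses only the standard `ctypes` module, and the helper name `can_load` is made up for this example.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        return False
    return True


print("libgfortran.so.3 loadable:", can_load("libgfortran.so.3"))
```

If this prints `False` even after the install, the library is probably in a directory the loader does not search.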

      Is there anyone still alive who can answer questions about Pocket Sphinx? I'd rather be using pocketsphinx to be honest. It was looking really promising for my specific project.

      You are welcome to ask.

       
  • James B Ross

    James B Ross - 2020-04-23

    I did install libgfortran as a dependency of Kaldi.

    I'll install libgfortran3 and see if that helps.

    Although I'm probably going to have as many questions about vosk as I have for pocketsphinx.

    The first thing I'll want to do is empty its dictionary and start my own dictionary from scratch. I already know how to do that for pocketsphinx.

    Currently I have pocketsphinx running. And it's decoding my speech close to 100% accuracy. In fact, most of the time it is 100% accurate. As I say, it must like my voice.

    The things I need to know are little things, like how to stop it from printing out all the INFO: and debug statements on the command line. It basically prints out several pages of what it's doing, with the decoded speech somewhere in the middle. I'd like to be able to turn off those printouts. It's nice to have them when needed, but I don't need them when everything is working properly.

    Here's my Python Code for Pocket Sphinx:

    # This program is working!
    # Needed to set the microphone input correctly on the desktop mic app.
    # Saturday April 18th.
    import subprocess as cmdLine
    from os import environ, path
    import pyaudio
    from pocketsphinx.pocketsphinx import *
    from sphinxbase.sphinxbase import *
    cmdLine.call('clear')
    
    MODELDIR = "pocketsphinx/model"
    
    config = Decoder.default_config()
    config.set_string('-hmm', path.join(MODELDIR, 'en-us/en-us'))
    config.set_string('-lm', path.join(MODELDIR, 'en-us/en-us.lm.bin'))
    config.set_string('-dict', path.join(MODELDIR, 'en-us/cmudict-en-us.dict'))
    decoder = Decoder(config)
    
    p = pyaudio.PyAudio()
    stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=1024)
    stream.start_stream() 
    
    in_speech_bf = False
    decoder.start_utt()
    Result = ""
    while True:
        buf = stream.read(1024)
        if buf:
            decoder.process_raw(buf, False, False)
            if decoder.get_in_speech() != in_speech_bf:
                in_speech_bf = decoder.get_in_speech()
                if not in_speech_bf:
                    decoder.end_utt()
                    hyp = decoder.hyp()  # None when nothing was recognized
                    Result = hyp.hypstr if hyp is not None else ""
                    print (Result)
                    break
        else:
            break
    print ("Decoded Speech: {0}".format(Result))
    

    And all I want it to do is print out the final decoded speech.

    But it prints out all of the following:

    INFO: pocketsphinx.c(152): Parsed model-specific feature parameters from pocketsphinx/model/en-us/en-us/feat.params
    Current configuration:
    [NAME] [DEFLT] [VALUE]
    -agc none none
    -agcthresh 2.0 2.000000e+00
    -allphone
    -allphone_ci no no
    -alpha 0.97 9.700000e-01
    -ascale 20.0 2.000000e+01
    -aw 1 1
    -backtrace no no
    -beam 1e-48 1.000000e-48
    -bestpath yes yes
    -bestpathlw 9.5 9.500000e+00
    -ceplen 13 13
    -cmn live batch
    -cmninit 40,3,-1 41.00,-5.29,-0.12,5.09,2.48,-4.07,-1.37,-1.78,-5.08,-2.05,-6.45,-1.42,1.17
    -compallsen no no
    -debug 0
    -dict pocketsphinx/model/en-us/cmudict-en-us.dict
    -dictcase no no
    -dither no no
    -doublebw no no
    -ds 1 1
    -fdict
    -feat 1s_c_d_dd 1s_c_d_dd
    -featparams
    -fillprob 1e-8 1.000000e-08
    -frate 100 100
    -fsg
    -fsgusealtpron yes yes
    -fsgusefiller yes yes
    -fwdflat yes yes
    -fwdflatbeam 1e-64 1.000000e-64
    -fwdflatefwid 4 4
    -fwdflatlw 8.5 8.500000e+00
    -fwdflatsfwin 25 25
    -fwdflatwbeam 7e-29 7.000000e-29
    -fwdtree yes yes
    -hmm pocketsphinx/model/en-us/en-us
    -input_endian little little
    -jsgf
    -keyphrase
    -kws
    -kws_delay 10 10
    -kws_plp 1e-1 1.000000e-01
    -kws_threshold 1 1.000000e+00
    -latsize 5000 5000
    -lda
    -ldadim 0 0
    -lifter 0 22
    -lm pocketsphinx/model/en-us/en-us.lm.bin
    -lmctl
    -lmname
    -logbase 1.0001 1.000100e+00
    -logfn
    -logspec no no
    -lowerf 133.33334 1.300000e+02
    -lpbeam 1e-40 1.000000e-40
    -lponlybeam 7e-29 7.000000e-29
    -lw 6.5 6.500000e+00
    -maxhmmpf 30000 30000
    -maxwpf -1 -1
    -mdef
    -mean
    -mfclogdir
    -min_endfr 0 0
    -mixw
    -mixwfloor 0.0000001 1.000000e-07
    -mllr
    -mmap yes yes
    -ncep 13 13
    -nfft 512 512
    -nfilt 40 25
    -nwpen 1.0 1.000000e+00
    -pbeam 1e-48 1.000000e-48
    -pip 1.0 1.000000e+00
    -pl_beam 1e-10 1.000000e-10
    -pl_pbeam 1e-10 1.000000e-10
    -pl_pip 1.0 1.000000e+00
    -pl_weight 3.0 3.000000e+00
    -pl_window 5 5
    -rawlogdir
    -remove_dc no no
    -remove_noise yes yes
    -remove_silence yes yes
    -round_filters yes yes
    -samprate 16000 1.600000e+04
    -seed -1 -1
    -sendump
    -senlogdir
    -senmgau
    -silprob 0.005 5.000000e-03
    -smoothspec no no
    -svspec 0-12/13-25/26-38
    -tmat
    -tmatfloor 0.0001 1.000000e-04
    -topn 4 4
    -topn_beam 0 0
    -toprule
    -transform legacy dct
    -unit_area yes yes
    -upperf 6855.4976 6.800000e+03
    -uw 1.0 1.000000e+00
    -vad_postspeech 50 50
    -vad_prespeech 20 20
    -vad_startspeech 10 10
    -vad_threshold 2.0 2.000000e+00
    -var
    -varfloor 0.0001 1.000000e-04
    -varnorm no no
    -verbose no no
    -warp_params
    -warp_type inverse_linear inverse_linear
    -wbeam 7e-29 7.000000e-29
    -wip 0.65 6.500000e-01
    -wlen 0.025625 2.562500e-02

    INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='batch', VARNORM='no', AGC='none'
    INFO: acmod.c(166): Using subvector specification 0-12/13-25/26-38
    INFO: mdef.c(518): Reading model definition: pocketsphinx/model/en-us/en-us/mdef
    INFO: mdef.c(531): Found byte-order mark BMDF, assuming this is a binary mdef file
    INFO: bin_mdef.c(336): Reading binary model definition: pocketsphinx/model/en-us/en-us/mdef
    INFO: bin_mdef.c(516): 42 CI-phone, 137053 CD-phone, 3 emitstate/phone, 126 CI-sen, 5126 Sen, 29324 Sen-Seq
    INFO: tmat.c(149): Reading HMM transition probability matrices: pocketsphinx/model/en-us/en-us/transition_matrices
    INFO: acmod.c(117): Attempting to use PTM computation module
    INFO: ms_gauden.c(127): Reading mixture gaussian parameter: pocketsphinx/model/en-us/en-us/means
    INFO: ms_gauden.c(242): 42 codebook, 3 feature, size:
    INFO: ms_gauden.c(244): 128x13
    INFO: ms_gauden.c(244): 128x13
    INFO: ms_gauden.c(244): 128x13
    INFO: ms_gauden.c(127): Reading mixture gaussian parameter: pocketsphinx/model/en-us/en-us/variances
    INFO: ms_gauden.c(242): 42 codebook, 3 feature, size:
    INFO: ms_gauden.c(244): 128x13
    INFO: ms_gauden.c(244): 128x13
    INFO: ms_gauden.c(244): 128x13
    INFO: ms_gauden.c(304): 222 variance values floored
    INFO: ptm_mgau.c(476): Loading senones from dump file pocketsphinx/model/en-us/en-us/sendump
    INFO: ptm_mgau.c(500): BEGIN FILE FORMAT DESCRIPTION
    INFO: ptm_mgau.c(563): Rows: 128, Columns: 5126
    INFO: ptm_mgau.c(595): Using memory-mapped I/O for senones
    INFO: ptm_mgau.c(838): Maximum top-N: 4
    INFO: phone_loop_search.c(114): State beam -225 Phone exit beam -225 Insertion penalty 0
    INFO: dict.c(320): Allocating 138824 * 32 bytes (4338 KiB) for word entries
    INFO: dict.c(333): Reading main dictionary: pocketsphinx/model/en-us/cmudict-en-us.dict
    INFO: dict.c(213): Dictionary size 134723, allocated 1016 KiB for strings, 1679 KiB for phones
    INFO: dict.c(336): 134723 words read
    INFO: dict.c(358): Reading filler dictionary: pocketsphinx/model/en-us/en-us/noisedict
    INFO: dict.c(213): Dictionary size 134728, allocated 0 KiB for strings, 0 KiB for phones
    INFO: dict.c(361): 5 words read
    INFO: dict2pid.c(396): Building PID tables for dictionary
    INFO: dict2pid.c(406): Allocating 42^3 * 2 bytes (144 KiB) for word-initial triphones
    INFO: dict2pid.c(132): Allocated 42672 bytes (41 KiB) for word-final triphones
    INFO: dict2pid.c(196): Allocated 42672 bytes (41 KiB) for single-phone word triphones
    INFO: ngram_model_trie.c(354): Trying to read LM in trie binary format
    INFO: ngram_search_fwdtree.c(74): Initializing search tree
    INFO: ngram_search_fwdtree.c(101): 791 unique initial diphones
    INFO: ngram_search_fwdtree.c(186): Creating search channels
    INFO: ngram_search_fwdtree.c(323): Max nonroot chan increased to 152609
    INFO: ngram_search_fwdtree.c(333): Created 723 root, 152481 non-root channels, 53 single-phone words
    INFO: ngram_search_fwdflat.c(157): fwdflat: min_ef_width = 4, max_sf_win = 25
    ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.tegra-hda.pcm.front.0:CARD=0'
    ALSA lib conf.c:4528:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
    ALSA lib conf.c:5007:(snd_config_expand) Evaluate error: No such file or directory
    ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM front
    ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
    ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
    ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
    ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.tegra-hda.pcm.surround51.0:CARD=0'
    ALSA lib conf.c:4528:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
    ALSA lib conf.c:5007:(snd_config_expand) Evaluate error: No such file or directory
    ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM surround21
    ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.tegra-hda.pcm.surround51.0:CARD=0'
    ALSA lib conf.c:4528:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
    ALSA lib conf.c:5007:(snd_config_expand) Evaluate error: No such file or directory
    ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM surround21
    ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.tegra-hda.pcm.surround40.0:CARD=0'
    ALSA lib conf.c:4528:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
    ALSA lib conf.c:5007:(snd_config_expand) Evaluate error: No such file or directory
    ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM surround40
    ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.tegra-hda.pcm.surround51.0:CARD=0'
    ALSA lib conf.c:4528:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
    ALSA lib conf.c:5007:(snd_config_expand) Evaluate error: No such file or directory
    ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM surround41
    ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.tegra-hda.pcm.surround51.0:CARD=0'
    ALSA lib conf.c:4528:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
    ALSA lib conf.c:5007:(snd_config_expand) Evaluate error: No such file or directory
    ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM surround50
    ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.tegra-hda.pcm.surround51.0:CARD=0'
    ALSA lib conf.c:4528:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
    ALSA lib conf.c:5007:(snd_config_expand) Evaluate error: No such file or directory
    ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM surround51
    ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.tegra-hda.pcm.surround71.0:CARD=0'
    ALSA lib conf.c:4528:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
    ALSA lib conf.c:5007:(snd_config_expand) Evaluate error: No such file or directory
    ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM surround71
    ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.tegra-hda.pcm.iec958.0:CARD=0,AES0=4,AES1=130,AES2=0,AES3=2'
    ALSA lib conf.c:4528:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
    ALSA lib conf.c:5007:(snd_config_expand) Evaluate error: No such file or directory
    ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM iec958
    ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.tegra-hda.pcm.iec958.0:CARD=0,AES0=4,AES1=130,AES2=0,AES3=2'
    ALSA lib conf.c:4528:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
    ALSA lib conf.c:5007:(snd_config_expand) Evaluate error: No such file or directory
    ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM spdif
    ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.tegra-hda.pcm.iec958.0:CARD=0,AES0=4,AES1=130,AES2=0,AES3=2'
    ALSA lib conf.c:4528:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
    ALSA lib conf.c:5007:(snd_config_expand) Evaluate error: No such file or directory
    ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM spdif
    ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.modem
    ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.modem
    ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline
    ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline
    ALSA lib pcm_dmix.c:990:(snd_pcm_dmix_open) The dmix plugin supports only playback stream
    ALSA lib pcm_dmix.c:1052:(snd_pcm_dmix_open) unable to open slave
    ALSA lib pcm_params.c:2162:(snd1_pcm_hw_refine_slave) Slave PCM not usable
    ALSA lib pcm_params.c:2162:(snd1_pcm_hw_refine_slave) Slave PCM not usable
    ALSA lib pcm_params.c:2162:(snd1_pcm_hw_refine_slave) Slave PCM not usable
    ALSA lib pcm_params.c:2162:(snd1_pcm_hw_refine_slave) Slave PCM not usable
    ALSA lib pcm_dmix.c:1052:(snd_pcm_dmix_open) unable to open slave
    Cannot connect to server socket err = No such file or directory
    Cannot connect to server request channel
    jack server is not running or cannot be started
    JackShmReadWritePtr::~JackShmReadWritePtr - Init not done for -1, skipping unlock
    JackShmReadWritePtr::~JackShmReadWritePtr - Init not done for -1, skipping unlock
    INFO: ngram_search.c(459): Resized backpointer table to 10000 entries
    INFO: ngram_search.c(467): Resized score stack to 200000 entries
    INFO: cmn_live.c(120): Update from < 41.00 -5.29 -0.12 5.09 2.48 -4.07 -1.37 -1.78 -5.08 -2.05 -6.45 -1.42 1.17 >
    INFO: cmn_live.c(138): Update to < 51.82 -0.02 -12.76 7.46 -4.59 -1.31 1.28 -0.43 1.27 -0.31 1.46 6.54 -1.84 >
    INFO: ngram_search_fwdtree.c(1550): 6515 words recognized (17/fr)
    INFO: ngram_search_fwdtree.c(1552): 1118489 senones evaluated (2967/fr)
    INFO: ngram_search_fwdtree.c(1556): 5693477 channels searched (15102/fr), 200745 1st, 257404 last
    INFO: ngram_search_fwdtree.c(1559): 13944 words for which last channels evaluated (36/fr)
    INFO: ngram_search_fwdtree.c(1561): 414142 candidate words for entering last phone (1098/fr)
    INFO: ngram_search_fwdtree.c(1564): fwdtree 7.28 CPU 1.932 xRT
    INFO: ngram_search_fwdtree.c(1567): fwdtree 7.76 wall 2.058 xRT
    INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 202 words
    INFO: ngram_search_fwdflat.c(948): 2710 words recognized (7/fr)
    INFO: ngram_search_fwdflat.c(950): 279330 senones evaluated (741/fr)
    INFO: ngram_search_fwdflat.c(952): 348757 channels searched (925/fr)
    INFO: ngram_search_fwdflat.c(954): 19506 words searched (51/fr)
    INFO: ngram_search_fwdflat.c(957): 12693 word transitions (33/fr)
    INFO: ngram_search_fwdflat.c(960): fwdflat 0.50 CPU 0.133 xRT
    INFO: ngram_search_fwdflat.c(963): fwdflat 0.52 wall 0.138 xRT
    INFO: ngram_search.c(1250): lattice start node .0 end node .330
    INFO: ngram_search.c(1276): Eliminated 2 nodes before end node
    INFO: ngram_search.c(1381): Lattice has 645 nodes, 1119 links
    INFO: ps_lattice.c(1380): Bestpath score: -10877
    INFO: ps_lattice.c(1384): Normalizer P(O) = alpha(:330:375) = -768052
    INFO: ps_lattice.c(1441): Joint P(O,S) = -794637 P(S|O) = -26585
    INFO: ngram_search.c(872): bestpath 0.01 CPU 0.002 xRT
    INFO: ngram_search.c(875): bestpath 0.01 wall 0.002 xRT
    hello my name is james
    Decoded Speech: hello my name is james
    INFO: ngram_search_fwdtree.c(429): TOTAL fwdtree 7.28 CPU 1.937 xRT
    INFO: ngram_search_fwdtree.c(432): TOTAL fwdtree 7.76 wall 2.064 xRT
    INFO: ngram_search_fwdflat.c(176): TOTAL fwdflat 0.50 CPU 0.133 xRT
    INFO: ngram_search_fwdflat.c(179): TOTAL fwdflat 0.52 wall 0.138 xRT
    INFO: ngram_search.c(303): TOTAL bestpath 0.01 CPU 0.002 xRT
    INFO: ngram_search.c(306): TOTAL bestpath 0.01 wall 0.002 xRT
    james@james-desktop:~/speech_recognition/pocketsphinx-python$

    Notice near the bottom it prints:

    hello my name is james
    Decoded Speech: hello my name is james

    That's 100% perfectly decoded speech.

    And it printed both those lines because my program printed those lines.

    How do I shut off all the other messages?

    I'd really like to find documentation on the Pocket Sphinx source code so I can see what the methods I call actually do, and what parameters I can send them to do things like telling them not to print messages.

    I have pocketsphinx running. I like it. I just want to have more control over it from Python, that's all. Simple little things like telling it to quit printing debug messages.

     
    • Nickolay V. Shmyrev

      How do I shut off all the other messages?

      Add config.set_string('-logfn', '/dev/null')

      see also

      https://stackoverflow.com/questions/17825820/how-do-i-turn-off-e-info-in-pocketsphinx
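As a small sketch of where that line would go: the option has to be set on the config object before the decoder is built from it. The helper below is hypothetical (its name is invented here) and simply wraps the `set_string` call already used in the program above, with `os.devnull` standing in for the literal `/dev/null`.

```python
import os


def silence_pocketsphinx_log(config):
    """Point PocketSphinx's log output at the null device and return config."""
    # '-logfn' is the log-file option listed in the configuration dump above.
    config.set_string("-logfn", os.devnull)
    return config


# Intended use with the program from the earlier post (not run here):
#   config = Decoder.default_config()
#   ... set -hmm, -lm and -dict as before ...
#   decoder = Decoder(silence_pocketsphinx_log(config))
```

This should quiet the `INFO:` lines and the configuration dump. It is not expected to affect messages printed by other libraries.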

      I'd really like to find documentation on the Pocket Sphinx source code so I can see what the methods I call actually do, and what parameters I can send them to do things like telling them not to print messages.

      There is C documentation here. https://cmusphinx.github.io/doc/pocketsphinx/files.html. There is no Python documentation, just the source code.
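In the absence of Python documentation, Python's own introspection can at least list what an object offers, much like the IDE completion list does. The sketch below uses the standard `inspect` module. The helper name `list_public_methods` is invented for this example, and how useful the summaries are depends on whether the wrapper ships docstrings, which is an assumption here.

```python
import inspect


def list_public_methods(obj):
    """Return (name, first docstring line) for each public callable on obj."""
    rows = []
    for name, member in inspect.getmembers(obj, callable):
        if name.startswith("_"):
            continue  # skip private and dunder members
        doc = inspect.getdoc(member) or ""
        rows.append((name, doc.splitlines()[0] if doc else ""))
    return rows


# Intended use (not run here, requires pocketsphinx-python):
#   from pocketsphinx.pocketsphinx import Decoder
#   for name, summary in list_public_methods(Decoder):
#       print(name, "-", summary)
#   help(Decoder.end_utt)   # full docstring for a single method
```

The method names can then be looked up against the similarly named `ps_*` functions in the C documentation linked above.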

      I have pocketsphinx running. I like it. I just want to have more control over it from Python, that's all. Simple little things like telling it to quit printing debug messages.

      Ok!
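The `ALSA lib ...` and JACK lines in the output above are not PocketSphinx log messages, so the `-logfn` option would presumably leave them in place. They appear to be written to the process's standard error by the audio libraries when PyAudio starts up. One common workaround, sketched here under that assumption, is to point file descriptor 2 at the null device for the duration of the noisy call. The name `suppress_c_stderr` is made up for this example.

```python
import contextlib
import os
import sys


@contextlib.contextmanager
def suppress_c_stderr():
    """Temporarily point file descriptor 2 (stderr) at the null device."""
    sys.stderr.flush()
    saved_fd = os.dup(2)                        # keep a copy of the real stderr
    devnull_fd = os.open(os.devnull, os.O_WRONLY)
    try:
        os.dup2(devnull_fd, 2)                  # C libraries now write to the null device
        yield
    finally:
        os.dup2(saved_fd, 2)                    # restore the real stderr
        os.close(devnull_fd)
        os.close(saved_fd)


with suppress_c_stderr():
    os.write(2, b"this line is discarded\n")

# Intended use with the programs in this thread (not run here):
#   with suppress_c_stderr():
#       p = pyaudio.PyAudio()
```

Genuine errors raised as Python exceptions still propagate, but anything a library only prints to stderr inside the block is lost, so it is worth keeping the block as small as possible.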

       
  • James B Ross

    James B Ross - 2020-04-23

    Ok I finally got VOSK running, but as I suspected I have similar questions with vosk:

    Here's my Python code:

    #!/usr/bin/python3
    
    from vosk import Model, KaldiRecognizer
    import time
    import os
    os.system('clear')
    time.sleep(1)
    
    if not os.path.exists("model-en"):
        print ("Please download the model from https://github.com/alphacep/kaldi-android-demo/releases and unpack as 'model-en' in the current folder.")
        exit (1)
    
    import pyaudio
    
    p = pyaudio.PyAudio()
    stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=8000)
    stream.start_stream()
    
    model = Model("model-en")
    rec = KaldiRecognizer(model, 16000)
    
    Decoded_speech = ""
    print("STARTING HERE:")
    while True:
        data = stream.read(2000)
        if len(data) == 0:
            break
        if rec.AcceptWaveform(data):
            Decoded_speech = rec.Result()
            # print(rec.Result())
            break
        else:
            pass
            # Decoded_speech =rec.PartialResult()
    
    print("My Print Statement: = {0}".format(Decoded_speech))
    print("Done")
    

    And here's the output:

    ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.tegra-hda.pcm.front.0:CARD=0'
    ALSA lib conf.c:4528:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
    ALSA lib conf.c:5007:(snd_config_expand) Evaluate error: No such file or directory
    ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM front
    ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
    ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
    ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
    ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.tegra-hda.pcm.surround51.0:CARD=0'
    ALSA lib conf.c:4528:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
    ALSA lib conf.c:5007:(snd_config_expand) Evaluate error: No such file or directory
    ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM surround21
    ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.tegra-hda.pcm.surround51.0:CARD=0'
    ALSA lib conf.c:4528:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
    ALSA lib conf.c:5007:(snd_config_expand) Evaluate error: No such file or directory
    ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM surround21
    ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.tegra-hda.pcm.surround40.0:CARD=0'
    ALSA lib conf.c:4528:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
    ALSA lib conf.c:5007:(snd_config_expand) Evaluate error: No such file or directory
    ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM surround40
    ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.tegra-hda.pcm.surround51.0:CARD=0'
    ALSA lib conf.c:4528:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
    ALSA lib conf.c:5007:(snd_config_expand) Evaluate error: No such file or directory
    ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM surround41
    ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.tegra-hda.pcm.surround51.0:CARD=0'
    ALSA lib conf.c:4528:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
    ALSA lib conf.c:5007:(snd_config_expand) Evaluate error: No such file or directory
    ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM surround50
    ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.tegra-hda.pcm.surround51.0:CARD=0'
    ALSA lib conf.c:4528:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
    ALSA lib conf.c:5007:(snd_config_expand) Evaluate error: No such file or directory
    ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM surround51
    ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.tegra-hda.pcm.surround71.0:CARD=0'
    ALSA lib conf.c:4528:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
    ALSA lib conf.c:5007:(snd_config_expand) Evaluate error: No such file or directory
    ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM surround71
    ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.tegra-hda.pcm.iec958.0:CARD=0,AES0=4,AES1=130,AES2=0,AES3=2'
    ALSA lib conf.c:4528:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
    ALSA lib conf.c:5007:(snd_config_expand) Evaluate error: No such file or directory
    ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM iec958
    ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.tegra-hda.pcm.iec958.0:CARD=0,AES0=4,AES1=130,AES2=0,AES3=2'
    ALSA lib conf.c:4528:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
    ALSA lib conf.c:5007:(snd_config_expand) Evaluate error: No such file or directory
    ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM spdif
    ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.tegra-hda.pcm.iec958.0:CARD=0,AES0=4,AES1=130,AES2=0,AES3=2'
    ALSA lib conf.c:4528:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
    ALSA lib conf.c:5007:(snd_config_expand) Evaluate error: No such file or directory
    ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM spdif
    ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.modem
    ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.modem
    ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline
    ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline
    ALSA lib pcm_dmix.c:990:(snd_pcm_dmix_open) The dmix plugin supports only playback stream
    ALSA lib pcm_dmix.c:1052:(snd_pcm_dmix_open) unable to open slave
    ALSA lib pcm_params.c:2162:(snd1_pcm_hw_refine_slave) Slave PCM not usable
    ALSA lib pcm_params.c:2162:(snd1_pcm_hw_refine_slave) Slave PCM not usable
    ALSA lib pcm_params.c:2162:(snd1_pcm_hw_refine_slave) Slave PCM not usable
    ALSA lib pcm_params.c:2162:(snd1_pcm_hw_refine_slave) Slave PCM not usable
    ALSA lib pcm_dmix.c:1052:(snd_pcm_dmix_open) unable to open slave
    Cannot connect to server socket err = No such file or directory
    Cannot connect to server request channel
    jack server is not running or cannot be started
    JackShmReadWritePtr::~JackShmReadWritePtr - Init not done for -1, skipping unlock
    JackShmReadWritePtr::~JackShmReadWritePtr - Init not done for -1, skipping unlock
    vosk --min-active=200 --max-active=3000 --beam=10.0 --lattice-beam=2.0 --acoustic-scale=1.0 --frame-subsampling-factor=3 --endpoint.silence-phones=1:2:3:4:5:6:7:8:9:10 --endpoint.rule2.min-trailing-silence=0.5 --endpoint.rule3.min-trailing-silence=1.0 --endpoint.rule4.min-trailing-silence=2.0
    LOG (vosk[5.5.643~3-7e185]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
    LOG (vosk[5.5.643~3-7e185]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
    LOG (vosk[5.5.643~3-7e185]:RemoveOrphanNodes():nnet-nnet.cc:948) Removed 1 orphan nodes.
    LOG (vosk[5.5.643~3-7e185]:RemoveOrphanComponents():nnet-nnet.cc:847) Removing 2 orphan components.
    LOG (vosk[5.5.643~3-7e185]:Collapse():nnet-utils.cc:1472) Added 1 components, removed 2
    LOG (vosk[5.5.643~3-7e185]:CompileLooped():nnet-compile-looped.cc:345) Spent 0.163441 seconds in looped compilation.
    STARTING HERE:
    My Print Statement: = {
    "result" : [{
    "conf" : 1.000000,
    "end" : 3.330000,
    "start" : 3.090000,
    "word" : "i'm"
    }, {
    "conf" : 1.000000,
    "end" : 3.660000,
    "start" : 3.330000,
    "word" : "having"
    }, {
    "conf" : 1.000000,
    "end" : 3.750000,
    "start" : 3.660000,
    "word" : "the"
    }, {
    "conf" : 1.000000,
    "end" : 4.200000,
    "start" : 3.750000,
    "word" : "same"
    }, {
    "conf" : 1.000000,
    "end" : 4.860000,
    "start" : 4.260000,
    "word" : "problems"
    }, {
    "conf" : 1.000000,
    "end" : 5.070000,
    "start" : 4.860000,
    "word" : "with"
    }, {
    "conf" : 1.000000,
    "end" : 5.790000,
    "start" : 5.070000,
    "word" : "control"
    }, {
    "conf" : 1.000000,
    "end" : 6.270000,
    "start" : 5.820000,
    "word" : "here"
    }]
    ,
    "text" : "i'm having the same problems with control here"
    }
    Done
    james@james-desktop:~/vosk-kaldi/kaldi$

    All the ALSA lib messages are coming from PyAudio, so I'll need to look into how I can turn those off.

    But then vosk prints a bunch of log lines before starting to capture speech. It doesn't actually start capturing speech until I've printed out "STARTING HERE".

    Then I'm getting information for every word it detects, along with the "text" field, where it finally gives me what was actually said, in quotes.

    Do I then need to dig that result out of there to put it into a string I can actually use?

    Isn't there a way to just get the final result and move on to working with that?

    I'm basically right back to where I was with Pocket Sphinx. I mean, I could make this work, but it's kind of ugly. Surely there are cleaner ways to work with this? I just want an SRE that's going to report what was said without printing out a bunch of unnecessary comments to the terminal.

    Is there any documented source code for Vosk?

     
    • Nickolay V. Shmyrev

      All the ALSA lib messages are coming from PyAudio, so I'll need to look into how I can turn those off.

      You need to clean up the ALSA config; see https://stackoverflow.com/questions/7088672/pyaudio-working-but-spits-out-error-messages-each-time
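
      If editing the ALSA configuration is not an option, a blunter workaround is to silence stderr while the audio device is opened. The sketch below is not taken from the linked answer; it is a minimal example that temporarily points file descriptor 2 at os.devnull, which hides everything native libraries write to stderr (including genuine errors) for the duration of the block. The name suppress_native_stderr is hypothetical.

```python
import os
import sys
from contextlib import contextmanager


@contextmanager
def suppress_native_stderr():
    """Temporarily redirect file descriptor 2 (stderr) to os.devnull."""
    sys.stderr.flush()
    saved_fd = os.dup(2)                           # keep a copy of the real stderr
    devnull_fd = os.open(os.devnull, os.O_WRONLY)
    try:
        os.dup2(devnull_fd, 2)                     # fd 2 now discards everything
        yield
    finally:
        os.dup2(saved_fd, 2)                       # restore the real stderr
        os.close(devnull_fd)
        os.close(saved_fd)
```

      In the program above this would wrap the noisy calls, for example "with suppress_native_stderr(): p = pyaudio.PyAudio()". Wrapping Model("model-en") the same way should also hide the LOG lines, assuming they are written to stderr.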

      Do I then need to dig that result out of there to put it into a string I can actually use?

      It is JSON; you can parse it with json.loads:

      import json
      result = json.loads(rec.Result())
      text = result['text']
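
      Putting the capture loop and the JSON parsing together, a small helper can return just the recognized sentence as a Python string. This is a minimal sketch, assuming rec behaves like the KaldiRecognizer in the program above (AcceptWaveform and Result) and stream is the opened PyAudio input stream; listen_once and chunk_frames are hypothetical names.

```python
import json


def listen_once(rec, stream, chunk_frames=2000):
    """Block until one utterance is finalized and return its text."""
    while True:
        data = stream.read(chunk_frames)
        if len(data) == 0:
            return ""  # the audio source ended before a final result
        if rec.AcceptWaveform(data):
            # Result() is a JSON string; only the "text" field is kept.
            return json.loads(rec.Result()).get("text", "")
```

      With the objects from the program above, "text = listen_once(rec, stream)" would leave only the sentence in text, without the per-word entries.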

      Is there any documented source code for Vosk?

      No

       
  • James B Ross

    James B Ross - 2020-04-23

    "There is C documentation here. https://cmusphinx.github.io/doc/pocketsphinx/files.html. There is no Python documentation, just the source code."

    This will be very helpful. Thank you!

    Thanks for the information on alsa config, and json too.

    Sorry to hear that there's no documentation for vosk.

    How's a person supposed to learn it? Just keep coming here and asking questions?

    I just want to iron out a few things so I can use an SRE for my Linguistic AI project.

    I like Pocket Sphinx because I understand how to modify its dictionary.

    I have no clue how to modify the dictionary vosk uses.

    My Linguistic AI project is all about building a dictionary from the ground up. So I really need to start with an empty dictionary. Something I can then have my AI software build from the ground up as it learns new words.

    vosk does look like a better SRE, but I need it to be as flexible as Pocket Sphinx was in terms of building the dictionary.

    So little information, and no tutorials. That's a bummer! I'd think there should be some good tutorials around on how to use and modify vosk.

     
    • Nickolay V. Shmyrev

      How's a person supposed to learn it? Just keep coming here and asking questions?

      Yes. Many people can read code too; it is the best documentation.

      So little information, and no tutorials. That's a bummer! I'd think there should be some good tutorials around on how to use and modify vosk.

      Eventually there will be some tutorials, but for now the speed at which the technology is developing makes it very hard to create extensive documentation. You can also check

      https://groups.google.com/d/topic/kaldi-help/3LBSzmploC0/discussion

       
  • James B Ross

    James B Ross - 2020-04-24

    It can be difficult to get a simple answer to a question.

    I've got vosk installed and running on a computer. Vosk only. Not Kaldi. Although Vosk apparently comes with some form of a KaldiRecognizer.

    My question is where do I find the vocabulary dictionary, and how do I modify it?
    And is it easy to do?

    With pocketsphinx it is extremely easy to modify the dictionary or even create a new dictionary from scratch. And that was the main attraction to pocketsphinx. I need to be able to create a dictionary from scratch and be able to modify it and add to it on the fly easily and programmatically. This is extremely easy to do with pocketsphinx.
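
    For readers unfamiliar with that workflow: a pocketsphinx pronunciation dictionary is a plain text file with one word per line followed by its phones, so it can be generated from Python. The sketch below is illustrative only; the helper names and the file name are hypothetical, and the two pronunciations are CMUdict-style entries.

```python
from pathlib import Path


def write_dictionary(path, entries):
    """Write {word: [phones]} as one 'word PH PH ...' line per entry."""
    lines = [f"{word} {' '.join(phones)}" for word, phones in sorted(entries.items())]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


def add_word(path, word, phones):
    """Append a single new word to an existing dictionary file."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(f"{word} {' '.join(phones)}\n")
```

    Calling write_dictionary("custom.dict", {"hello": ["HH", "AH", "L", "OW"]}) and then add_word("custom.dict", "world", ["W", "ER", "L", "D"]) would produce a two-line file. That file is then handed to the decoder through its dictionary option, and every phone in it has to exist in the acoustic model's phone set.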

    Currently with vosk I have no idea where to even find the vocabulary dictionary much less how to modify it or swap it out. In my Linguistic AI project I need the ability to create several different dictionaries for special cases and be able to swap between them programmatically on the fly.

    As I say, this is very easy to do with pocketsphinx. But I have no clue how to do this with vosk.

    Where can I find information on how to create specialized dictionaries for vosk, and how difficult is it to do?

    I'm thinking this may be the deciding factor of whether I use vosk or pocketsphinx for my project. If it's not easy to work with the vosk vocabulary dictionary, then it's not going to be of much use in my project. So the answer to this question is paramount for my application.

    And a simple answer like "Yes it can be easily done" is not a useful answer. A useful answer is to point to information on exactly how to do it.

    Without knowing how to do it, knowing that it's supposedly easy to do isn't very helpful.

     
  • James B Ross

    James B Ross - 2020-04-25

    Thanks for all the information, Nickolay. You've been very helpful. Vosk does seem to be a very good SRE, but modifying its dictionary appears to be too complex for my purpose. So I'm going to need to go back to using pocketsphinx just for the ease of use of its dictionary.

    In any case, thanks for your patience and time. You've been very helpful.

     
