
PocketSphinx: difference in keyword recognition between command line and Python

Help
kp ks
2019-10-24
2019-10-24
  • kp ks

    kp ks - 2019-10-24

    Hi,
    I am using the sample Python code to recognize a recorded message, but I get no recognition.

    However, if I try the same phrase from the command line, I get the following output:

    !pocketsphinx_continuous -infile file1.wav -hmm /Users/naka/anaconda3/lib/python3.6/site-packages/pocketsphinx/model/en-us -kws_threshold 1e-20 -keyphrase "tea we" -time yes -logfn /dev/null
    
    tea we tea we
    tea we 4.840 5.030 0.826032
    tea we 4.510 4.680 0.835336
    tea we
    tea we 19.500 19.690 0.837511
    tea we
    tea we 39.550 39.700 0.822898
    

    Example code for keyword recognition:

    # Not matching command-line output
    import sys, os
    from pocketsphinx import *
    from sphinxbase import *
    
    
    modeldir = get_model_path()
    datadir = "../../../test/data"
    print(modeldir)
    # Create a decoder with certain model
    config = Decoder.default_config()
    config.set_string('-hmm', os.path.join(modeldir, 'en-us'))
    #config.set_string('-dict', os.path.join(modeldir, 'cmudict-en-us.dict'))
    config.set_string('-keyphrase', 'tea we')
    config.set_float('-kws_threshold', 1e+20)
    
    
    # Open file to read the data
    stream = open("file1.wav", "rb")
    
    # Alternatively you can read from microphone
    # import pyaudio
    # 
    # p = pyaudio.PyAudio()
    # stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=1024)
    # stream.start_stream()
    
    # Process audio chunk by chunk. On keyphrase detected perform action and restart search
    decoder = Decoder(config)
    decoder.start_utt()
    while True:
        buf = stream.read(1024)
        if not buf:
            break
        decoder.process_raw(buf, False, False)
        if decoder.hyp() is not None:
            print([(seg.word, seg.prob, seg.start_frame, seg.end_frame) for seg in decoder.seg()])
            print("Detected keyphrase, restarting search")
            decoder.end_utt()
            decoder.start_utt()
    decoder.end_utt()
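The Python loop prints `seg.start_frame` and `seg.end_frame`, which are frame indices, whereas `-time yes` on the command line prints seconds, so the two outputs are not directly comparable. A minimal sketch of the conversion, assuming the default PocketSphinx frame rate of 100 frames per second (`-frate 100`); the helper name `frames_to_seconds` is hypothetical.

```python
# Hypothetical helper: convert PocketSphinx segment frame indices to seconds.
# Assumes the default frame rate of 100 frames/second (-frate 100);
# pass a different frate if the decoder was configured otherwise.
DEFAULT_FRATE = 100


def frames_to_seconds(start_frame, end_frame, frate=DEFAULT_FRATE):
    """Return (start_s, end_s) for a segment whose bounds are given in frames."""
    return start_frame / frate, end_frame / frate


# Usage inside the decoding loop (not executed here):
# for seg in decoder.seg():
#     start_s, end_s = frames_to_seconds(seg.start_frame, seg.end_frame)
#     print(seg.word, "%.3f %.3f" % (start_s, end_s))

if __name__ == "__main__":
    print(frames_to_seconds(484, 503))
```

With that conversion, a segment at frames 484 to 503 lines up with a command-line entry such as `tea we 4.840 5.030 ...`.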
    

    What am I setting up wrong? The threshold is 1e20 in both cases.

    Also, if you could point me to documentation explaining the command-line output format, I would be most grateful!

     
    • Nickolay V. Shmyrev

      On the command line you have threshold 1e-20; in the Python code you have 1e+20. Those are different.
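1e-20 and 1e+20 differ by 40 orders of magnitude, and `-kws_threshold` is the knob that trades false alarms against missed detections, so a one-character sign slip changes the detector's behaviour completely. One way to avoid this kind of drift is to keep the options in a single place and derive both the command-line arguments and the Python config from it. This is a sketch only: `KWS_OPTIONS`, `to_cli_args` and `apply_to_config` are hypothetical names; `set_float`/`set_string` are the config methods already used in the code above.

```python
# Hypothetical helpers: one dict of decoder options feeds both the
# pocketsphinx_continuous command line and the Python Decoder config,
# so values such as the threshold cannot silently diverge.
KWS_OPTIONS = {
    "-keyphrase": "tea we",
    "-kws_threshold": 1e-20,  # note the minus sign: 1e-20, not 1e+20
}


def to_cli_args(options):
    """Flatten an options dict into a list of command-line arguments."""
    args = []
    for name, value in options.items():
        args.extend([name, str(value)])
    return args


def apply_to_config(config, options):
    """Apply the same options to a pocketsphinx Decoder config object."""
    for name, value in options.items():
        if isinstance(value, float):
            config.set_float(name, value)
        else:
            config.set_string(name, value)


if __name__ == "__main__":
    print(" ".join(to_cli_args(KWS_OPTIONS)))
```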

       
      • kp ks

        kp ks - 2019-10-24

        Oops! I changed it, but there is still no recognition. Would you also help me understand the command-line output? I assume it is the start and end timestamps in the audio between which the keyword is found, followed by the confidence level?

        # Not matching command-line output
        import sys, os
        from pocketsphinx import *
        from sphinxbase import *
        
        
        modeldir = get_model_path()
        datadir = "../../../test/data"
        print(modeldir)
        # Create a decoder with certain model
        config = Decoder.default_config()
        config.set_string('-hmm', os.path.join(modeldir, 'en-us'))
        #config.set_string('-dict', os.path.join(modeldir, 'cmudict-en-us.dict'))
        config.set_string('-keyphrase', 'tea we')
        config.set_float('-kws_threshold', 1e-20)
        
        
        # Open file to read the data
        stream = open("file1.wav", "rb")
        
        # Alternatively you can read from microphone
        # import pyaudio
        # 
        # p = pyaudio.PyAudio()
        # stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=1024)
        # stream.start_stream()
        
        # Process audio chunk by chunk. On keyphrase detected perform action and restart search
        decoder = Decoder(config)
        decoder.start_utt()
        while True:
            buf = stream.read(1024)
            if not buf:
                break
            decoder.process_raw(buf, False, False)
            if decoder.hyp() is not None:
                print([(seg.word, seg.prob, seg.start_frame, seg.end_frame) for seg in decoder.seg()])
                print("Detected keyphrase, restarting search")
                decoder.end_utt()
                decoder.start_utt()
        decoder.end_utt()
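One difference between the two setups, though probably not the whole story, is that `open("file1.wav", "rb").read(...)` also hands the WAV header to `process_raw` as if it were audio (the afinfo output later in this thread reports the audio data starting at byte offset 78). A minimal sketch that reads only the PCM frames with Python's standard `wave` module; the generator name `pcm_chunks` is hypothetical, and it assumes 16-bit mono input.

```python
import wave


def pcm_chunks(path, frames_per_chunk=512):
    """Yield raw 16-bit mono PCM chunks from a WAV file, skipping the header."""
    with wave.open(path, "rb") as wav:
        # The decoder expects 16-bit mono samples; refuse anything else.
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM")
        while True:
            chunk = wav.readframes(frames_per_chunk)
            if not chunk:
                break
            yield chunk


# Usage with the decoder from the code above (not executed here):
# decoder.start_utt()
# for buf in pcm_chunks("file1.wav"):
#     decoder.process_raw(buf, False, False)
#     if decoder.hyp() is not None:
#         print("Detected keyphrase, restarting search")
#         decoder.end_utt()
#         decoder.start_utt()
# decoder.end_utt()
```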
        
         
        • Nickolay V. Shmyrev

          I changed it but there is still no recognition.

          You can share the file to get help with this.

          Would you also help me understand the command-line output? I assume it is the start and end timestamps in the audio between which the keyword is found, followed by the confidence level?

          Yes
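Given that confirmation, each timed line of the `-time yes` output reads as `<keyphrase> <start seconds> <end seconds> <confidence>`, while lines with no numbers (such as `tea we tea we`) are just the hypothesis text. A small parsing sketch under that reading; the function name `parse_time_line` is hypothetical. Because the keyphrase itself contains spaces, it splits from the right.

```python
def parse_time_line(line):
    """Parse one '-time yes' line: '<words> <start> <end> <confidence>'.

    Returns (words, start_s, end_s, confidence), or None for lines that are
    only hypothesis text, e.g. 'tea we tea we'.
    """
    parts = line.strip().rsplit(maxsplit=3)
    if len(parts) != 4:
        return None
    words, start, end, conf = parts
    try:
        return words, float(start), float(end), float(conf)
    except ValueError:
        # The last three fields were words, not numbers.
        return None


if __name__ == "__main__":
    print(parse_time_line("tea we 4.840 5.030 0.826032"))
```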

           
  • kp ks

    kp ks - 2019-10-24

    The file for which the command line works is attached.
    I am trying to recognise the phrase "TVS Way 20000"

     
    • Nickolay V. Shmyrev

      The file is 8 kHz; it is not going to work with a 16 kHz model. Also, you need the en-in model, not en-us. Third, for such a task it is better to use large-vocabulary transcription and analyze the results.
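A quick way to catch the obvious mismatch before decoding is to read the format the WAV header declares. A minimal sketch with the standard `wave` module, assuming the en-us model's 16 kHz, mono, 16-bit input; `EXPECTED`, `wav_format` and `format_problems` are hypothetical names. Note that the header only reports the declared rate: audio recorded at 8 kHz and later resampled will report 16000 Hz here while still lacking content above 4 kHz.

```python
import wave

# Assumption: the en-us acoustic model expects 16 kHz, mono, 16-bit PCM.
EXPECTED = {"channels": 1, "sample_width_bytes": 2, "rate_hz": 16000}


def wav_format(path):
    """Read the format fields declared in a WAV file's header."""
    with wave.open(path, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "rate_hz": wav.getframerate(),
        }


def format_problems(path, expected=EXPECTED):
    """List mismatches between the file header and what the model expects."""
    actual = wav_format(path)
    return [
        "%s: expected %s, got %s" % (key, want, actual[key])
        for key, want in expected.items()
        if actual[key] != want
    ]
```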

       
      • kp ks

        kp ks - 2019-10-24

        Thanks for your insight.
        I had actually used ffmpeg specifically to convert the 8 kHz files into 16 kHz files for this task. I have to dig into why that is not working.
        E.g., the syntax used for the conversion:

        ffmpeg -i testWav1.wav -ac 1 -ar 16000 testWay.wav
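The conversion above can be scripted so every file gets identical settings. A sketch, assuming ffmpeg is on `PATH`; the helper names are hypothetical, and `-acodec pcm_s16le` simply makes the 16-bit little-endian PCM output explicit. Keep in mind that resampling 8 kHz audio to 16 kHz changes the declared rate but cannot restore the frequencies above 4 kHz that were never recorded, which is likely why a model trained on 16 kHz speech still struggles.

```python
import subprocess


def ffmpeg_to_16k_mono(src, dst, rate=16000):
    """Build an ffmpeg command that writes mono 16-bit PCM WAV at `rate` Hz."""
    return [
        "ffmpeg",
        "-i", src,
        "-ac", "1",              # downmix to a single channel
        "-ar", str(rate),        # resample to the target rate
        "-acodec", "pcm_s16le",  # 16-bit signed little-endian PCM
        dst,
    ]


def convert(src, dst):
    """Run the conversion; requires ffmpeg on PATH (not executed here)."""
    subprocess.run(ffmpeg_to_16k_mono(src, dst), check=True)
```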
        

        Not all of the conversation will be in English; it could be in any of 22 distinct languages. The only guaranteed English phrase, so to speak, is "TVS Way" and the number. Is there any alternative to complete transcription? 90% of the conversation cannot be recognised!
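Short of full transcription, PocketSphinx keyword spotting can also watch for several phrases at once: instead of a single `-keyphrase`, the `-kws` option takes a file with one phrase per line, each followed by its own threshold between slashes. A sketch that writes such a file; the phrase list and the function name `write_kws_file` are hypothetical, every word must exist in the decoder's dictionary, and each threshold would need tuning on real recordings.

```python
# Hypothetical phrase list; thresholds must be tuned per phrase.
PHRASES = [
    ("tea we", 1e-20),
    ("twenty thousand", 1e-20),
]


def write_kws_file(path, phrases):
    """Write a keyword list with one 'phrase /threshold/' entry per line."""
    with open(path, "w", encoding="utf-8") as f:
        for phrase, threshold in phrases:
            f.write("%s /%s/\n" % (phrase, threshold))


# Usage with the decoder config (not executed here); -kws replaces -keyphrase:
# write_kws_file("keywords.list", PHRASES)
# config.set_string('-kws', 'keywords.list')
```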

         
  • kp ks

    kp ks - 2019-10-24

    Hi, I had previously installed the Indian English model; there is not much difference in recognition:
    !pocketsphinx_continuous -infile file1.wav -hmm /Users/naka/anaconda3/lib/python3.6/site-packages/pocketsphinx/model/en_in.cd_cont_5000 -dict /Users/naka/anaconda3/lib/python3.6/site-packages/pocketsphinx/model/en_in.dic -samprate 16000.0 -kws_threshold 1e-20 -keyphrase "tea we" -time yes -logfn /dev/null

    However, a sampling rate of 8 kHz results in no recognition. The command line, as always, works correctly with 16 kHz. Python didn't work with either.

    tea we
    tea we 2.620 2.810 0.827686
    tea we tea we
    tea we 4.830 5.030 0.835503
    tea we 4.510 4.640 0.824051
    tea we tea we tea we
    tea we 8.500 9.350 0.848811
    tea we 7.990 8.280 0.840953
    tea we 7.230 7.390 0.826527
    tea we tea we tea we tea we tea we
    tea we 14.330 14.510 0.828017
    tea we 13.980 14.220 0.826941
    tea we 11.240 11.450 0.824711
    tea we 10.890 11.080 0.833584
    tea we 10.500 10.650 0.826114
    tea we tea we tea we tea we
    tea we 20.760 20.920 0.829343
    tea we 20.090 20.270 0.825949
    tea we 19.080 19.370 0.845845
    tea we 18.600 18.800 0.840112
    tea we
    tea we 23.040 23.200 0.826197
    tea we
    tea we 25.050 25.760 0.898245
    tea we tea we tea we
    tea we 29.130 29.320 0.822898
    tea we 28.050 28.310 0.854176
    tea we 27.020 27.300 0.843311
    tea we tea we
    tea we 39.790 40.120 0.828680
    tea we 39.350 39.620 0.840784
    tea we tea we tea we
    tea we 43.870 44.130 0.829177
    tea we 43.640 43.760 0.828265
    tea we 42.660 43.200 0.861468
    

    I also tested the file with afinfo:

    File:           file1.wav
    File type ID:   WAVE
    Num Tracks:     1
    ----
    Data format:     1 ch,  16000 Hz, 'lpcm' (0x0000000C) 16-bit little-endian signed integer
                    no channel layout.
    estimated duration: 45.056000 sec
    audio bytes: 1441792
    audio packets: 720896
    bit rate: 256000 bits per second
    packet size upper bound: 2
    maximum packet size: 2
    audio data file offset: 78
    optimized
    source bit depth: I16
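The afinfo fields are mutually consistent with a 16 kHz, mono, 16-bit header, which can be checked with a little arithmetic. The values below are copied from the output above; the variable names are only for illustration.

```python
# Values copied from the afinfo output above.
SAMPLE_RATE_HZ = 16000
CHANNELS = 1
BYTES_PER_SAMPLE = 2  # 16-bit little-endian signed integer
AUDIO_BYTES = 1441792

bytes_per_second = SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE  # 32000
duration_s = AUDIO_BYTES / bytes_per_second                      # 45.056
bit_rate = bytes_per_second * 8                                  # 256000
packets = AUDIO_BYTES // (CHANNELS * BYTES_PER_SAMPLE)           # 720896

print(duration_s, bit_rate, packets)
```

All three derived numbers match the reported estimated duration, bit rate and packet count, so the header itself is not the problem.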
    
     

    Last edit: kp ks 2019-10-24
