SphinxII: audio file other than goforward.16k

Help
bentckao
2005-11-01
2012-09-22
  • bentckao

    bentckao - 2005-11-01

    I am new to SphinxII. I am trying to get the system to decode audio files that I recorded in my own voice, saying the same thing: "go forward 10 meters".

    1. I ran
      ./configure
      make clean all
      make test
      make install

    ./sphinx2-test appears to produce good results, as shown on the BESTPATH line:
    INFO: search.c(2568): 783 candidate words for entering last phone (1/fr)
    SFrm EFrm AScr/Frm AScr PathScr BSDiff LatDen PhPerp Word (Bestpath) (goforward)
    ------------------------------------------------------------------------
    63 76 349174 4888443 17438237 -141750 8 1.19 GO
    86 117 290375 9292029 26999968 -167232 2 0.89 FORWARD
    125 143 282989 5376800 32448889 -153653 5 1.05 TEN
    148 194 231355 10873712 43394722 -159068 3 1.11 METERS
    INFO: searchlat.c(939): BESTPATH: GO FORWARD TEN METERS (goforward -97741494)

    1. I recorded at 16000 Hz, mono, 16-bit, signed.
      Amplitude ranges from -0.3 to +0.8, and the clip is approximately 4 seconds long. The file name is mygoforward.16k. The file plays well with both Linux play and GoldWave.
      /usr/bin/play ./mygoforward.16k -f s -r 16000 -w -t raw
      The file size is 127926 bytes, which by my calculation is right for a 4-second clip.
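
A quick way to check that figure: headerless 16-bit mono PCM at 16000 Hz takes 32000 bytes per second, so 127926 bytes should be 63963 samples, or just under 4 seconds. The sketch below does that arithmetic in Python. The function name `raw_pcm_stats` is made up for illustration, and it assumes the file really is headerless (no WAV header bytes counted in the size).

```python
def raw_pcm_stats(num_bytes, sample_rate=16000, bytes_per_sample=2, channels=1):
    """Return (samples_per_channel, duration_seconds) for headerless PCM data."""
    frame_size = bytes_per_sample * channels
    if num_bytes % frame_size != 0:
        # A leftover byte usually means a header or a truncated file.
        raise ValueError("byte count is not a whole number of sample frames")
    samples = num_bytes // frame_size
    return samples, samples / sample_rate


samples, seconds = raw_pcm_stats(127926)
print(samples, round(seconds, 3))  # 63963 3.998
```

The 63963 samples agree with the total of the counts in the "Samples histogram" log line for mygoforward (61589 + 1858 + 503 + 13 + 0), so the decoder appears to be reading the whole file as 16-bit samples.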

    I changed turtle.ctl to point to the new file, mygoforward.

    1. Then I ran
      ./configure
      make clean all;make test

    The new file name mygoforward seems to be recognised, but the decoder does not recognize the same words when they are uttered in my own voice:

    INFO: uttproc.c(1382): Samples histogram (mygoforward) (4/8/16/30/32K): 96.3%(61589) 2.9%(1858) 0.8%(503) 0.0%(13) 0.0%(0); max: 26083
    5.138 = AGC MAX
    SFrm Efrm AScr/Frm AScr LScr BSDiff LatDen PhPerp Word (FWDTREE) (mygoforward)
    ---------------------------------------------------------------------
    0 48 -190493 -9334195 0 -180307 2 1.00 <s>
    49 83 -210710 -7374863 -52986 -183413 2 0.89 SIL
    84 117 -224194 -7622609 -52986 -150829 5 1.22 SIL
    118 147 -239993 -7199818 -52986 -145032 11 1.29 SIL
    148 164 -225240 -3829088 -52986 -190644 5 1.13 SIL
    165 192 -218652 -6122267 -52986 -192475 3 1.27 SIL
    193 238 -220551 -10145381 -52986 -161658 5 1.77 SIL
    239 259 -213408 -4481585 -52986 -160426 5 1.96 SIL
    260 279 -211849 -4236995 -52986 -174694 3 1.46 SIL
    280 369 -178657 -16079189 -52986 -175035 1 0.47 SIL
    370 397 -188428 -5276006 -212437 -172178 1 0.79 </s>
    INFO: search.c(2643): FWDTREE: (mygoforward -82391307 (A=-81701996 L=-689311))
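
The "Samples histogram" line can be reproduced outside Sphinx to check that the raw file is being read with the right sample format. This is a minimal sketch, assuming the file is headerless signed 16-bit little-endian PCM, and assuming the five bins count samples whose absolute value is below 4K, 8K, 16K, 30K, and at or above 30K (with K = 1024). The bin edges are a guess at what the log reports, not taken from the Sphinx source, and `sample_histogram` is a hypothetical helper.

```python
import struct

# Assumed exclusive upper edges for |sample|; the last bin takes everything else.
BIN_EDGES = (4 * 1024, 8 * 1024, 16 * 1024, 30 * 1024)


def sample_histogram(raw_bytes):
    """Bin 16-bit little-endian samples by amplitude; return (counts, max_abs)."""
    n = len(raw_bytes) // 2
    samples = struct.unpack("<%dh" % n, raw_bytes[: 2 * n])
    counts = [0] * (len(BIN_EDGES) + 1)
    max_abs = 0
    for s in samples:
        a = abs(s)
        max_abs = max(max_abs, a)
        for i, edge in enumerate(BIN_EDGES):
            if a < edge:
                counts[i] += 1
                break
        else:
            counts[-1] += 1  # at or above the largest edge
    return counts, max_abs


# Four synthetic samples, one each for bins 0, 1, 3 and 4.
demo = struct.pack("<4h", 100, -5000, 20000, -32768)
print(sample_histogram(demo))  # ([1, 1, 0, 1, 1], 32768)
```

To run it on a recording, pass the file contents, for example `sample_histogram(open("mygoforward.16k", "rb").read())`. Byte-swapped audio would typically spread across all the bins instead of clustering in the first one. The log for mygoforward has 96.3% of samples in the first bin and a maximum of 26083, so it does not look byte-swapped.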

    I thought Sphinx was speaker-independent.
    Do I need to train it?

    Thanks
    bentckao@hotmail.com

     
    • bentckao

      bentckao - 2005-11-03

      I had overlooked the Open Source Tutorial, which explains how to set up the tutorial and train.
      I was able to finish the tutorial on AN1.

      Now my challenge is to convert my own wav files into the various model files for recognition.
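
One preliminary step, if the recordings are in WAV containers, is to check that each file is 16 kHz, mono, 16-bit and to strip the header, which gives raw samples like the .16k files used earlier in the thread. Below is a sketch using Python's standard `wave` module. The function name `wav_to_raw` and the choice to reject mismatched files instead of resampling them are assumptions, and this does not cover feature extraction or the training itself.

```python
import wave


def wav_to_raw(wav_path, raw_path, expected_rate=16000):
    """Copy the PCM samples of a 16-bit mono WAV file into a headerless raw file.

    Returns the number of samples written.
    """
    with wave.open(wav_path, "rb") as wav:
        if wav.getnchannels() != 1:
            raise ValueError("expected mono audio")
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        if wav.getframerate() != expected_rate:
            raise ValueError(
                "expected %d Hz, got %d Hz" % (expected_rate, wav.getframerate())
            )
        frames = wav.readframes(wav.getnframes())
    with open(raw_path, "wb") as raw:
        raw.write(frames)  # WAV PCM data is already little-endian
    return len(frames) // 2
```

A call such as `wav_to_raw("myutt.wav", "myutt.16k")` (file names are placeholders) would then produce a file playable with the same `play ... -t raw` command shown in the first post.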

       
