
decoding with sphinx3_livepretend

Help
xavic383
2008-04-02
2012-09-22
  • xavic383

    xavic383 - 2008-04-02

    Hi,

    I've trained acoustic models with SphinxTrain and decoded with the livepretend command. Running livepretend produces a file called "log.txt", whose results look like this (conlavenia01 is the name of the audio file tested):

    Backtrace(conlavenia01)
    FV:conlavenia01> WORD SFrm EFrm AScr(UnNorm) LMScore AScr+LScr AScale
    fv:conlavenia01> <sil> 0 63 -21372439 -74100 -21446539 -3118963
    FV:conlavenia01> TOTAL -21372439 -74100

    FWDVIT: (conlavenia01)
    FWDXCT: conlavenia01 S -3166552 T -21380540 A -21372439 L -8101 0 -21372439 -8101 <sil> 64

    INFO: stat.c(154): 64 frm; 9 cdsen/fr, 54 cisen/fr, 10 cdgau/fr, 79 cigau/fr, Sen 0.01, CPU 0.01 Clk [Ovrhd 0.01 CPU 0.01 Clk]; 6 hmm/fr, 1 wd/fr, Search: -0.00 CPU 0.00 Clk (conlavenia01)
    INFO: corpus.c(647): conlavenia01: 0.0 sec CPU, 0.0 sec Clk; TOT: 0.0 sec CPU, 0.0 sec Clk

    INFO: main_livepretend.c(142): PARTIAL_HYP:
    INFO: main_livepretend.c(142): PARTIAL_HYP:
    INFO: main_livepretend.c(142): PARTIAL_HYP:
    INFO: cmn_prior.c(121): cmn_prior_update: from < 19.87 -1.49 -0.44 -0.39 -0.28 -0.27 -0.24 -0.22 -0.20 -0.20 -0.19 -0.20 -0.19 >
    INFO: cmn_prior.c(139): cmn_prior_update: to < 19.87 -1.50 -0.44 -0.39 -0.28 -0.26 -0.24 -0.23 -0.21 -0.20 -0.19 -0.21 -0.19 >
    INFO: agc.c(172): AGCEMax: obs= 0.27, new= 4.17
    INFO: fast_algo_struct.c(398): HMMHist0..0: 71(100)
    INFO: lm.c(951): 0 tg(), 0 tgcache, 0 bo; 0 fills, 0 in mem (0.0%)
    INFO: lm.c(955): 5 bg(), 5 bo; 0 fills, 3 in mem (42.9%)

    My problem is that I don't understand the meaning of the fields I get, such as SFrm, EFrm, AScr(UnNorm), LMScore, AScr+LScr, AScale, FWDVIT, FWDXCT, ...

    I would also like to know whether there are other ways to perform recognition. If I want to build an application based on Sphinx, I guess livepretend is not the best way to do so...

    Thanks !!!

     
    • Nickolay V. Shmyrev

      > 1) What samples do you need ?

      Samples you are training model on.

      > 2) Why is my database incorrectly trained ? Would it be because there's a small amount of data ?

      No idea yet, but it doesn't recognize your test utterance. Most likely you made a mistake. A small amount of data is also a problem, of course.

      > 3) My purpose is to create a speech recognition system for the Castilian Spanish language.

      There are Spanish phonetic dictionaries as well as rules. Spanish LTS (letter-to-sound) rules are also rather precise, unlike English ones.

      > 4) I've read in the FAQ, that I need to specify the size of my speech corpus in hours. Why do I have to do it and where do I have to specify it ?

      There is no such statement in the FAQ. Reread it and prove me wrong.

      > 5) Any other consideration ??

      Depends on the subject we must consider :)

       
    • Nickolay V. Shmyrev

      SFrm - start frame (time in 1/100 seconds)
      EFrm - end frame
      AScr - acoustic score (used during search)
      LMScore - language model score
      AScale - rescoring coefficient used during search.
      FWDVIT: (conlavenia01) - hypothesis (silence in your case)
      FWDXCT: conlavenia01 S -3166552 T -21380540 A -21372439 L -8101 0 -21372439 -8101 <sil> 64 - more information about match
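
      To make the field list above concrete, here is a minimal sketch that splits one backtrace line from the log into named fields and converts the frame numbers to seconds. It assumes the column order shown in the log header (WORD SFrm EFrm AScr(UnNorm) LMScore AScr+LScr AScale) and 100 frames per second. The function name parse_backtrace_line and the dictionary keys are made up for illustration and are not part of Sphinx.

```python
FRAMES_PER_SECOND = 100  # assumption: one frame is 1/100 s, as stated above


def parse_backtrace_line(line):
    """Split one 'fv:<utt>> ...' backtrace line into named fields.

    Hypothetical helper, not a Sphinx API. Column order is taken from the
    header line printed in the log.
    """
    # Drop the 'fv:<utterance id>> ' prefix; keep everything after it.
    _, _, fields = line.partition("> ")
    word, sfrm, efrm, ascr, lmscore, total, ascale = fields.split()
    return {
        "word": word,
        "start_sec": int(sfrm) / FRAMES_PER_SECOND,
        "end_sec": int(efrm) / FRAMES_PER_SECOND,
        "ascr": int(ascr),        # acoustic score
        "lmscore": int(lmscore),  # language model score
        "total": int(total),      # AScr+LScr column
        "ascale": int(ascale),
    }


line = "fv:conlavenia01> <sil> 0 63 -21372439 -74100 -21446539 -3118963"
seg = parse_backtrace_line(line)
print(seg["word"], seg["start_sec"], seg["end_sec"])
# The AScr+LScr column is the sum of the two score columns before it.
print(seg["ascr"] + seg["lmscore"] == seg["total"])
```

      For the line in the log this gives a single <sil> segment running from frame 0 to frame 63, that is, roughly the first 0.63 seconds of the file.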

      For more information check the FAQ:

      http://www.speech.cs.cmu.edu/sphinxman/FAQ.html#21

      Your problem is that you trained the database incorrectly, and also in the options you are using. For example, it makes no sense to use -agc emax; it will only make recognition worse. For more details we actually need samples.

       
    • xavic383

      xavic383 - 2008-04-03

      Hi everybody,
      I still have some related questions:

      1) What samples do you need ?

      2) Why is my database incorrectly trained ? Would it be because there's a small amount of data ?

      3) My purpose is to create a speech recognition system for the Castilian Spanish language. How can I find the correct transcription for every word in my dictionary? I'm not a phonetician...

      4) I've read in the FAQ, that I need to specify the size of my speech corpus in hours. Why do I have to do it and where do I have to specify it ?

      5) Any other consideration ??

      Thanks a lot !!

       
