
Does anybody know the FSG format well?

Help
chris
2007-05-25
2012-09-22
  • chris

    chris - 2007-05-25

    I wrote an FSG file like this:
    FSG_BEGIN
    N 7
    S 0
    F 4

    T 0 1 0.9 go
    T 1 2 0.5 forward
    T 1 5 0.5 backward
    T 2 3 0.5 ten
    T 3 4 0.5 meters
    T 5 6 0.5 two
    T 6 4 0.5 meters

    FSG_END

    With the audio file goforward.16k from the demo package it works well, but with audio files that I recorded myself the results are very bad. When I say 'go forward ten meters', it sometimes recognizes some of the words (not the whole sentence). When I say 'go backward two meters', none of the words are recognized. Does anybody know why?

    Also, I am looking for detailed documentation on how to write an FSG file. Does anybody have some?

     
    • chris

      chris - 2007-07-05

      OK, I will try. Thank you very much.

       
    • Nickolay V. Shmyrev

      Have you tried existing tidigits fsg from sphinx3?

      http://cmusphinx.svn.sourceforge.net/viewvc/cmusphinx/trunk/sphinx3/model/hmm/tidigits/

      Can you paste complete output from sphinx2 you are running?

       
    • chris

      chris - 2007-05-29

      Oh, thank you very much. I will give it a try.

       
    • Anonymous

      Anonymous - 2007-06-01

      Chris -- Which Sphinx are you trying to use with your FSG? How is it configured, and what acoustic model are you using?

      I do not have any experience with writing Sphinx FSGs, but I've read the very minimal documentation in http://cmusphinx.sourceforge.net/sphinx2/doc/sphinx2.html#sec_fsgfmt , and I looked at the examples that Nickolay cited. Your example looks reasonable, and I think it should work -- it permits the two sentences GO FORWARD TEN METERS and GO BACKWARD TWO METERS. (AFAIK, the usual practice is for the transition probabilities for all paths leaving a node to sum to 1.0, but your values should work as well.)
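
The claim that the grammar permits exactly those two sentences can be checked mechanically. Below is a minimal Python sketch, not part of Sphinx, that parses only the `S`, `F` and `T` lines used in the example above, lists every word sequence from the start state to the final state, and reports the sum of outgoing probabilities per state. The function names are made up for illustration, and treating a `T` line without a word as a null transition is an assumption about the format.

```python
from collections import defaultdict

FSG_TEXT = """\
FSG_BEGIN
N 7
S 0
F 4

T 0 1 0.9 go
T 1 2 0.5 forward
T 1 5 0.5 backward
T 2 3 0.5 ten
T 3 4 0.5 meters
T 5 6 0.5 two
T 6 4 0.5 meters

FSG_END
"""


def parse_fsg(text):
    """Return (start, final, arcs); arcs[src] is a list of (dst, prob, word)."""
    start = final = None
    arcs = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "S":
            start = int(fields[1])
        elif fields[0] == "F":
            final = int(fields[1])
        elif fields[0] == "T":
            src, dst, prob = int(fields[1]), int(fields[2]), float(fields[3])
            # Assumption: a transition with no word is a null transition.
            word = fields[4] if len(fields) > 4 else None
            arcs[src].append((dst, prob, word))
    return start, final, arcs


def sentences(start, final, arcs):
    """List word sequences on all start-to-final paths (acyclic grammars only)."""
    results = []

    def walk(state, words):
        if state == final:
            results.append(" ".join(words))
            return
        for dst, _prob, word in arcs.get(state, []):
            walk(dst, words + ([word] if word else []))

    walk(start, [])
    return results


def outgoing_sums(arcs):
    """Sum of transition probabilities leaving each state that has arcs."""
    return {src: sum(p for _dst, p, _w in out) for src, out in arcs.items()}


start, final, arcs = parse_fsg(FSG_TEXT)
for sentence in sentences(start, final, arcs):
    print(sentence)
for state, total in sorted(outgoing_sums(arcs).items()):
    print("state", state, "outgoing probability sum", total)
```

The recursive walk would not terminate on a grammar with loops, so this is only suitable for small acyclic grammars like the one in this thread.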

      But let's step back from the question of FSG format and look at the observations you have reported.
      1. The demo audio file goforward.16k works -- I assume that means it produces a correct recognition of GO FORWARD TEN METERS.
      2. But when you tried it with audio files that you recorded (both 'go forward ten meters' and 'go backward two meters'), it doesn't work, or doesn't work well.

      #1 suggests that the demo file is consistent with your Sphinx configuration, the acoustic model, and the FSG, since the utterance was successfully recognized. #2 says that your audio files didn't work, so we should inquire as to how your audio files are different from the demo file.

      First of all, are they the same sample rate and format as the demo file goforward.16k? (As I recall, that's 16K samples/sec, raw format.)
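
For recordings saved as WAV, the header can be inspected before any conversion. Here is a minimal sketch using Python's standard `wave` module. The 16000 Hz target comes from the demo file; mono and 16-bit samples are assumptions about what the model expects, and `describe_wav` and `mismatches` are illustrative names only. It does not apply to headerless raw files such as goforward.16k.

```python
import sys
import wave


def describe_wav(path):
    """Read the basic format fields from a WAV file header."""
    with wave.open(path, "rb") as w:
        return {
            "rate": w.getframerate(),
            "channels": w.getnchannels(),
            "bits": 8 * w.getsampwidth(),
        }


def mismatches(info, rate=16000, channels=1, bits=16):
    """Return a list of differences from the expected format.

    The defaults are assumptions: 16 kHz as in the demo file,
    mono and 16-bit as the usual input for this kind of model.
    """
    problems = []
    if info["rate"] != rate:
        problems.append("sample rate is %d Hz, expected %d Hz" % (info["rate"], rate))
    if info["channels"] != channels:
        problems.append("%d channel(s), expected %d" % (info["channels"], channels))
    if info["bits"] != bits:
        problems.append("%d-bit samples, expected %d-bit" % (info["bits"], bits))
    return problems


if __name__ == "__main__":
    # Usage: python check_wav.py recording1.wav recording2.wav ...
    for path in sys.argv[1:]:
        found = mismatches(describe_wav(path))
        print(path, "OK" if not found else "; ".join(found))
```

A file that passes this check can still be clipped or noisy, so listening to it remains necessary.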

      If your files are consistent, then is the recorded audio clear and undistorted (have you listened to them)?

      Is the speech in them consistent with the acoustic model? For example, if the acoustic model was trained from American English, is your speech the same (I don't know if you are a native speaker of English)?

      There's not much anyone can say about your poor recognitions without seeing the complete output from the recognizer. Can you post this? And what's the Sphinx configuration?

      cheers,
      jerry

       
    • chris

      chris - 2007-06-04

      Hi Jerry,
      Thank you very much for replying.
      My sphinx2 works successfully now. I run sphinx2_batch for recognition, with parameters based on the sphinx-text; I changed some of them. The parameter list:

      /usr/local/sphinx2/bin/sphinx2_batch \
        -adcin TRUE \
        -adcext 16k \
        -ctlfn data/file.ctl \
        -ctloffset 0 \
        -datadir audio \
        -agcmax TRUE \
        -langwt 6.5 \
        -fwdflatlw 8.5 \
        -rescorelw 9.5 \
        -ugwt 0.5 \
        -fillpen 1e-12 \
        -silpen 0.002 \
        -inspen 0.65 \
        -top 4 \
        -topsenfrm 4 \
        -topsenthresh -70000 \
        -beam 0 \
        -npbeam 0 \
        -lpbeam 0 \
        -lponlybeam 0 \
        -nwbeam 0 \
        -fwdflat TRUE \
        -fwdflatbeam 1e-08 \
        -fwdflatnwbeam 0.0001 \
        -fsgfn data/eval.fsg \
        -dictfn data/eval.dict \
        -ndictfn data/model/hub4/sphinx_2_format/noisedict \
        -phnfn data/model/hub4/sphinx_2_format/phone \
        -mapfn data/model/hub4/sphinx_2_format/map \
        -hmmdir data/model/hub4/sphinx_2_format \
        -hmmdirlist data/model/hub4/sphinx_2_format \
        -sendumpfn data/model/hub4/sphinx_2_format/sendump \
        -cbdir data/model/hub4/sphinx_2_format \
        -8bsen TRUE \
        -bestpath TRUE \
        -fsgbfs TRUE \
        -fsgusealtpron FALSE \
        -fsgusefiller FALSE \
        -compressprior FALSE \
        -compress FALSE \
        -compallsen TRUE \
        -latsize 5000 \
        -normmean TRUE \
        -maxhmmpf

      Maybe it is not the best configuration; I am still tuning the performance. The sentence error rate is now about 10% to 20%.

      I use the hub4 acoustic model. I even wrote a Python script to generate the FSG file, and built a small dictionary that contains only the words that appear in the grammar.
      I am not a native English speaker; I am from China, and I think that is why it is not very accurate.
      I think I should try to train the model myself.
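
The generator script itself was not posted. As a rough illustration of the idea only, the following Python sketch builds an FSG in the same layout as the file at the top of the thread from a list of sentences. It uses a prefix tree with one shared final state and uniform probabilities on the arcs leaving each state. The names `build_fsg` and `vocabulary` are hypothetical, and the sketch refuses input where one sentence is a prefix of another.

```python
def build_fsg(sentences):
    """Return FSG text accepting exactly the given sentences.

    State 0 is the start state and state 1 is the single final state.
    Arcs leaving a state share its probability mass equally.
    """
    final = 1
    children = {0: {}}  # state -> {word: destination state}
    next_state = 2
    for sentence in sentences:
        words = sentence.split()
        state = 0
        for i, word in enumerate(words):
            last = i == len(words) - 1
            table = children.setdefault(state, {})
            if word in table:
                # Reaching the final state early, or ending on a non-final
                # state, means one sentence is a prefix of another.
                if (table[word] == final) != last:
                    raise ValueError("sentence is a prefix of another: %r" % sentence)
            elif last:
                table[word] = final
            else:
                table[word] = next_state
                next_state += 1
            state = table[word]

    lines = ["FSG_BEGIN", "N %d" % next_state, "S 0", "F %d" % final, ""]
    for state in sorted(children):
        table = children[state]
        for word, dst in table.items():
            lines.append("T %d %d %.4f %s" % (state, dst, 1.0 / len(table), word))
    lines += ["", "FSG_END"]
    return "\n".join(lines) + "\n"


def vocabulary(sentences):
    """Sorted list of distinct words, i.e. the entries the dictionary needs."""
    return sorted({word for sentence in sentences for word in sentence.split()})


if __name__ == "__main__":
    grammar = ["go forward ten meters", "go backward two meters"]
    print(build_fsg(grammar))
    print(vocabulary(grammar))
```

The word list only says which entries the small dictionary needs; the pronunciations still have to come from an existing dictionary.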

      By the way, I use sox to convert the audio files, like this:
      sox input_file.wav -s -w -r 16000 -t sph output_file.16k

      I hope this helps others who have the same problem.
      If you think something is wrong with my configuration, could you tell me?
      Thank you very much again!

      Regards
      Chris

       
      • Anonymous

        Anonymous - 2007-06-09
        1. For non-native English speech with an American English acoustic model, a 10-20% sentence error rate may be quite reasonable, depending on the size of your grammar.

        2. Changing the sample rate to 16 kHz to match the model is the correct thing to do, assuming that the original rate is greater than 16 kHz. (This will not be satisfactory if the original rate is less.) I believe that Sox has 3 rate-changing methods, but I do not remember which is the best one. Be sure to use the best method.

        3. Sorry, I have not used Sphinx2 for several years, so I cannot judge your configuration.

        cheers,
        jerry

         
    • chris

      chris - 2007-06-20

      I have run into another problem.
      I want to use an FSG with sphinx3, but it does not seem to work. I use batch mode, and the parameter list is:

        -mdef /tmp/sphinx3test/share/sphinx3/model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd/hub4opensrc.6000.mdef \
        -fdict /tmp/sphinx3test/share/sphinx3/model/lm/an4/filler.dict \
        -fsg data/eval.fsg \
        -ctl data/eval.ctl \
        -dict data/eval.dict \
        -cepdir audio \
        -mean /tmp/sphinx3test/share/sphinx3/model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd/means \
        -var /tmp/sphinx3test/share/sphinx3/model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd/variances \
        -mixw /tmp/sphinx3test/share/sphinx3/model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd/mixture_weights \
        -tmat /tmp/sphinx3test/share/sphinx3/model/hmm/hub4_cd_continuous_8gau_1s_c_d_dd/transition_matrices \
        -maxwpf 1 \
        -beam 1e-40 \
        -pbeam 1e-30 \
        -wbeam 1e-20 \
        -maxhmmpf 1500

      The last output I got is:
      INFO: srch.c(447): Search Initialization.
      INFO: srch.c(724): lmset is NULL and vithist is NULL in op_mode OP_TST_DECODE, wrong operation mode?
      FATAL_ERROR: "kb.c", line 352: Search initialization failed. Forced exit

      Does lmset mean language model? Can't it use an FSG? The documentation says I can. Does anybody know about this?

       
      • Nickolay V. Shmyrev

        You should also use an option to select the decoder mode: -mode 2

         
    • Nickolay V. Shmyrev

      To be more precise:

      http://cmusphinx.sourceforge.net/sphinx3/doc/s3_description.html

      All of the decoding routines could be accessible under the executable sphinx3_decode through using the -op_mode options. (-op_mode 2: FST, -op_mode 3: Flat Lexicon Decoder, -op_mode: Tree Lexicon Decoder) The original flat-lexicon decoder interface still exists for backward compatibility purpose.

       
    • chris

      chris - 2007-06-21

      Thank you very much. I will try it now.

       
    • chris

      chris - 2007-06-21

      Thank you, it is working now.
      Can you tell me how to generate an mfc file from a wav file? I searched the whole morning and still have no clue. Thank you very much.

       
      • Nickolay V. Shmyrev

        Use the sphinx_fe program. The decoder should read wav files perfectly, though.

         
    • chris

      chris - 2007-06-29

      So far I have used raw2cep to convert the sphinx2 sph files to mfc files. I will try sphinx_fe. Where can I find it?
      Thanks a lot.

       
      • Nickolay V. Shmyrev

        It's part of sphinxbase: sphinxbase/src/sphinx_fe

         
