
Help beginner to choose direction to take.

cornelyus
2010-09-07
2012-09-22
  • cornelyus

    cornelyus - 2010-11-12

    Hello.. hope you got the files..

    3 questions not regarding that issue..

    1. went to the adaptation tutorial again; at the step with this command:
      ./bw \
      -hmmdir hub4wsj_sc_8k \
      -moddeffn hub4wsj_sc_8k/mdef.txt \
      -ts2cbfn .semi. \
      -feat 1s_c_d_dd \
      -svspec 0-12/13-25/26-38 \
      -cmn current \
      -agc none \
      -dictfn arctic20.dic \
      -ctlfn arctic20.fileids \
      -lsnfn arctic20.transcription \
      -accumdir .

    "Make sure the arguments here match the parameters in feat.params file inside
    the acoustic model folder."

    so if using the hub4wsj_sc_8k model, i should make sure the command is:

    ./bw \
    -hmmdir hub4wsj_sc_8k \
    -moddeffn hub4wsj_sc_8k/mdef.txt \
    -ts2cbfn .semi. \
    -nfilt 20 \
    -lowerf 1 \
    -upperf 4000 \
    -wlen 0.025 \
    -transform dct \
    -round_filters no \
    -remove_dc yes \
    -svspec 0-12/13-25/26-38 \
    -feat 1s_c_d_dd \
    -agc none \
    -cmn current \
    -cmninit 56,-3,1 \
    -varnorm no \
    -dictfn arctic20.dic \
    -ctlfn arctic20.fileids \
    -lsnfn arctic20.transcription \
    -accumdir .

    because i used the exact one from the tutorial website..

    2. What's the difference for adapting to sphinx4? Is it to download the file
      hub4opensrc.cd_continuous_8gau, and use it instead of hub4wsj_sc_8k?

    3. If i adapt the acoustic model for myself with arctic.. can i improve this
      acoustic adaptation by using this adapted folder and adding more recordings
      of my voice?

     
  • cornelyus

    cornelyus - 2010-11-12

    regarding issue number 1, i shouldn't have asked before trying it myself..
    sorry for that.. i see now that bw doesn't accept some of the parameters that
    are in the feat.params file.. so i think i should use the command exactly as
    it is on the tutorial page..

     
  • Nickolay V. Shmyrev

    regarding issue number 1, i shouldn't have asked before trying it myself..
    sorry for that.. i see now that bw doesn't accept some of the parameters that
    are in the feat.params file.. so i think i should use the command exactly as
    it is on the tutorial page..

    I updated the wiki page anyway.

    What's the difference for adapting to sphinx4? Is it to download the file
    hub4opensrc.cd_continuous_8gau, and use it instead of hub4wsj_sc_8k?

    See http://cmusphinx.sourceforge.net/wiki/tutorialadapt#other_acoustic_models

    You can use hub4 or wsj, which go with the sphinx4 distribution.

    If i adapt the acoustic model for myself with arctic.. can i improve this
    acoustic adaptation by using this adapted folder and adding more recordings
    of my voice?

    Yes, you can

     
  • Nickolay V. Shmyrev

    As for your adaptation, there are several issues here, both on your side and
    in pocketsphinx, for example:

    1) The audio files you recorded shouldn't have a lot of silence; you need to
    trim the silence at the beginning and at the end
    2) Recent pocketsphinx requires you to use a language weight of about 1.0
    with jsgf
    3) There are other bugs

    If you do everything properly you'll get the following result:

    TOTAL Words: 102 Correct: 102 Errors: 2
    TOTAL Percent correct = 100.00% Error = 1.96% Accuracy = 98.04%
    TOTAL Insertions: 2 Deletions: 0 Substitutions: 0
    Insertions: 0 Deletions: 0 Substitutions: 0
    TOTAL Words: 102 Correct: 102 Errors: 1
    TOTAL Percent correct = 100.00% Error = 0.98% Accuracy = 99.02%
    TOTAL Insertions: 1 Deletions: 0 Substitutions: 0
    

    You can download my sample scripts here:

    http://rapidshare.com/files/430472798/adapt_test_cornelius.tar.gz

     
  • Nickolay V. Shmyrev

    The bug was fixed in pocketsphinx today. If you update to trunk and run
    pocketsphinx with -fwdflat no -bestpath no, you'll get the best results even
    without adaptation:

    TOTAL Words: 102 Correct: 102 Errors: 0
    TOTAL Percent correct = 100.00% Error = 0.00% Accuracy = 100.00%
    TOTAL Insertions: 0 Deletions: 0 Substitutions: 0
    
     
  • cornelyus

    cornelyus - 2010-11-15

    See http://cmusphinx.sourceforge.net/wiki/tutorialadapt#other_acoustic_models

    You can use hub4 or wsj, which go with the sphinx4 distribution.

    I got the sphinx4 distribution.. are you talking about these files?
    WSJ_8gau_13dCep_8kHz_31mel_200Hz_3500Hz.jar
    WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar

    As for your adaptation, there are several issues here, both on your side and
    in pocketsphinx, for example:

    1) The audio files you recorded shouldn't have a lot of silence; you need to
    trim the silence at the beginning and at the end
    2) Recent pocketsphinx requires you to use a language weight of about 1.0
    with jsgf
    3) There are other bugs

    1. thanks, will take this into consideration..
    2. what's this "language weight" parameter?

    The bug was fixed in pocketsphinx today. If you update to trunk and run
    pocketsphinx with -fwdflat no -bestpath no, you'll get the best results even
    without adaptation:

    " -fwdflat " and "-bestpath"
    those are the kind of parameters i am yet to understand what they "control"..
    care to explain?
    all i got was :
    -fwdflat
    Run forward flat-lexicon search over word lattice (2nd pass)
    -bestpath
    Run bestpath (Dijkstra) search over word lattice (3rd pass)

    Can't thank you enough for your time attending to this subject... hope to
    repay it in the future..

     
  • Nickolay V. Shmyrev

    I got the sphinx4 distribution.. are you talking about these files?
    WSJ_8gau_13dCep_8kHz_31mel_200Hz_3500Hz.jar
    WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar

    Yes

    1. what's this "language weight" parameter?

    -lw

    " -fwdflat " and "-bestpath" those are the kind of parameters i am yet to
    understand what they "control".. care to explain?

    pocketsphinx analyzes the audio multiple times to get the best recognition
    result. After the first approximation is calculated, pocketsphinx tries to
    recognize again all the words that were detected, and then, from the result
    of the second pass, it tries to find the best combination of words. This
    improves accuracy on a large vocabulary. For a small grammar like yours it's
    not needed.

     
  • cornelyus

    cornelyus - 2010-11-15

    does that mean a faster recognition cycle?

     
  • cornelyus

    cornelyus - 2010-11-15

    btw.. did what you recommended and got the same result as before adaptation
    (98), and the same after adaptation (98 instead of 99 like you).. don't
    understand why..

    Got to 100% with the latest version and those parameters you mentioned...

     
  • Nickolay V. Shmyrev

    does that mean a faster recognition cycle?

    Yes

    don't understand why.

    I don't understand why either; you can check the adapted model in my files.
    But it doesn't matter, I suppose.

     
  • cornelyus

    cornelyus - 2010-11-15

    I don't understand why either; you can check the adapted model in my files.
    But it doesn't matter, I suppose.

    got the sphinxbase + pocketsphinx snapshot.. compiled both.. did
    pocketsphinx_batch \
    -hmm hub4wsj_sc_8k_adapt \
    -dict lm/cmu07a.dic \
    -jsgf cards.gram \
    -ctl transcription.fileids \
    -adcin yes \
    -cepext .wav \
    -cepdir wav \
    -wip 0.1 \
    -lw 1 \
    -hyp after_example.hyp

    and am getting 98%.. weird :(

    with -fwdflat no -bestpath no i get 100% like you...

     
  • cornelyus

    cornelyus - 2010-11-15

    unpacked your files, and ran the adapt.sh script.. result.. the same 98% for
    no adaptation and adaptation... although 98% seems pretty good, i find it odd
    that i can't get the same results as you did..

    -in pocketsphinx_continuous i can use -fwdflat and -bestpath, right?!

    -was re-reading past posts and you mention

    Many issues need to be implemented, like noise skipping.

    how can I try implementing noise skipping? Or is this going deep on the code?

    -what tool did you use to "cut" the silence on the beginning and end of my wav files?

     
  • cornelyus

    cornelyus - 2010-11-15

    forgot to ask previously..
    I am on a windows host, and using virtualbox with linux ubuntu...
    on sound preferences i can see my microphone input, but when running
    pocketsphinx_continuous this error appears:
    "ad_oss.c(103): Failed to open audio device(/dev/dsp): No such file or
    directory" ..

    when doing

     arecord -l
    

    i get this :
    "* List of CAPTURE Hardware Devices *
    card 1: Device , device 0: USB Audio
    Subdevices: 1/1
    Subdevice #0: subdevice #0"

    Can't I refer to the microphone with -adcdev? maybe this is a limitation of
    using linux on a virtual machine?

    thank you

     
  • cornelyus

    cornelyus - 2010-11-16

    Regarding the last issue, i was able to solve it... you must have the ALSA
    drivers installed, and you must add the USB sound device in the virtual
    machine settings.. and all set, it seems...

    Don't wanna be pushing harder but i need to ask this...
    Now i tried pocketsphinx_continuous with the parameters discussed before
    (-lw, -fwdflat, -bestpath). First impressions: it does seem faster to
    decode... But am I losing data in the output? Because before I had info from
    fsg_search.c and ps_lattice.c

    example :

    INFO: cmn_prior.c(121): cmn_prior_update: from < 48.20 -4.79  0.51  0.66 -0.83
    1.18 -1.27  0.87 -0.93  1.14 -1.04  0.53 -0.62 >
    INFO: cmn_prior.c(139): cmn_prior_update: to   < 45.29 -5.28  0.35  0.71 -0.85
    1.39 -1.31  1.00 -0.78  0.98 -0.83  0.47 -0.53 >
    INFO: fsg_search.c(1026): 426 frames, 44157 HMMs (103/fr), 48666 senones (114/fr
    ), 20759 history entries (48/fr)
    
    INFO: fsg_search.c(1403): Start node ACE.0:5:7
    INFO: fsg_search.c(1403): Start node TWO.0:5:12
    INFO: fsg_search.c(1403): Start node EIGHT.0:5:14
    INFO: fsg_search.c(1403): Start node <sil>.0:2:30
    INFO: fsg_search.c(1442): End node <sil>.354:358:425 (-1014)
    INFO: fsg_search.c(1658): lattice start node <s>.0 end node <sil>.354
    INFO: ps_lattice.c(1351): Normalizer P(O) = alpha(<sil>:354:425) = -2119991
    INFO: ps_lattice.c(1389): Joint P(O,S) = -2153008 P(S|O) = -33017
    000000004: ACE OF SPADES ACE OF CLUBS ACE OF HEARTS
    

    Also, i was using this before:

     prob = ps_get_prob(ps, &uttid);               /* log posterior of the best hypothesis */
     conf = logmath_exp(ps_get_logmath(ps), prob); /* convert the log value to a 0..1 probability */
    

    to get confidence on the recognized utterance... but now with (-lw 1,
    -fwdflat no, -bestpath no) the confidence obtained this way seems to always
    be 1 (??!!)

    Thought i could get a confidence percentage, and eliminate utterances that
    were recognized below a defined threshold (for example 50%)...
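
    Something like this rough sketch is what i had in mind (the ps_get_hyp call
    and the 0.5 cutoff are just my own assumptions here, and ps is the already
    initialized decoder):

    /* Sketch only: drop utterances whose posterior confidence is below a cutoff. */
    int32 prob, best_score;
    char const *uttid, *hyp;
    double conf;

    prob = ps_get_prob(ps, &uttid);                /* log posterior, 0 unless -bestpath yes */
    conf = logmath_exp(ps_get_logmath(ps), prob);  /* map the log value to 0..1 */
    hyp  = ps_get_hyp(ps, &best_score, &uttid);    /* best hypothesis text */

    if (hyp != NULL && conf >= 0.5)
        printf("accepted: %s (conf %.2f)\n", hyp, conf);
    else
        printf("rejected (conf %.2f)\n", conf);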

    Thanks again..

     
  • Nickolay V. Shmyrev

    how can I try implementing noise skipping?

    There are many algorithms for noise cancellation; you can find something
    small on Google and implement it.

    Or is this going deep on the code?

    Not really

    -what tool did you use to "cut" the silence on the beginning and end of my
    wav files?

    Audacity

    to get confidence on the recognized utterance...

    Posterior confidence only makes sense with a large vocabulary. You have two
    choices:

    1. Implement phone-based confidence like in sphinx2
    2. Don't use confidence at all.
     
  • cornelyus

    cornelyus - 2010-11-16

    Hey..

    before reading your post I read this in the API docs:

    "Get posterior probability.

    Note:
    Unless the -bestpath option is enabled, this function will always return zero
    (corresponding to a posterior probability of 1.0). Even if -bestpath is
    enabled, it will also return zero when called on a partial result. Ongoing
    research into effective confidence annotation for partial hypotheses may
    result in these restrictions being lifted in future versions."

    1. Any documentation for that "phone-based confidence" of sphinx2?

    I'll check for those noise cancellation algorithms you are talking about...

     
  • cornelyus

    cornelyus - 2010-11-16

    tried pocketsphinx_continuous

    pocketsphinx_continuous.exe -hmm hub4wsj_sc_8k_adapt -dict cmu07a.dic -jsgf
    cards.gram -lw 1 -fwdflat no -bestpath no

    Even though it identifies what i am saying, it also recognizes a lot of
    "garbage" :S.. while i was talking to a friend, it was picking up random
    stuff i was saying, and without a confidence level to measure against, i
    can't eliminate these utterances... kind of lost here now..

     
  • cornelyus

    cornelyus - 2010-11-16
    1. Implement phone-based confidence like in sphinx2

    any documentation to know where to look?

     
  • cornelyus

    cornelyus - 2010-11-16

    when searching through the forums i found this:
    https://sourceforge.net/projects/cmusphinx/forums/forum/5471/topic/3400265

    seems i am running into the same problem... any news concerning that issue?

    on a side note.. can't i edit previous posts here?

     
  • Nickolay V. Shmyrev

    any documentation to know where to look?

    No, there is no documentation, only sphinx2 sources

    on a side note.. can't i edit previous posts here?

    No, you can't

     
  • cornelyus

    cornelyus - 2010-11-16

    will "dig in" on sphinx2 sources..

    any tips on the other questions?

    Once again..thanks for your time..

     
