Continuous on Android

2011-06-25 - 2012-09-22
  • Andre Natal

    Andre Natal - 2011-06-25

    In the final release of pocketsphinx, is it possible to do continuous
    recognition?
    Is there some way to do it the way OpenEars does it for the iPhone?

    Thanks

    Andre

     
  • Nickolay V. Shmyrev

    Is it possible to do continuous recognition?

    No, the current implementation doesn't let you do that. You need to implement
    wrappers specific to continuous recognition.

    Is there some way to do it the way OpenEars does it for the iPhone?

    You need to wrap the proper functions so they can be called from Android Java
    code.

     
  • Andre Natal

    Andre Natal - 2011-06-27

    Can you please just tell me which functions I need?

    Thanks

     
  • Andre Natal

    Andre Natal - 2011-06-28

    Should I compile another library and write a completely new wrapper, or is
    libpocketsphinx.so enough, so that I only need to add a new wrapper to SWIG?

    Thanks

    Andre

     
  • Nickolay V. Shmyrev

    Hello Andre

    You can use the existing pocketsphinx functions to implement continuous
    recognition. The first task you need to solve is recording the audio
    continuously on Android; this is an Android-specific task. Then you can feed
    this audio to pocketsphinx as it's done in pocketsphinx_continuous. You will
    have to wrap several functions of the pocketsphinx API in SWIG to make them
    accessible from Android Java code.
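
    For reference, a minimal sketch of the Android-specific capture step using
    android.media.AudioRecord (later posts in this thread capture audio through
    RecognitionListener.onBufferReceived instead). The 8 kHz rate, the 160-sample
    frame and the decoder call named in the comment are assumptions taken from the
    rest of this thread, not part of any released wrapper:

    import android.media.AudioFormat;
    import android.media.AudioRecord;
    import android.media.MediaRecorder;

    public class ContinuousCapture {
        // Must match the sample rate of the acoustic model (8 kHz assumed here).
        private static final int SAMPLE_RATE = 8000;

        public void captureLoop() {
            int minBuf = AudioRecord.getMinBufferSize(SAMPLE_RATE,
                    AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);
            AudioRecord recorder = new AudioRecord(MediaRecorder.AudioSource.MIC,
                    SAMPLE_RATE, AudioFormat.CHANNEL_IN_MONO,
                    AudioFormat.ENCODING_PCM_16BIT, minBuf * 2);

            short[] frame = new short[160];   // 20 ms of audio at 8 kHz
            recorder.startRecording();
            try {
                while (!Thread.interrupted()) {
                    int n = recorder.read(frame, 0, frame.length);
                    if (n > 0) {
                        // Hand the samples to the SWIG-wrapped decoder here,
                        // e.g. something like decoder.processRaw(frame, n) as
                        // discussed later in this thread.
                    }
                }
            } finally {
                recorder.stop();
                recorder.release();
            }
        }
    }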

     
  • Andre Natal

    Andre Natal - 2011-06-29

    Nickolay,
    I am already recording the audio continuously on Android. What I need is an
    example of consuming pocketsphinx_continuous in C or whatever, so I can know
    which functions I should wrap in SWIG.

    Thanks

    Andre

     
  • Andre Natal

    Andre Natal - 2011-06-30

    Hmm... it doesn't look too difficult to port it.
    I will start on it. Are you aware of any big challenges in doing this?
    Do you know if anybody has done it yet?

    Thanks

    Andre

     
  • Nickolay V. Shmyrev

    Are you aware of any big challenges in doing this?

    The challenge will start after the porting is done. There are tweaks needed
    to provide a good confidence measure for the recognition results, and you
    will have to implement noise cancellation.

    Do you know if anybody has done it yet?

    No, I'm not aware of such a thing being released to the public.

     
  • Andre Natal

    Andre Natal - 2011-06-30

    And what if I record only the voice between silences and use pocketsphinx to
    recognize from a recorded file?
    Is this possible?

     
  • Nickolay V. Shmyrev

    Yes

     
  • Andre Natal

    Andre Natal - 2011-07-19

    Nickolay,

    Shouldn't I include the ad and cont_ad C source references in Android.mk?

    I think they aren't being included in the default shared library compilation.
    I didn't see any references to them in this file.

    Thanks

     
  • Andre Natal

    Andre Natal - 2011-07-19

    Nickolay,

    Do you mean that if I detect silence and just pass the array of bytes to the
    already wrapped processRaw, it will work?
    Because the only functions used by pocketsphinx_continuous that would need
    wrapping are the ad and cont_ad ones, and since those are only needed to
    detect silence and handle the recording, maybe the functions already
    'swigged' are enough.
    From my analysis, only ad_open_dev, cont_ad_init, ad_start_rec, cont_ad_calib
    and cont_ad_read would need to be wrapped.

    static void
    recognize_from_microphone()
    {
        ad_rec_t *ad;
        int16 adbuf[4096];
        int32 k, ts, rem;
        char const *hyp;
        char const *uttid;
        cont_ad_t *cont;
        char word[256];

        if ((ad = ad_open_dev(cmd_ln_str_r(config, "-adcdev"),
                              (int) cmd_ln_float32_r(config, "-samprate"))) == NULL)
            E_FATAL("Failed to open audio device\n");

        /* Initialize continuous listening module */
        if ((cont = cont_ad_init(ad, ad_read)) == NULL)
            E_FATAL("Failed to initialize voice activity detection\n");
        if (ad_start_rec(ad) < 0)
            E_FATAL("Failed to start recording\n");
        if (cont_ad_calib(cont) < 0)
            E_FATAL("Failed to calibrate voice activity detection\n");

        for (;;) {
            /* Indicate listening for next utterance */
            printf("READY....\n");
            fflush(stdout);
            fflush(stderr);

            /* Wait for data for next utterance */
            while ((k = cont_ad_read(cont, adbuf, 4096)) == 0)
                sleep_msec(100);

            if (k < 0)
                E_FATAL("Failed to read audio\n");

            /*
             * Non-zero amount of data received; start recognition of new utterance.
             * NULL argument to ps_start_utt => automatic generation of utterance-id.
             */
            if (ps_start_utt(ps, NULL) < 0)
                E_FATAL("Failed to start utterance\n");
            ps_process_raw(ps, adbuf, k, FALSE, FALSE);

     
  • Nickolay V. Shmyrev

    Do you mean that if I detect silence and just pass the array of bytes to the
    already wrapped processRaw, it will work?

    Yes, you can do voice activity detection yourself and pass the raw bytes.
    This solution might be more straightforward than using the cont_ad API,
    which is somewhat too complex.

    From my analysis, only ad_open_dev, cont_ad_init, ad_start_rec, cont_ad_calib
    and cont_ad_read would need to be wrapped.

    Yes, and even with this limited set of functions it's already painful, I think.
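
    For reference, a minimal sketch of doing the voice activity detection
    yourself, as suggested above, with a simple frame-energy threshold. The
    threshold value and frame length are illustrative assumptions; a practical
    endpointer would also need some smoothing and a hangover period:

    /** Very rough energy-based speech/silence check for one 16-bit PCM frame. */
    public final class SimpleVad {
        // Illustrative threshold; tune it against recordings from the target device.
        private static final double SPEECH_RMS_THRESHOLD = 1000.0;

        private SimpleVad() { }

        public static boolean isSpeech(short[] frame, int length) {
            long sumSquares = 0;
            for (int i = 0; i < length; i++) {
                sumSquares += (long) frame[i] * frame[i];
            }
            double rms = Math.sqrt(sumSquares / (double) length);
            return rms > SPEECH_RMS_THRESHOLD;
        }
    }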

     
  • Andre Natal

    Andre Natal - 2011-07-19

    Well, so... this way no new wrapper is needed...
    I was trying to port it and it is painful... handling ALSA issues and so on...

     
  • Andre Natal

    Andre Natal - 2011-07-19

    So, just to double-check:

    After loading and configuring the decoder, I need to call startUtt when voice
    activity starts and endUtt when it stops, in order to get the final results.
    There is no need to recreate the decoder after each silence period?

     
  • Nickolay V. Shmyrev

    I need to call startUtt when voice activity starts and endUtt when it stops.

    Yes

    There is no need to recreate the decoder after each silence period?

    You do not need to recreate the decoder. You can use the existing one; just
    call start_utt again.
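
    For reference, the reuse pattern described above, sketched in Java. The
    method names startUtt, processRaw, endUtt and getHyp follow the ones used in
    this thread; the Decoder interface below is only a stand-in for whatever
    class your own SWIG wrapper actually generates, so treat the signatures as
    assumptions:

    public class UtteranceLoop {

        /** Hypothetical stand-in for the generated SWIG wrapper class. */
        public interface Decoder {
            void startUtt();
            void processRaw(short[] data, int nSamples);
            void endUtt();
            String getHyp();
        }

        /** Runs one speech segment through an already configured decoder. */
        public static String decodeSegment(Decoder decoder, Iterable<short[]> frames) {
            decoder.startUtt();          // start a new utterance on the same decoder
            for (short[] frame : frames) {
                decoder.processRaw(frame, frame.length);
            }
            decoder.endUtt();            // silence detected: close the utterance
            return decoder.getHyp();     // final hypothesis; the decoder is reusable
        }
    }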

     
  • Andre Natal

    Andre Natal - 2011-07-25

    Nickolay, I got very close, but I am stuck on a crazy issue.

    I already detect voice activity and already capture the speech as PCM,
    1 channel, 8 kHz, 16 bits per sample, which is recorded to a local file and
    passed to the decoder as shorts.

    I can see the decoder receives the data OK, because it writes it out (you can
    get a sample here: http://dl.dropbox.com/u/6231836/000000000.raw, saying OPEN
    BROWSER)

    But in the log I am getting a lot of "Final state not reached in frame".

    It looks like a very small detail. Do you have any idea?

    Thanks

    The log follows.

    INFO: cmd_ln.c(559): Parsing command line:
    \
    -nfilt 20 \
    -lowerf 1 \
    -upperf 4000 \
    -wlen 0.025 \
    -transform dct \
    -round_filters no \
    -remove_dc yes \
    -svspec 0-12/13-25/26-38 \
    -feat 1s_c_d_dd \
    -agc none \
    -cmn current \
    -cmninit 56,-3,1 \
    -varnorm no

    Current configuration:

    -agc none none
    -agcthresh 2.0 2.000000e+00
    -alpha 0.97 9.700000e-01
    -ceplen 13 13
    -cmn current current
    -cmninit 8.0 56,-3,1
    -dither no no
    -doublebw no no
    -feat 1s_c_d_dd 1s_c_d_dd
    -frate 100 100
    -input_endian little little
    -lda
    -ldadim 0 0
    -lifter 0 0
    -logspec no no
    -lowerf 133.33334 1.000000e+00
    -ncep 13 13
    -nfft 512 512
    -nfilt 40 20
    -remove_dc no yes
    -round_filters yes no
    -samprate 16000 8.000000e+03
    -seed -1 -1
    -smoothspec no no
    -svspec 0-12/13-25/26-38
    -transform legacy dct
    -unit_area yes yes
    -upperf 6855.4976 4.000000e+03
    -varnorm no no
    -verbose no no
    -warp_params
    -warp_type inverse_linear inverse_linear
    -wlen 0.025625 2.500000e-02

    INFO: acmod.c(242): Parsed model-specific feature parameters from
    /sdcard/Android/data/pocketsphinx/hmm/en_US//feat.params
    INFO: feat.c(697): Initializing feature stream to type: '1s_c_d_dd',
    ceplen=13, CMN='current', VARNORM='no', AGC='none'
    INFO: cmn.c(142): mean= 12.00, mean= 0.0
    INFO: acmod.c(163): Using subvector specification 0-12/13-25/26-38
    INFO: mdef.c(520): Reading model definition:
    /sdcard/Android/data/pocketsphinx/hmm/en_US//mdef
    INFO: mdef.c(531): Found byte-order mark BMDF, assuming this is a binary mdef
    file
    INFO: bin_mdef.c(330): Reading binary model definition:
    /sdcard/Android/data/pocketsphinx/hmm/en_US//mdef
    INFO: bin_mdef.c(507): 50 CI-phone, 143047 CD-phone, 3 emitstate/phone, 150
    CI-sen, 5150 Sen, 27135 Sen-Seq
    INFO: tmat.c(205): Reading HMM transition probability matrices:
    /sdcard/Android/data/pocketsphinx/hmm/en_US//transition_matrices
    INFO: acmod.c(117): Attempting to use SCHMM computation module
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
    /sdcard/Android/data/pocketsphinx/hmm/en_US//means
    INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
    INFO: ms_gauden.c(294): 256x13
    INFO: ms_gauden.c(294): 256x13
    INFO: ms_gauden.c(294): 256x13
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
    /sdcard/Android/data/pocketsphinx/hmm/en_US//variances
    INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
    INFO: ms_gauden.c(294): 256x13
    INFO: ms_gauden.c(294): 256x13
    INFO: ms_gauden.c(294): 256x13
    INFO: ms_gauden.c(354): 0 variance values floored
    INFO: s2_semi_mgau.c(908): Loading senones from dump file
    /sdcard/Android/data/pocketsphinx/hmm/en_US//sendump
    INFO: s2_semi_mgau.c(932): BEGIN FILE FORMAT DESCRIPTION
    INFO: s2_semi_mgau.c(1027): Using memory-mapped I/O for senones
    INFO: s2_semi_mgau.c(1304): Maximum top-N: 4 Top-N beams: 0 0 0
    INFO: phone_loop_search.c(105): State beam -230231 Phone exit beam -115115
    Insertion penalty 0
    INFO: dict.c(306): Allocating 4114 * 20 bytes (80 KiB) for word entries
    INFO: dict.c(321): Reading main dictionary:
    /sdcard/Android/data/pocketsphinx/lm/en_US/dic.dic
    INFO: dict.c(212): Allocated 0 KiB for strings, 0 KiB for phones
    INFO: dict.c(324): 7 words read
    INFO: dict.c(330): Reading filler dictionary:
    /sdcard/Android/data/pocketsphinx/hmm/en_US//noisedict
    INFO: dict.c(212): Allocated 0 KiB for strings, 0 KiB for phones
    INFO: dict.c(333): 11 words read
    INFO: dict2pid.c(396): Building PID tables for dictionary
    INFO: dict2pid.c(404): Allocating 50^3 * 2 bytes (244 KiB) for word-initial
    triphones
    INFO: dict2pid.c(131): Allocated 30200 bytes (29 KiB) for word-final triphones
    INFO: dict2pid.c(195): Allocated 30200 bytes (29 KiB) for single-phone word
    triphones
    INFO: fsg_search.c(145): FSG(beam: -1080, pbeam: -1080, wbeam: -634; wip: -26,
    pip: 0)
    INFO: jsgf.c(546): Defined rule: PUBLIC <grm.simple>
    INFO: fsg_model.c(213): Computing transitive closure for null transitions
    INFO: fsg_model.c(264): 0 null transitions added
    INFO: fsg_model.c(411): Adding silence transitions for <sil> to FSG
    INFO: fsg_model.c(431): Added 8 silence word transitions
    INFO: fsg_model.c(411): Adding silence transitions for ++NOISE++ to FSG
    INFO: fsg_model.c(431): Added 8 silence word transitions
    INFO: fsg_model.c(411): Adding silence transitions for ++BREATH++ to FSG
    INFO: fsg_model.c(431): Added 8 silence word transitions
    INFO: fsg_model.c(411): Adding silence transitions for ++SMACK++ to FSG
    INFO: fsg_model.c(431): Added 8 silence word transitions
    INFO: fsg_model.c(411): Adding silence transitions for ++COUGH++ to FSG
    INFO: fsg_model.c(431): Added 8 silence word transitions
    INFO: fsg_model.c(411): Adding silence transitions for ++LAUGH++ to FSG
    INFO: fsg_model.c(431): Added 8 silence word transitions
    INFO: fsg_model.c(411): Adding silence transitions for ++TONE++ to FSG
    INFO: fsg_model.c(431): Added 8 silence word transitions
    INFO: fsg_model.c(411): Adding silence transitions for ++UH++ to FSG
    INFO: fsg_model.c(431): Added 8 silence word transitions
    INFO: fsg_model.c(411): Adding silence transitions for ++UM++ to FSG
    INFO: fsg_model.c(431): Added 8 silence word transitions
    INFO: fsg_search.c(364): Added 1 alternate word transitions
    INFO: fsg_lextree.c(108): Allocated 816 bytes (0 KiB) for left and right
    context phones
    INFO: fsg_lextree.c(251): 102 HMM nodes in lextree (83 leaves)
    INFO: fsg_lextree.c(253): Allocated 11016 bytes (10 KiB) for all lextree nodes
    INFO: fsg_lextree.c(256): Allocated 8964 bytes (8 KiB) for lextree leafnodes
    INFO: pocketsphinx.c(673): Writing raw audio log file:
    /sdcard/Android/data/pocketsphinx/000000000.raw
    INFO: cmn_prior.c(121): cmn_prior_update: from < 56.00 -3.00 1.00 0.00 0.00
    0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >
    INFO: cmn_prior.c(139): cmn_prior_update: to < 90.98 -7.73 -2.32 -1.44 -0.87
    -0.60 -0.35 -0.24 -0.30 -0.12 -0.09 -0.12 -0.05 >
    INFO: fsg_search.c(1030): 255 frames, 5212 HMMs (20/fr), 15652 senones
    (61/fr), 691 history entries (2/fr)

    ERROR: "fsg_search.c", line 1099: Final state not reached in frame 255
    INFO: pocketsphinx.c(846): 000000000: (null) (1144249008)
    INFO: word start end pprob ascr lscr lback
    ERROR: "fsg_search.c", line 1099: Final state not reached in frame 255
    INFO: pocketsphinx.c(673): Writing raw audio log file:
    /sdcard/Android/data/pocketsphinx/000000001.raw
    INFO: cmn_prior.c(121): cmn_prior_update: from < 90.98 -7.73 -2.32 -1.44 -0.87
    -0.60 -0.35 -0.24 -0.30 -0.12 -0.09 -0.12 -0.05 >
    INFO: cmn_prior.c(139): cmn_prior_update: to < 90.92 -7.97 -2.23 -1.37 -0.80
    -0.59 -0.38 -0.28 -0.25 -0.14 -0.09 -0.14 -0.07 >
    INFO: fsg_search.c(1030): 255 frames, 3479 HMMs (13/fr), 12305 senones
    (48/fr), 254 history entries (0/fr)

    ERROR: "fsg_search.c", line 1099: Final state not reached in frame 255
    INFO: pocketsphinx.c(846): 000000001: (null) (2969816)
    INFO: word start end pprob ascr lscr lback
    ERROR: "fsg_search.c", line 1099: Final state not reached in frame 255
    INFO: pocketsphinx.c(673): Writing raw audio log file:
    /sdcard/Android/data/pocketsphinx/000000002.raw
    INFO: cmn_prior.c(121): cmn_prior_update: from < 90.92 -7.97 -2.23 -1.37 -0.80
    -0.59 -0.38 -0.28 -0.25 -0.14 -0.09 -0.14 -0.07 >
    INFO: cmn_prior.c(139): cmn_prior_update: to < 90.90 -7.99 -2.23 -1.37 -0.81
    -0.59 -0.38 -0.29 -0.23 -0.13 -0.10 -0.13 -0.08 >
    INFO: fsg_search.c(1030): 153 frames, 2090 HMMs (13/fr), 7390 senones (48/fr),
    152 history entries (0/fr)

    ERROR: "fsg_search.c", line 1099: Final state not reached in frame 153
    INFO: pocketsphinx.c(846): 000000002: (null) (1164416)
    INFO: word start end pprob ascr lscr lback
    ERROR: "fsg_search.c", line 1099: Final state not reached in frame 153
    INFO: cmd_ln.c(559): Parsing command line:
    \
    -nfilt 20 \
    -lowerf 1 \
    -upperf 4000 \
    -wlen 0.025 \
    -transform dct \
    -round_filters no \
    -remove_dc yes \
    -svspec 0-12/13-25/26-38 \
    -feat 1s_c_d_dd \
    -agc none \
    -cmn current \
    -cmninit 56,-3,1 \
    -varnorm no

    Current configuration:

    -agc none none
    -agcthresh 2.0 2.000000e+00
    -alpha 0.97 9.700000e-01
    -ceplen 13 13
    -cmn current current
    -cmninit 8.0 56,-3,1
    -dither no no
    -doublebw no no
    -feat 1s_c_d_dd 1s_c_d_dd
    -frate 100 100
    -input_endian little little
    -lda
    -ldadim 0 0
    -lifter 0 0
    -logspec no no
    -lowerf 133.33334 1.000000e+00
    -ncep 13 13
    -nfft 512 512
    -nfilt 40 20
    -remove_dc no yes
    -round_filters yes no
    -samprate 16000 8.000000e+03
    -seed -1 -1
    -smoothspec no no
    -svspec 0-12/13-25/26-38
    -transform legacy dct
    -unit_area yes yes
    -upperf 6855.4976 4.000000e+03
    -varnorm no no
    -verbose no no
    -warp_params
    -warp_type inverse_linear inverse_linear
    -wlen 0.025625 2.500000e-02

    INFO: acmod.c(242): Parsed model-specific feature parameters from
    /sdcard/Android/data/pocketsphinx/hmm/en_US//feat.params
    INFO: feat.c(697): Initializing feature stream to type: '1s_c_d_dd',
    ceplen=13, CMN='current', VARNORM='no', AGC='none'
    INFO: cmn.c(142): mean= 12.00, mean= 0.0
    INFO: acmod.c(163): Using subvector specification 0-12/13-25/26-38
    INFO: mdef.c(520): Reading model definition:
    /sdcard/Android/data/pocketsphinx/hmm/en_US//mdef
    INFO: mdef.c(531): Found byte-order mark BMDF, assuming this is a binary mdef
    file
    INFO: bin_mdef.c(330): Reading binary model definition:
    /sdcard/Android/data/pocketsphinx/hmm/en_US//mdef
    INFO: bin_mdef.c(507): 50 CI-phone, 143047 CD-phone, 3 emitstate/phone, 150
    CI-sen, 5150 Sen, 27135 Sen-Seq
    INFO: tmat.c(205): Reading HMM transition probability matrices:
    /sdcard/Android/data/pocketsphinx/hmm/en_US//transition_matrices
    INFO: acmod.c(117): Attempting to use SCHMM computation module
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
    /sdcard/Android/data/pocketsphinx/hmm/en_US//means
    INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
    INFO: ms_gauden.c(294): 256x13
    INFO: ms_gauden.c(294): 256x13
    INFO: ms_gauden.c(294): 256x13
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
    /sdcard/Android/data/pocketsphinx/hmm/en_US//variances
    INFO: ms_gauden.c(292): 1 codebook, 3 feature, size:
    INFO: ms_gauden.c(294): 256x13
    INFO: ms_gauden.c(294): 256x13
    INFO: ms_gauden.c(294): 256x13
    INFO: ms_gauden.c(354): 0 variance values floored
    INFO: s2_semi_mgau.c(908): Loading senones from dump file
    /sdcard/Android/data/pocketsphinx/hmm/en_US//sendump
    INFO: s2_semi_mgau.c(932): BEGIN FILE FORMAT DESCRIPTION
    INFO: s2_semi_mgau.c(1027): Using memory-mapped I/O for senones
    INFO: s2_semi_mgau.c(1304): Maximum top-N: 4 Top-N beams: 0 0 0
    INFO: phone_loop_search.c(105): State beam -230231 Phone exit beam -115115
    Insertion penalty 0
    INFO: dict.c(306): Allocating 4114 * 20 bytes (80 KiB) for word entries
    INFO: dict.c(321): Reading main dictionary:
    /sdcard/Android/data/pocketsphinx/lm/en_US/dic.dic
    INFO: dict.c(212): Allocated 0 KiB for strings, 0 KiB for phones
    INFO: dict.c(324): 7 words read
    INFO: dict.c(330): Reading filler dictionary:
    /sdcard/Android/data/pocketsphinx/hmm/en_US//noisedict
    INFO: dict.c(212): Allocated 0 KiB for strings, 0 KiB for phones
    INFO: dict.c(333): 11 words read
    INFO: dict2pid.c(396): Building PID tables for dictionary
    INFO: dict2pid.c(404): Allocating 50^3 * 2 bytes (244 KiB) for word-initial
    triphones
    INFO: dict2pid.c(131): Allocated 30200 bytes (29 KiB) for word-final triphones
    INFO: dict2pid.c(195): Allocated 30200 bytes (29 KiB) for single-phone word
    triphones
    INFO: fsg_search.c(145): FSG(beam: -1080, pbeam: -1080, wbeam: -634; wip: -26,
    pip: 0)
    INFO: jsgf.c(546): Defined rule: PUBLIC <grm.simple>
    INFO: fsg_model.c(213): Computing transitive closure for null transitions
    INFO: fsg_model.c(264): 0 null transitions added
    INFO: fsg_model.c(411): Adding silence transitions for <sil> to FSG
    INFO: fsg_model.c(431): Added 8 silence word transitions
    INFO: fsg_model.c(411): Adding silence transitions for ++NOISE++ to FSG
    INFO: fsg_model.c(431): Added 8 silence word transitions
    INFO: fsg_model.c(411): Adding silence transitions for ++BREATH++ to FSG
    INFO: fsg_model.c(431): Added 8 silence word transitions
    INFO: fsg_model.c(411): Adding silence transitions for ++SMACK++ to FSG
    INFO: fsg_model.c(431): Added 8 silence word transitions
    INFO: fsg_model.c(411): Adding silence transitions for ++COUGH++ to FSG
    INFO: fsg_model.c(431): Added 8 silence word transitions
    INFO: fsg_model.c(411): Adding silence transitions for ++LAUGH++ to FSG
    INFO: fsg_model.c(431): Added 8 silence word transitions
    INFO: fsg_model.c(411): Adding silence transitions for ++TONE++ to FSG
    INFO: fsg_model.c(431): Added 8 silence word transitions
    INFO: fsg_model.c(411): Adding silence transitions for ++UH++ to FSG
    INFO: fsg_model.c(431): Added 8 silence word transitions
    INFO: fsg_model.c(411): Adding silence transitions for ++UM++ to FSG
    INFO: fsg_model.c(431): Added 8 silence word transitions
    INFO: fsg_search.c(364): Added 1 alternate word transitions
    INFO: fsg_lextree.c(108): Allocated 816 bytes (0 KiB) for left and right
    context phones
    INFO: fsg_lextree.c(251): 102 HMM nodes in lextree (83 leaves)
    INFO: fsg_lextree.c(253): Allocated 11016 bytes (10 KiB) for all lextree nodes
    INFO: fsg_lextree.c(256): Allocated 8964 bytes (8 KiB) for lextree leafnodes
    INFO: pocketsphinx.c(673): Writing raw audio log file:
    /sdcard/Android/data/pocketsphinx/000000000.raw
    INFO: cmn_prior.c(121): cmn_prior_update: from < 56.00 -3.00 1.00 0.00 0.00
    0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >
    INFO: cmn_prior.c(139): cmn_prior_update: to < 91.02 -7.05 -2.33 -1.44 -0.76
    -0.54 -0.39 -0.21 -0.19 -0.14 -0.19 -0.15 -0.13 >
    INFO: fsg_search.c(1030): 255 frames, 4972 HMMs (19/fr), 15047 senones
    (59/fr), 641 history entries (2/fr)

    ERROR: "fsg_search.c", line 1099: Final state not reached in frame 255
    INFO: pocketsphinx.c(846): 000000000: (null) (1144249008)
    INFO: word start end pprob ascr lscr lback
    ERROR: "fsg_search.c", line 1099: Final state not reached in frame 255

     
  • Andre Natal

    Andre Natal - 2011-07-27

    OK, I made it work without needing to generate a file. Now I am passing the
    data from onBufferReceived(byte[] buffer) directly to process_raw after
    converting it to shorts. At least now I get some partialResults with wrong
    values, but I never get an endResults.

    So, I am wondering two things:

    1 - I am forced to pass 512 shorts to process_raw and this is making the
    decoder misbehave. onBufferReceived raises an array of bytes with 320
    positions and I am passing it directly; it doesn't make much sense to me.

    2 - Maybe the format of the audio fed by onBufferReceived is not what
    pocketsphinx expects.

    According to Google:
    android.speech.RecognitionListener.onBufferReceived(byte[] buffer)
    buffer: a buffer containing a sequence of big-endian 16-bit integers
    representing a single channel audio stream. The sample rate is implementation
    dependent.

    If I save this data and put a WAV header on it this way:
    WaveHeader hdr = new WaveHeader(WaveHeader.FORMAT_PCM, (short) 1, 8000,
    (short) 16, pcm.length);
    the audio plays perfectly.

    Makes sense?

    Thanks

     
  • Amin Yazdani

    Amin Yazdani - 2011-07-27

    Hi

    And what if I record only the voice between silences and use pocketsphinx to
    recognize from a recorded file? Is this possible?

    I'm trying to do the same, but I don't know how to find the silence in the
    audio. Can you please help me with that?
    What did you use to find the silences? Did you use
    android.media.AudioRecord?

    Any tips would be helpful.

    Thanks,

     
  • Nickolay V. Shmyrev

    I am forced to pass 512 shorts to process_raw and this is making the decoder
    misbehave.

    I'm not sure why that is. 320 shorts is one frame and that looks like a
    natural size. Why do you work with 512 shorts?

    sequence of big-endian 16-bit integers

    Pocketsphinx expects little endian.
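
    For reference, a minimal sketch of converting the big-endian bytes delivered
    by onBufferReceived into the shorts that process_raw expects, using java.nio
    (the helper class and method names are made up for illustration):

    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;

    public final class PcmBytes {
        private PcmBytes() { }

        /**
         * Converts big-endian 16-bit PCM bytes, as documented for
         * RecognitionListener.onBufferReceived, into native-order shorts.
         */
        public static short[] bigEndianBytesToShorts(byte[] buffer) {
            short[] samples = new short[buffer.length / 2];
            ByteBuffer.wrap(buffer)
                      .order(ByteOrder.BIG_ENDIAN)
                      .asShortBuffer()
                      .get(samples);
            return samples;
        }
    }

    A 320-byte buffer from onBufferReceived then becomes 160 shorts, which
    matches the numbers discussed below.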

     
  • Andre Natal

    Andre Natal - 2011-07-27

    OK, I am accumulating to 512 because I saw that in some C consuming example.

    onBufferReceived raises 320 bytes, which converted to shorts gives 160. So,
    if passing 320 shorts is fine, I will only need to accumulate twice before
    passing to process_raw.

    About the little-endian issue, I think this is a tremendous problem for
    making this work.

    I hope this is changeable in the Android source code, otherwise it is a
    show-stopper.

     
  • Nickolay V. Shmyrev

    You don't need to accumulate; you can pass 160 too.

     
  • Andre Natal

    Andre Natal - 2011-07-28

    Thank you. Do you have any tips on the little-endian issue?
    I will start digging into the Android code...

     