I'm looking to run pocketsphinx still in batch mode (reading from a file), but instead of processing it all at once, I want to process it one chunk of data at a time. I made a function to replace ps_decode_raw that streams the data in. It looks like this:
static int medvedsDecode(ps_decoder_t *ps, FILE *rawfh,
                         char const *uttid, long maxsamps)
{
    long total = 0;

    (void)maxsamps; /* unused: we always read to EOF */
    ps_start_utt(ps, uttid);
    /* Decode the file as a stream, one small buffer at a time. */
    while (!feof(rawfh))
    {
        long long start = 0, end = 0;
        long difference = 0;
        int16 data[256];
        size_t nread;

        //ClockTime(CLOCK_REALTIME, NULL, &start);
        nread = fread(data, sizeof(*data), sizeof(data) / sizeof(*data), rawfh);
        ps_process_raw(ps, data, nread, FALSE, FALSE);
        total += nread;
        //ClockTime(CLOCK_REALTIME, NULL, &end);
        //difference = (end - start) / 1000; // (in microsecs)
        //usleep((10000 * (512 / 80)) - difference);
    }
    ps_end_utt(ps);
    return total;
}
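In case it helps, this is roughly how the function gets called (a minimal sketch, not my real harness: the model paths and file name are placeholders, error handling is omitted, and it assumes the older pocketsphinx API where ps_start_utt/ps_get_hyp take an uttid argument):

#include <pocketsphinx.h>
#include <stdio.h>

int main(void)
{
    /* Placeholder configuration; the real paths depend on your setup. */
    cmd_ln_t *config = cmd_ln_init(NULL, ps_args(), TRUE,
        "-hmm",  "/path/to/acoustic/model",
        "-lm",   "/path/to/language.lm",
        "-dict", "/path/to/dictionary.dic",
        NULL);
    ps_decoder_t *ps = ps_init(config);
    FILE *rawfh = fopen("track-1.raw", "rb");   /* 16 kHz, 16-bit mono raw audio */
    int32 score;
    char const *uttid, *hyp;

    medvedsDecode(ps, rawfh, "track-1", -1);
    hyp = ps_get_hyp(ps, &score, &uttid);       /* older API: also returns the uttid */
    printf("%s: %s\n", uttid, hyp ? hyp : "(no hypothesis)");

    fclose(rawfh);
    ps_free(ps);
    return 0;
}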
I'm not really worried about the sleeping portion yet - what is weird (and what I'm looking for help on) is that the first file the decoder processes has really bad accuracy (say 50% WER). Then everything after that works well. If I run the same file twice in a row, the first one will be bad, but the second time it is fine (0% WER). Here is an example from the align file:
*** close *** help *** eva *** go to sleep *** home page *** I need help *** next step page back page forward previous step wakeup (ALL-FINLEYH\AUDIO\ALL\TRACK-1)
GO close BACK help GO eva TO go to sleep GO home page BACK GO need help GO next step page back page forward previous step wakeup (ALL-TRACK-1)
Words: 20 Correct: 19 Errors: 8 Percent correct = 95.00% Error = 40.00% Accuracy = 60.00%
Insertions: 7 Deletions: 0 Substitutions: 1
close help eva go to sleep home page i need help next step page back page forward previous step wakeup (ALL-FINLEYH\AUDIO\ALL\TRACK-1)
close help eva go to sleep home page i need help next step page back page forward previous step wakeup (ALL-TRACK-1)
Words: 20 Correct: 20 Errors: 0 Percent correct = 100.00% Error = 0.00% Accuracy = 100.00%
Thoughts? Pointers? I've tried different values for the data size (256 in the code above), with varying results, but the first file is always pretty awful.
Also, I'm curious why full batch mode (i.e., processing the whole file at once) seems to have significantly better accuracy than streaming batch mode. Do the algorithms differ?
Hi...
If you are using wsj1, you need to set -cmninit to something around 50; the default value is bad for models trained with -transform dct.
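For example, assuming you build the decoder with cmd_ln_init/ps_args as in the code above (a rough sketch; model paths are placeholders):

cmd_ln_t *config = cmd_ln_init(NULL, ps_args(), TRUE,
    "-hmm",     "/path/to/wsj1/model",
    "-lm",      "/path/to/language.lm",
    "-dict",    "/path/to/dictionary.dic",
    "-cmninit", "50",   /* raise the initial cepstral mean; the default is bad for -transform dct models */
    NULL);
ps_decoder_t *ps = ps_init(config);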