CMU Sphinx / Forums / Help: Continuous: nonstop talking OK, silence not

Nickolay V. Shmyrev - 2010-04-06

So is there any issue now, or is everything OK?


Halle - 2010-04-06

No, it still never gets to "Listening"; sorry I didn't mention that. This is
what is going on up to and after the end of calibration: it does about 250
calibration loops, each of which shows the following values in ad_read:

In ad_read, length before read (max * 2) is 512
In ad_read, length after read (actual bytes read) 512
In ad_read, length on return (half bytes read) is 256

Which looks reasonable. Then, it prints "READY..." and if I am talking at the
time that "READY..." prints, I get "Listening..." (which never returns a hyp)
and the following values in and out of ad_read:

In ad_read, length before read (max * 2) is 8192
In ad_read, length after read (actual bytes read) 8192
In ad_read, length on return (half bytes read) is 4096

If I am not talking when "READY..." prints, I never get "Listening..." and
these are the kinds of values that are going in and out of ad_read (k is just
me monitoring what is being returned from cont_ad_read):

k is 0

2010-04-06 13:28:38.617 Continuous In cont_ad_read_internal, about to do the first ad_read
In ad_read, length before read (max * 2) is 54472
In ad_read, length after read (actual bytes read) 54472
In ad_read, length on return (half bytes read) is 27236

2010-04-06 13:28:38.618 Continuous In cont_ad_read_internal, about to do the second ad_read
In ad_read, length before read (max * 2) is 63488
In ad_read, length after read (actual bytes read) 63488
In ad_read, length on return (half bytes read) is 31744

k is 0

2010-04-06 13:28:38.720 Continuous In cont_ad_read_internal, about to do the first ad_read
In ad_read, length before read (max * 2) is 67584
In ad_read, length after read (actual bytes read) 67584
In ad_read, length on return (half bytes read) is 33792

2010-04-06 13:28:38.721 Continuous In cont_ad_read_internal, about to do the second ad_read
In ad_read, length before read (max * 2) is 50688
In ad_read, length after read (actual bytes read) 50688
In ad_read, length on return (half bytes read) is 25344

k is 0

2010-04-06 13:28:38.822 Continuous In cont_ad_read_internal, about to do the first ad_read
In ad_read, length before read (max * 2) is 80384
In ad_read, length after read (actual bytes read) 80384
In ad_read, length on return (half bytes read) is 40192

2010-04-06 13:28:38.823 Continuous In cont_ad_read_internal, about to do the second ad_read
In ad_read, length before read (max * 2) is 37888
In ad_read, length after read (actual bytes read) 37888
In ad_read, length on return (half bytes read) is 18944
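
For reference, the three numbers in each log group above are related by the size of a 16-bit sample: ad_read is asked for max samples, the driver requests max * 2 bytes, and the value returned is the bytes read divided by 2. A minimal sketch of that arithmetic, assuming 16-bit mono samples; the helper names bytes_for_samples and samples_in_bytes are hypothetical and not part of sphinxbase:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes to request for `max_samples` 16-bit samples. */
size_t bytes_for_samples(size_t max_samples)
{
    return max_samples * sizeof(int16_t);
}

/* Hypothetical helper: whole 16-bit samples contained in `bytes_read` bytes.
 * A trailing odd byte, if any, is dropped by the integer division. */
size_t samples_in_bytes(size_t bytes_read)
{
    return bytes_read / sizeof(int16_t);
}
```

With these, a request for 256 samples corresponds to the 512-byte reads seen during calibration, and an 8192-byte read corresponds to 4096 samples.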


Halle - 2010-04-06

An interesting thing is that if I comment out cont_ad_calib() from my
utterance loop, the results are exactly the same.


Halle - 2010-04-06

OK, I have some improvement, and a little clue about where the issue might
lie. For simplicity's sake I've switched over to a function that reads packets
instead of bytes, so I have a 1:1 relationship between what is going into
ad_read and what is supposed to go out (I think; correct me if I'm wrong
about that). As expected, this didn't change any results. I also noticed that
my ad_read had gotten too simple: it wasn't returning correct values when a
read hit EOF, so I fixed that. This got things working again to the extent
that I can always get "Stopped listening, please wait..." followed by a result
after the very first utterance, as long as I am talking while continuous
starts up. This is what is in my ad_read now:

UInt32 length = max; // in: packets requested; out: packets actually read
UInt32 numBytes;
OSStatus status = AudioFileReadPackets(r->recorder->GetAudioFileID(),
                                       false,
                                       &numBytes,
                                       NULL,
                                       0, // starting packet
                                       &length,
                                       buf);

if (status == -39 && r->recording == 0) {
    // status -39 is EOF, in this case while not recording, which shouldn't be happening
    return -1;

} else if (status != 0) {
    // status 0 is success; other possibilities are an EOF, a parameter error or something else
    if (status == -39 && r->recording == 1) { // was "status = -39", an assignment
        return length; // EOF while recording: return what was read (length is unsigned, so never negative)
    } else if (status == -50) { // bad parameter, this isn't really happening
        return -1;
    } else { // an unknown error, this hasn't happened to date
        printf("unknown error is %d\n", (int)status);
        return -1;
    }

} else {
    return length; // success (length is unsigned, so never negative)
}
return 0; // not reached

The next thing I tried was changing the number of buffers my audio file uses.
I have been using between one and three buffers of a half second in duration
through most of this testing. I changed it to 16 just for the purpose of
testing. So now what happens is that if I'm speaking when continuous starts, I
can speak with silences of as much as a couple of seconds, and recognition is
pretty good. If I have silences of longer than that, it will get into the loop
where it can no longer recognize any speech. So, maybe silence in the middle
of the buffer file is OK, but silence at the beginning or end is causing
breakage.


Halle - 2010-04-06

OK, I do think this is about the construction of my driver; I'm going to work
on it some more and see where I get. Thanks Nickolay!


Halle - 2010-04-06

OK, all good now - it was the starting packet offset; it needs to keep moving
forward the amount that has been read until there's an utterance.
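
A minimal sketch of that idea, under my reading of the fix: the reader remembers how many packets it has already handed out and starts each read from there, instead of always starting at packet 0. The packet_reader struct and reader_read function are hypothetical stand-ins that read from an in-memory array so the example is self-contained; this is not Core Audio or sphinxbase code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a growing recording file in which every packet
 * is one 16-bit sample. Not a Core Audio type. */
typedef struct {
    const int16_t *packets; /* packet data written so far */
    uint32_t total;         /* number of packets written so far */
    int64_t next_packet;    /* first packet not yet handed to the caller */
} packet_reader;

/* Copy up to `max` packets, starting at the reader's current offset, into
 * `buf`, then advance the offset by the number actually copied.
 * Returns that number (0 when no new packets are available yet). */
int32_t reader_read(packet_reader *r, int16_t *buf, int32_t max)
{
    assert(r != NULL && buf != NULL && max >= 0);

    int64_t available = (int64_t)r->total - r->next_packet;
    if (available <= 0)
        return 0;

    int32_t n = available < max ? (int32_t)available : max;
    memcpy(buf, r->packets + r->next_packet, (size_t)n * sizeof(int16_t));
    r->next_packet += n; /* the step that a fixed starting packet of 0 omits */
    return n;
}
```

In a real driver the stored offset would presumably be what gets passed as the starting packet argument of the file read, in place of the constant 0.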


Halle - 2010-04-06

BTW, I really appreciate the help getting my two other mistakes fixed, Nickolay
-- I don't think I would have figured out what was wrong with the starting
packet if the other causes of weirdness hadn't been fixed first.


Nickolay V. Shmyrev - 2010-04-06

Nice, it's working now. Let's hope such code will land in sphinxbase trunk one
day.


Halle - 2010-04-07

Sure thing, once I've gotten my projects out and had some time to standardize
it a bit.


Tom Raic - 2010-04-27

Hey, do you think you guys can send me the A/D implementation for CoreAudio?
Do I only have to rewrite that one ad_read function? I'm trying to port this
into an iPhone library. It compiles, but obviously I don't have access to the
audio input devices.

Any help would be appreciated. Thanks.

Tom Raic
tom@whistlebox.com
