Hi,
I'm a student hoping to do a speech-related project next year using Java, and so after searching the net I found Sphinx4, which seems great.
The problem I'm having with it though is pretty serious; I've run the demos, but on all the basic ones (HelloDigits, HelloWorld and Dialog for example) the program will only work once.
The program starts, and then asks for input (like a number on HelloDigits). I speak some input and it is recognised fine. However, I am then asked to speak again, at which point the program stops responding. The program is still running, and no exceptions are thrown; it simply seems to enter an infinite loop.
This problem occurs using both the pre-compiled and the self-compiled source versions.
After some digging through the code files, I found that the reason for the loop was in the SpeechMarker class. There is a loop in there which makes the program discard any input until a 'Start' signal of some sort has been received, and then keeps collecting the data until an 'End' signal is received, at which point it goes back to discarding.
The first time I speak this works fine, but after some investigation it seems that on the second time round the start signal is never sent; the speech data is just sent straight away. This means that, although the program is receiving my input (my speech), it is simply discarding it.
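To make the behaviour described above concrete, here is a minimal sketch of a discard-until-start loop. It is a toy model and not the Sphinx4 SpeechMarker source: the class name `MarkerLoopSketch`, the `collect` method and the "START"/"END" string tokens are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class MarkerLoopSketch {

    /**
     * Hypothetical model of the loop: data is discarded until a "START"
     * token arrives, then collected until an "END" token arrives.
     * Any token other than "START" or "END" is treated as audio data.
     */
    static List<List<String>> collect(String... events) {
        List<List<String>> utterances = new ArrayList<>();
        List<String> current = null; // null means "currently discarding"
        for (String event : events) {
            if (event.equals("START")) {
                current = new ArrayList<>();
            } else if (event.equals("END")) {
                if (current != null) {
                    utterances.add(current);
                    current = null; // back to discarding
                }
            } else if (current != null) {
                current.add(event); // data inside an utterance is kept
            }
            // data that arrives while current == null is silently dropped
        }
        return utterances;
    }

    public static void main(String[] args) {
        // First utterance is framed by START/END; the second has no START.
        System.out.println(collect("START", "a1", "a2", "END", "b1", "b2", "END"));
        // prints [[a1, a2]]
    }
}
```

In this model the second utterance ("b1", "b2") never reaches the recogniser because no start token preceded it, which matches the symptom: the audio arrives but is thrown away, and the caller keeps waiting.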
I tried this on about 4 different computers, with different OSs (Windows 98, 2000 and XP), and on Java SE 5 and 6, all with the same result.
I recently downloaded the nightly build to try, hoping it was a problem that had been fixed. I discovered, however, that it is even worse; the program loops from the start, not even recognising my speech once. Without spending lots of time going through the code (time that I don't really have), all I can do is ask for help and hope someone knows what's going on.
The recogniser works with files (the transcriber demo works fine), so for now I can build up the grammar and the other things I need and test them using a recording. This will have to be an interim solution though, as the project I'm using this for is required to work in real time.
Thanks in advance for any help offered.
Mike Brown
Try tuning the threshold property in the speech classifier. If it marks everything as speech, Sphinx quits after the first result.
Also check your mic levels: record some audio in another program, play it back and make sure it sounds okay.
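As a rough illustration of why the threshold matters, here is a toy model. It assumes the classifier calls a frame "speech" when its level exceeds a background estimate by more than the threshold; the real classifier may work differently, and the class name, method name and frame levels below are all made up.

```java
public class ThresholdSketch {

    /** Counts frames whose level exceeds the background by more than the threshold. */
    static int countSpeechFrames(double[] levels, double background, double threshold) {
        int count = 0;
        for (double level : levels) {
            if (level - background > threshold) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up frame levels: three quiet frames, three spoken frames, one quiet frame.
        double[] levels = { 20.5, 21.0, 20.2, 26.0, 28.0, 27.0, 20.5 };
        double background = 20.0; // assumed background estimate

        for (double threshold : new double[] { 0.0, 3.0, 13.0 }) {
            int speech = countSpeechFrames(levels, background, threshold);
            System.out.println("threshold " + threshold + " -> "
                    + speech + " of " + levels.length + " frames marked as speech");
        }
    }
}
```

With these invented numbers a threshold of 0 marks all 7 frames as speech, 3 marks only the 3 spoken frames, and 13 marks none, so both extremes leave the recogniser without a usable start/end pair.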
Stefanie
Hi Mike,
> the program loops from the start,
> not even recognizing my speech once
Well observed :-) ... and already fixed now. (Just check out once more, please.)
@s4devs: I think that the live-regression-tests should work better now. <hehe>
Sorry for the inconvenience, cheers,
Holger
et voila:
tidigits_wordlist_live_quick
Status OK
Accuracy (WER %) IMPROVED - decreased from 15.3419 to 15.2898
Time (X Realtime) OK - 0.139625
Average Heap (MB) IMPROVED - decreased from 202.76 to 23.72
Gap Insertion (%) IMPROVED - decreased from 29.6946 to 5.67511
Utterance Ratio IMPROVED - decreased from 0.000574713 to 1.00057
Avg Response Time (s) OK - 0.0029126436
an4_words_bigram_live
Status OK
Accuracy (WER %) REGRESSED - increased from 56.8797 to 58.4507
Time (X Realtime) IMPROVED - decreased from 1.2 to 1.06381
Average Heap (MB) IMPROVED - decreased from 166.71 to 140.26
Gap Insertion (%) IMPROVED - decreased from 24.7021 to 0.812568
Utterance Ratio IMPROVED - decreased from 0.00214133 to 1.01285
Avg Response Time (s) OK - 0.0030995763
Cheers,
Holger
Hi
Thanks for the help, but it's still not working... :-(
Firstly, Stefanie, I tried recording some audio and it looks and sounds fine, so I think I've got the levels set OK. I haven't had much time recently, so I'll try changing the threshold either tonight or tomorrow.
Secondly, Holger, thanks for pointing out that the problem was fixed. However, when I run the demos now they still enter an infinite loop, but seem to do it at an earlier stage (a simple println statement revealed that it didn't even make it to the SpeechMarker class where it failed before).
In the next few days I'll try the tuning and also attempt to uncover the point where it is now failing.
Also, thanks for the quick responses :-)
Mike
Hi Mike,
Does the speech signal make it into s4? You can test this by plugging the vu-meter-dataprocessor in before the speech-marker.
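For anyone who wants to check signal levels outside of s4, the following is a minimal sketch of the kind of measurement a level meter makes. It is not the s4 vu-meter code; the class name `LevelMeterSketch` and the method `rmsDb` are made up, and it assumes 16-bit signed samples in a non-empty array.

```java
public class LevelMeterSketch {

    /**
     * RMS level of 16-bit samples in dB relative to full scale.
     * Assumes a non-empty array; pure silence is clamped to -200 dB.
     */
    static double rmsDb(short[] samples) {
        double sumSquares = 0.0;
        for (short s : samples) {
            double x = s / 32768.0; // scale to roughly [-1, 1)
            sumSquares += x * x;
        }
        double rms = Math.sqrt(sumSquares / samples.length);
        return 20.0 * Math.log10(Math.max(rms, 1e-10)); // avoid log10(0)
    }

    public static void main(String[] args) {
        short[] silence = new short[160];
        short[] halfScale = new short[160];
        java.util.Arrays.fill(halfScale, (short) 16384);

        System.out.println("silence:    " + rmsDb(silence) + " dB");
        System.out.println("half scale: " + rmsDb(halfScale) + " dB");
    }
}
```

If every frame coming from the microphone reads near the silence floor, the problem is upstream of the recogniser; if the levels move when you speak, the signal is getting in and the endpointing settings are the next thing to look at.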
Which of the demos did you try? I've tried all demos and they work as expected.
-Holger
Hi Holger,
Sorry for the delay, I've been away for a bit and then got stuck with a load of work... anyway, the good news is that I've got it working now!
I did what you said and plugged the vu-meter-monitor in, and it showed that the signal was getting in (it is a useful tool...), so then I started tweaking the threshold property on the speech classifier.
It seems that the default it was set to (13 on the HelloDigits demo) was way too high, as I had to set it down to 1 or 2 to get it working. Setting it to 1 is a bit too low, as it's classifying a bit of background noise as speech as well now, but I can tune it better later. I'm just happy it's working now :)
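One possible way to do that later tuning, sketched under heavy assumptions: measure a few noise-only frames, then pick a threshold slightly above the largest amount by which they exceed the background level. The class, method, headroom value and sample levels below are all hypothetical and not part of Sphinx4.

```java
public class ThresholdCalibrationSketch {

    /**
     * Suggests a threshold just above the loudest noise-only frame.
     * All levels are in the same (arbitrary) units as the threshold.
     */
    static double suggestThreshold(double[] noiseLevels, double background, double headroom) {
        double maxExcess = 0.0;
        for (double level : noiseLevels) {
            maxExcess = Math.max(maxExcess, level - background);
        }
        return maxExcess + headroom;
    }

    public static void main(String[] args) {
        // Made-up levels recorded while nobody is speaking.
        double[] noiseLevels = { 20.5, 21.0, 20.2 };
        double suggestion = suggestThreshold(noiseLevels, 20.0, 0.5);
        System.out.println("suggested threshold: " + suggestion);
    }
}
```

The headroom is a judgment call: too little and occasional noise still gets through, too much and quiet speech is missed again.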
So in the end it was quite a simple problem, but I'd have been lost without your and Stefanie's help, so a big thank you!
Mike