You asked me: "I hope you can help us to make us a demo like android one."
Of course, that's the least I could do after you helped me get things working.
But I have some questions, because I can't run the Android version of Sphinx myself.
What does the demo do? (functionality/scenario)
What Sphinx source do you want me to use in the demo? (I only used the Trunk, but I don't know if there are any kind of stable versions?)
Hope to hear from you so I can start writing, or continue researching the demo functionality.
Toine de Boer
On start it displays "say oh mighty computer to activate" and listens for keyword "oh mighty computer". Once keyword occurs, it switches to grammar mode and recognizes digits from 0 to 10 and displays them on the screen. Then switches back to keyword search mode.
What Sphinx source do you want me to use in the demo? (I only used the Trunk, but I don't know if there are any kind of stable versions?)
Trunk of course.
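The scenario described in this reply (wait for the keyphrase, recognize digits, then go back to waiting) can be pictured as a small two-state machine. The sketch below is only an illustration of that flow and does not call PocketSphinx at all: the names `Mode`, `DemoFlow`, and `onHypothesis`, and the "say a digit" prompt, are assumptions made up for this example.

```cpp
#include <string>

// Hypothetical names throughout; none of this is part of the PocketSphinx API.
enum class Mode { Keyword, Digits };

class DemoFlow {
public:
    // Feed each non-empty hypothesis string from the recognizer.
    // Returns the text the UI should display afterwards.
    std::string onHypothesis(const std::string& hyp) {
        if (mode_ == Mode::Keyword) {
            if (hyp == keyphrase_) {
                mode_ = Mode::Digits;  // keyphrase heard: switch to grammar (digits) mode
                return "say a digit";  // assumed prompt, not from the Android demo
            }
            return prompt_;            // anything else: keep waiting for the keyphrase
        }
        mode_ = Mode::Keyword;         // digits recognized: back to keyword search mode
        return hyp;                    // show the recognized digits
    }

    Mode mode() const { return mode_; }

private:
    Mode mode_ = Mode::Keyword;
    std::string keyphrase_ = "oh mighty computer";
    std::string prompt_ = "say oh mighty computer to activate";
};
```

In a real component, the place where `mode_` changes is presumably also where the decoder's active search would be switched.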
Tnx, but this is still going to take a while because I need to discover each method separately.
I'm not used to this kind of API documentation: http://cmusphinx.sourceforge.net/doc/pocketsphinx/
For example, what does a return value of '1' from ps_set_kws mean?
And are there any limits on the bytes you send to ps_process_raw? Do the chunks always need to be the same size, etc.?
I was looking at the Android example, but the source where this happens is not accessible.
Can you help some more with this?
PS: I already used the following code that SOMETIMES produced an error log, but not always.
char cpath[1024];
const wchar_t* wLogPath = Windows::Storage::ApplicationData::Current->LocalFolder->Path->Data();
wcstombs(cpath, wLogPath, 1024);
char* logPath = concat(cpath, "\\err.log");
err_set_logfile(logPath);
Hope to hear from you
And thanks again for your support
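One thing worth checking in a snippet like the one in this post is how the narrow log path is built. The sketch below is a minimal, standalone way to convert a wide-character folder path and append a file name. `makeLogPath` is a hypothetical helper invented for this example (it is not the `concat` used in the post), and it assumes the folder path converts cleanly in the current C locale.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cwchar>
#include <string>
#include <vector>

// Hypothetical helper: convert a wide folder path to a narrow string and
// append a file name. Returns an empty string if the conversion fails.
std::string makeLogPath(const wchar_t* wideFolder, const std::string& fileName) {
    // Worst case: every wide character needs MB_CUR_MAX bytes, plus a terminator.
    std::size_t capacity = std::wcslen(wideFolder) * MB_CUR_MAX + 1;
    std::vector<char> buffer(capacity);
    std::size_t written = std::wcstombs(buffer.data(), wideFolder, capacity);
    if (written == static_cast<std::size_t>(-1)) {
        return std::string();  // a character could not be converted
    }
    std::string path(buffer.data(), written);
    // The backslash must be escaped: in "\err.log" the compiler reads \e as an
    // escape sequence, not as a backslash followed by the letter e.
    path += "\\";
    path += fileName;
    return path;
}
```

The resulting `path.c_str()` could then be handed to err_set_logfile, as in the post.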
I wait 10 seconds before closing the app or debug session, but the error log still isn't as complete as I expect.
I'm expecting to find something from "ps_set_kws(ps, Cname, Ckeyphrase);"
(I found the error itself, but for development purposes I really need a log.)
Do you have any ideas?
PS: I put fflush(stderr) at the end of err_msg_system() and also in a separate method that I call directly from my code.
I'm trying to process the bytes that are recorded from the microphone, and was hoping you could help me set up some simple real-time detection.
int SpeechRecognizer::RegisterAudioBytes(const Platform::Array<uint8>^ audioBytes)
{
    // source: http://blog.csdn.net/zouxy09/article/details/7978108
    int16 audioBuffer[4096];
    int32 k, ts, rem;
    char const *hyp;
    char const *uttid;
    char word[256];

    // Number of Int16 samples (two bytes per sample)
    k = audioBytes->Length / 2;

    // Convert the little-endian byte array (Array<uint8>) to Int16[]
    for (size_t i = 0; i < audioBytes->Length; i += 2)
    {
        audioBuffer[i / 2] = audioBytes[i] + ((int16)audioBytes[i + 1] << 8);
    }

    // Process the samples
    int result = ps_process_raw(ps, audioBuffer, k, TRUE, FALSE);
    return result;
}
I began with the above to process a byte array coming from a C# WP project. I'm not 100% sure that the conversion to Int16[] is OK, or whether I'm using ps_process_raw the right way.
I have looked at the TRUE and FALSE arguments of ps_process_raw, but I'm not sure what they do or how to use them.
Can you help me? (PS: the output of ps_process_raw is always 0, and the input to the method is randomly filled arrays of 1280 and 960 bytes)
I began with the above to process a byte array coming from a C# WP project. I'm not 100% sure that the conversion to Int16[] is OK, or whether I'm using ps_process_raw the right way.
Looks mostly ok
int result = ps_process_raw(ps, audioBuffer, k, TRUE, FALSE);
It should be FALSE, FALSE
Can you help me? (PS: the output of ps_process_raw is always 0, and the input to the method is randomly filled arrays of 1280 and 960 bytes)
This is because of TRUE (no_search argument in ps_process_raw). It must be FALSE, then ps will return the number of frames processed.
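For the byte-to-Int16 conversion discussed in this exchange, here is a minimal standalone sketch that can be tested without the decoder. It assumes the microphone delivers little-endian 16-bit PCM, which is what the code in the question also assumes. `bytesToSamples` is a name invented for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert little-endian 16-bit PCM bytes into signed samples.
// A trailing odd byte, if any, is ignored.
std::vector<int16_t> bytesToSamples(const uint8_t* bytes, std::size_t byteCount) {
    std::vector<int16_t> samples;
    samples.reserve(byteCount / 2);
    for (std::size_t i = 0; i + 1 < byteCount; i += 2) {
        // Low byte first, then the high byte shifted into the upper 8 bits.
        uint16_t raw = static_cast<uint16_t>(bytes[i] | (bytes[i + 1] << 8));
        samples.push_back(static_cast<int16_t>(raw));
    }
    return samples;
}
```

The sample pointer and sample count (`samples.data()` and `samples.size()`) would then go to ps_process_raw, with both flags FALSE as advised above. Using a vector sized from the input also avoids overrunning a fixed-size buffer if a chunk is ever larger than expected.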
But now the big question: how do I detect words/phrases?
1: Do I need to call a method after each ps_process_raw, or once every X bytes, or ...?
2: What will the result be? Just a word or sentence, a collection of possible outcomes, or a collection of all outcomes with percentages?
PS: I suppose there isn't an easy event to hook into :-)
An event is really what I want in the end, and I think I need to make it myself.
(I already added an event to raise in my code; now I only need the mechanism.)
1: Do I need to call a method after each ps_process_raw, or once every X bytes, or ...?
I wrote you pseudocode above. After each ps_process_raw you call ps_get_hyp; if the result matches the keyword you can proceed with further steps, otherwise you process the next chunk of raw data.
2: What will the result be? Just a word or sentence, a collection of possible outcomes, or a collection of all outcomes with percentages?
In keyword spotting mode the result is a string containing the keyword.
An event is really what I want in the end, and I think I need to make it myself.
Yes, it's up to you to design which events the component will emit.
But more and more questions come up about how the system/modes work, and where I can find more info about the different modes.
For example:
When searching for digits, each recognized digit gets added and added and added to the result of ps_get_hyp(). Is there a way to reset that?
and/or
(Also for ps_get_hyp) I see a score as a result too, or an ID? What can I do with that? Is there maybe a hidden list of possible outcomes?
and/or
Maybe most important for the tutorial: I started with recognizing digits because I don't get any result when I say "Oh Mighty Computer". It could be my Dutch dialect, or that I don't set the search the right way, but ps_get_hyp returns nothing.
When searching for digits, each recognized digit gets added and added and added to the result of ps_get_hyp(). Is there a way to reset that?
You can stop the search (ps_end_utt) and start it again (ps_start_utt) when silence occurs (ps_is_speech becomes false). I wrote you the pseudocode above.
(Also for ps_get_hyp) I see a score as a result too, or an ID? What can I do with that? Is there maybe a hidden list of possible outcomes?
Score and outid are artifacts; they are not really useful. You can ignore them.
Maybe most important for the tutorial: I started with recognizing digits because I don't get any result when I say "Oh Mighty Computer". It could be my Dutch dialect, or that I don't set the search the right way, but ps_get_hyp returns nothing.
You need to set keyword spotting threshold in config on initialization "-kws_threshold 1e-40".
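The advice above about ending and restarting the utterance when speech turns into silence can be sketched as a small driver loop. The sketch is deliberately written against a hypothetical wrapper type rather than the PocketSphinx functions themselves: `UtteranceSplitter` and the member names `startUtterance`, `endUtterance`, `process`, `inSpeech`, and `hypothesis` are assumptions for illustration. A real wrapper would forward them to the calls named in the reply.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Drives a decoder-like object over a stream of audio chunks and yields one
// hypothesis per utterance. Decoder is a hypothetical wrapper that must provide:
//   void startUtterance();  void endUtterance();
//   void process(const int16_t* samples, std::size_t count);
//   bool inSpeech() const;  std::string hypothesis() const;
template <typename Decoder>
class UtteranceSplitter {
public:
    explicit UtteranceSplitter(Decoder& decoder) : decoder_(decoder) {
        decoder_.startUtterance();
    }

    // Feed one chunk. Returns true when an utterance has just finished; its
    // text (possibly empty) is then available from lastResult().
    bool feed(const int16_t* samples, std::size_t count) {
        decoder_.process(samples, count);
        bool speaking = decoder_.inSpeech();
        bool finished = wasSpeaking_ && !speaking;  // speech -> silence transition
        if (finished) {
            decoder_.endUtterance();
            lastResult_ = decoder_.hypothesis();
            decoder_.startUtterance();              // next hypothesis starts empty
        }
        wasSpeaking_ = speaking;
        return finished;
    }

    const std::string& lastResult() const { return lastResult_; }

private:
    Decoder& decoder_;
    bool wasSpeaking_ = false;
    std::string lastResult_;
};
```

Because the utterance is restarted after every pause, each result holds only the digits spoken since the previous pause, which is the reset behaviour asked about. The point where `feed` returns true is also a natural place to raise a custom event.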
I can detect "oh mighty computer" and the digits :-)
To complete your pseudocode, what do you mean by ps_set_search("grammar")?
What does that correspond to in the Android example, where I took the models from?
Hi Nickolay,
You asked me: "I hope you can help us to make us a demo like android one."
Of course, that's the least I could do after you helped me get things working.
But I have some questions, because I can't run the Android version of Sphinx myself.
Hope to hear from you so I can start writing, or continue researching the demo functionality.
Toine de Boer
On start it displays "say oh mighty computer to activate" and listens for keyword "oh mighty computer". Once keyword occurs, it switches to grammar mode and recognizes digits from 0 to 10 and displays them on the screen. Then switches back to keyword search mode.
Trunk of course.
OK, tnx for the scenario.
I will start with getting just the basics to work, the "oh mighty computer" part.
Then switching between grammar and search mode, but I don't have a clue what that is...
So I will be back, if you have any tips please don't hesitate....
Already my first question:
How do I feed in a continuous stream of incoming microphone data?
I'm looking at the following, but I think that isn't the way to do it: http://cmusphinx.sourceforge.net/wiki/tutorialpocketsphinx#decoding_audio_data_from_memory
Can you give me some direction?
I'm recording sounds:
-16 bits
-1 Channel
-Samplerate 16000
PS: in case you want to look at the code I'm testing with ...
https://onedrive.live.com/redir?resid=53DF68CA92747BA6%2132783 (27-7: added Native recorder)
Last edit: Toine db 2014-07-27
Can you (or someone else) give me a hint about which methods I need to use in PocketSphinx to get your functionality to work?
I have a constant stream of byte arrays from a 16-bit, 1-channel, 16 kHz source.
Tnx, but this is still going to take a while because I need to discover each method separately.
I'm not used to this kind of API documentation: http://cmusphinx.sourceforge.net/doc/pocketsphinx/
For example, what does a return value of '1' from ps_set_kws mean?
And are there any limits on the bytes you send to ps_process_raw? Do the chunks always need to be the same size, etc.?
I was looking at the Android example, but the source where this happens is not accessible.
Can you help some more with this?
Hello Nickolay,
Can you help me with the following questions I have?
(Or do you have source code as an example? The Android code is closed where it gets interesting.)
Hope to hear from you,
and example source code would be fantastic if possible
An error occurred. You can see details in the log. To store the log in the filesystem you can add -logfn to the decoder configuration.
No
No
Tnx for the feedback.
But when I add -logfn (like below) I get the error "cannot redirect log output".
Am I missing some parameters?
PS: I already used the following code that SOMETIMES produced an error log, but not always.
char cpath[1024];
const wchar_t* wLogPath = Windows::Storage::ApplicationData::Current->LocalFolder->Path->Data();
wcstombs(cpath, wLogPath, 1024);
char* logPath = concat(cpath, "\\err.log");
err_set_logfile(logPath);
Hope to hear from you
And thanks again for your support
There should be a filename here.
This is an alternative way. To make sure the log remains after the application exits, add an fflush(stderr) call to the function in err.c which prints the message.
Tnx, I now always manage to get a log, but there are still problems.
The log just stops at 7/8 kb.... https://onedrive.live.com/redir?resid=53DF68CA92747BA6%2139023
And the fflush(stderr) doesn't seem to do anything.
I wait 10 seconds before closing the app or debug session, but the error log still isn't as complete as I expect.
I'm expecting to find something from "ps_set_kws(ps, Cname, Ckeyphrase);"
(I found the error itself, but for development purposes I really need a log.)
Do you have any ideas?
PS: I put fflush(stderr) at the end of err_msg_system() and also in a separate method that I call directly from my code.
Not just err_msg_system but also err_msg. Or add fflush(fp) in err_logfp_cb.
Tnx, that works now.
Some following questions about loading stuff:
To load Grammar... is:
the same as
?
And is
the way to load Language models like weather.dmp from the Android demo?
PS: I'm trying to make the Android demo on Windows Phone
yes
yes
Tnx Nickolay,
The solution for the Windows Phone tutorial is making progress.
Loading models, phrases etc. and setting the search type is working.
Next (and last) is handling/processing incoming voice data.
I'll keep you informed
Cool, I'd be glad to try it. Let me know if you need some help.
Hi Nickolay,
I'm trying to process the bytes that are recorded from the microphone, and was hoping you could help me set up some simple real-time detection.
I began with the above to process a byte array coming from a C# WP project. I'm not 100% sure that the conversion to Int16[] is OK, or whether I'm using ps_process_raw the right way.
I have looked at the TRUE and FALSE arguments of ps_process_raw, but I'm not sure what they do or how to use them.
Can you help me? (PS: the output of ps_process_raw is always 0, and the input to the method is randomly filled arrays of 1280 and 960 bytes)
Looks mostly ok
It should be FALSE, FALSE
This is because of TRUE (no_search argument in ps_process_raw). It must be FALSE, then ps will return the number of frames processed.
OK, good to know.
But now the big question: how do I detect words/phrases?
1: Do I need to call a method after each ps_process_raw, or once every X bytes, or ...?
2: What will the result be? Just a word or sentence, a collection of possible outcomes, or a collection of all outcomes with percentages?
PS: I suppose there isn't an easy event to hook into :-)
An event is really what I want in the end, and I think I need to make it myself.
(I already added an event to raise in my code; now I only need the mechanism.)
I wrote you pseudocode above. After each ps_process_raw you call ps_get_hyp; if the result matches the keyword you can proceed with further steps, otherwise you process the next chunk of raw data.
In keyword spotting mode the result is a string containing the keyword.
Yes, it's up to you to design which events the component will emit.
Tnx for the response, on Friday I will continue on this.
But before that, is there a location with some overview of the different Modes?
Names, results etc....
Last edit: Toine db 2014-08-29
Hi Nickolay,
First recognition finally works! "The digits"
But more and more questions come up about how the system/modes work, and where I can find more info about the different modes.
For example:
and/or
and/or
For the last 'oh mighty computer' problem you can see my project at:
https://onedrive.live.com/redir?resid=53DF68CA92747BA6%2142806
Hope you can help me again.
PS: It was really great to see the digit thing already work!
Last edit: Toine db 2014-08-29
Great, congratulations
You can stop the search (ps_end_utt) and start it again (ps_start_utt) when silence occurs (ps_is_speech becomes false). I wrote you the pseudocode above.
Score and outid are artifacts; they are not really useful. You can ignore them.
You need to set keyword spotting threshold in config on initialization "-kws_threshold 1e-40".
Tnx, all your tips worked great.
I can detect "oh mighty computer" and the digits :-)
To complete your pseudocode, what do you mean by ps_set_search("grammar")?
What does that correspond to in the Android example, where I took the models from?