OK, I have finished porting pocketsphinx to our ARM-based embedded platform.
My dictionary contains only one word (the hot word). I tried a few short WAV files and got correct results. I then tried a long music file, and decoding occupies almost a whole 1.5 GHz ARM core. Is this expected? Is there a way to get better performance? Since this is constant detection, I can't afford this much CPU usage.
You need to read our tutorial first to understand that you need to modify the language model and keep the dictionary as is.
You also need to say exactly which language model you are using and provide the pocketsphinx output in order to get help with decoding speed.
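For reference, decoding speed is usually reported as a real-time factor (xRT): CPU time spent divided by the duration of the audio decoded. A minimal sketch of that arithmetic follows; the function name and the numbers are made up for illustration and are not measurements from this thread.

```python
def real_time_factor(cpu_seconds: float, audio_seconds: float) -> float:
    """Return CPU seconds spent per second of audio (xRT).

    A value below 1.0 means decoding runs faster than real time;
    a value near 1.0 means one core is almost fully occupied.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return cpu_seconds / audio_seconds


# Hypothetical example: 54 s of CPU time to decode a 60 s recording.
rtf = real_time_factor(cpu_seconds=54.0, audio_seconds=60.0)
print(f"{rtf:.2f} xRT")
```

A figure like this, taken from the decoder output, makes it easier to compare runs before and after a configuration change.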
No, I use the kws option, so no LM is needed. I only need to detect one keyword in my case.
I am thinking that I might need to 'adapt' the acoustic model for just this specific keyword. The acoustic model would then be smaller, which I believe should reduce the workload, since only a few phones need to be detected.
From this link, it looks like we are trying to do the same thing: https://sourceforge.net/p/cmusphinx/discussion/speech-recognition/thread/1cb5594e/
However, in that post you only mentioned that "Yes, you can remove senones and triphones which you will never see with a custom tool". What is the custom tool? The acoustic model comes from the pocketsphinx download (en-us), and I also downloaded a full text version (https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English/). I want to get rid of most of the phones in it.
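To make concrete what "triphones you will never see" might mean for a single keyword, here is a rough sketch that lists the phone contexts one word produces when spoken in isolation. Everything in it is an assumption for illustration: the function name, the use of SIL as the boundary context, the b/i/e/s word-position labels, and the example pronunciation. It ignores cross-word contexts and whatever background model the keyword search itself needs, and it does not touch the model files.

```python
def keyword_triphones(phones, silence="SIL"):
    """List (base, left, right, position) contexts for one word spoken alone.

    Assumption: the word is surrounded by silence, so the outer contexts
    are `silence`. Position is 'b' (begin), 'i' (internal), 'e' (end),
    or 's' (single-phone word).
    """
    n = len(phones)
    contexts = []
    for i, base in enumerate(phones):
        left = phones[i - 1] if i > 0 else silence
        right = phones[i + 1] if i < n - 1 else silence
        if n == 1:
            position = "s"
        elif i == 0:
            position = "b"
        elif i == n - 1:
            position = "e"
        else:
            position = "i"
        contexts.append((base, left, right, position))
    return contexts


# Hypothetical pronunciation of a hot word, not taken from this thread.
for context in keyword_triphones(["HH", "AH", "L", "OW"]):
    print(context)
```

Any triphone in the model definition that is not in a list like this (plus the contexts needed around the keyword) would be a candidate for removal, along with senones that no remaining triphone refers to.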
You still need to provide the logcat output.
As for the tool, you will have to write it yourself.