I'm a big fan of voice assistants like Siri, Google Now and Cortana, but I was always a bit disappointed that you could not teach them new things. Out of curiosity, and because there was no program like this for the desktop PC, I started a project a while ago that I call I.L.A. - Intelligent Learning Assistant.
Since then it has grown a lot and I would like to share it with you:
It uses a recent version of sphinx-4 and the Google Speech API for voice recognition. The sphinx-4 part uses a grammar file that is extended automatically as you teach ILA new things, but it can also be configured to use a language model instead. I'm using the generic en-us model I found in the downloads section and the new 0.7b dictionary (with some extensions), but I'd be happy if you tried out other models and reported your experiences! The program fully supports German too; I just had some poor experiences with German acoustic models ... maybe you can find one that works well?
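To illustrate the grammar-file idea, here is a minimal sketch of how a JSGF grammar could be regenerated from the list of taught sentences. It is not ILA's actual code: the class name `GrammarWriter`, the grammar name `commands` and the rule name `<command>` are assumptions made for this example, and every word would still have to be present in the dictionary for sphinx-4 to accept the grammar.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class GrammarWriter {

    /** Builds a JSGF grammar whose single public rule accepts any one of the taught sentences. */
    static String buildJsgf(String grammarName, Set<String> sentences) {
        if (sentences.isEmpty()) {
            // An empty alternative list would not be a valid rule expansion.
            throw new IllegalArgumentException("at least one sentence is required");
        }
        StringBuilder sb = new StringBuilder();
        sb.append("#JSGF V1.0;\n\n");
        sb.append("grammar ").append(grammarName).append(";\n\n");
        sb.append("public <command> = ");
        sb.append(String.join(" | ", sentences));
        sb.append(";\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashSet keeps teaching order and silently drops repeated sentences.
        Set<String> taught = new LinkedHashSet<>();
        taught.add("open browser");
        taught.add("what time is it");
        taught.add("search my page for cats"); // newly taught sentence (hypothetical)
        System.out.print(buildJsgf("commands", taught));
    }
}
```

Rewriting the whole file after each teaching step keeps the sketch simple; appending only the new alternative would work too, but needs more care with the trailing semicolon.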
Last but not least, I'd like to thank all the people involved in programming Sphinx. This is a great open source tool and I hope to see many more exciting things in the future.
Hope you enjoy ILA :-)
- Florian
Last edit: Florian 2014-12-03
Actually I tried this voxforge model before and it didn't work so well. Thinking I maybe had a bad config, I tried again, and with a little tweaking of the dictionary I got it working well now! :-D
It's included in the new version 2.7; check out the webpage. I also fixed a bug that led to a broken dictionary folder for other acoustic models (including German), so if you tried your own model and got an error, please try again now :-)
Another addition is a more flexible custom command using 'open parameters'. In the teach-GUI you can now define a customsearch link with '****' at the end and a parameter given as '***'. A little example:
Key Sentence: search my page for
Command: (open a link in the default browser)
Parameter 1: http://my.homepage.com/search/**** (yes, 4 stars)
Parameter 2: ***
Parameter 3: searching my page
full command constructed by GUI:
search my page for;customs;customsearch;http://my.homepage.com/search/****;***;searching my page
usage (works best without grammar restriction or with a self-defined JSGF grammar):
"search my page for ..." videos of cats/pictures of dogs/meaning of life/...
Hope you like it!
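For anyone curious how such an 'open parameter' command could be resolved, here is a rough sketch under a few assumptions: the stored line has exactly six semicolon-separated fields as in the example, everything spoken after the key sentence is the parameter, and the parameter is form-encoded before it replaces '****'. The class `CustomSearchCommand` and its methods are made up for illustration and are not taken from ILA.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class CustomSearchCommand {
    final String keySentence;
    final String urlTemplate;
    final String answer;

    /** Parses a line like "key;customs;customsearch;url/****;***;answer". */
    CustomSearchCommand(String line) {
        String[] f = line.split(";");
        if (f.length != 6) {
            throw new IllegalArgumentException("expected 6 fields: " + line);
        }
        keySentence = f[0].trim();
        urlTemplate = f[3].trim();
        answer = f[5].trim();
    }

    /** Returns the URL to open, or null if the utterance does not match this command. */
    String resolve(String utterance) {
        String u = utterance.trim();
        // Case-insensitive prefix match against the key sentence.
        if (!u.regionMatches(true, 0, keySentence, 0, keySentence.length())) {
            return null;
        }
        String rest = u.substring(keySentence.length());
        // Require a word boundary so "search my page forever" does not match.
        if (rest.isEmpty() || !Character.isWhitespace(rest.charAt(0))) {
            return null;
        }
        try {
            // URLEncoder uses form encoding, so spaces become '+'.
            return urlTemplate.replace("****", URLEncoder.encode(rest.trim(), "UTF-8"));
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        CustomSearchCommand cmd = new CustomSearchCommand(
            "search my page for;customs;customsearch;"
            + "http://my.homepage.com/search/****;***;searching my page");
        System.out.println(cmd.answer);
        System.out.println(cmd.resolve("search my page for videos of cats"));
    }
}
```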
Florian
Last edit: Florian 2014-12-06
ILA has been updated! It's mainly about performance enhancements on weaker systems, localization and tweaking. Here are the patch notes:
global hotkey support: let ILA run in the background and call it by holding a mouse button for 1s (or whatever hotkey you want), especially nice when operating with a wireless mouse or keyboard ;-)
performance enhancements: ILA is automatically in power saver mode while minimized. In addition there is now a second sphinx-4 mode (settings: sphinx (live), sphinx (rec)) that works by first recording a WAV file and then transcribing it. That gives the GUI a slightly faster response time and seems to work more reliably on older systems.
new "installation" script that automatically creates a desktop shortcut for ILA and assigns the Java VM more memory, because I've seen a lot of systems that were behaving weirdly because of memory issues. I feel ILA with sphinx needs around 400MB.
convenience upgrades: fewer restarts necessary! No restart is needed anymore after teaching new commands (instant grammar update), and the language can be changed on the fly
localization updates: teach-keywords in English and German with a rearranged order, linkList updates for Germany and the USA, improved map search
better support for new languages, and switching made easier with the new "set language to ..." command (in case you want to make a bunch of custom commands in Spanish, French, Turkish, Dutch ... Google has them all and sphinx just needs a good acoustic model ^^)
many improvements to the command-interpreter (especially for timers), new words in the dictionaries, more reliable IP-location service, improved test-command
bugfixes, tweaks in the interface
enjoy! :-)
Florian
Last edit: Florian 2014-12-17
A lot of new features have been added. The main focus is on extending the possibilities for user-defined commands, giving ILA the ability to ask for parameters with self-defined questions, and on-the-fly grammar switching (if you are using sphinx-4). Here are the patch notes:
on-the-fly grammar change for sphinx-4 with the ability to load your own grammar file (tutorial on the homepage)
RSS feed reading (e.g. ILA will read the first 3 headlines and you can ask her to open the corresponding link); be sure to check out the new 'load personal news feed' command :-)
test-command button in the teach-interface: you can now test a custom command before saving it ^^
you don't know what to ask ILA? Then have a look at the suggestions popping up in the input text field :-)
INSTALL scripts for Windows, Linux and Mac (the Mac one could use some love ^^)
a lot of tweaking and bug-fixing
Have fun and enjoy! :-)
Florian
Last edit: Florian 2015-01-07
I'm happy to announce the release of ILA Beta v3.2! :-)
The main focus lies on improving grammar-free speech recognition accuracy with Sphinx-4, bringing ILA one step closer to becoming independent of Google's Speech API. Here are the recent patch notes:
updated to the most recent version of Sphinx-4 with support for the new PTM acoustic models
included the new CMU Sphinx en-us acoustic model (non-PTM) with greatly improved accuracy
added MLLR unsupervised speaker adaptation (type 'ussa' in ILA's input field) and auto-loading of MLLR_matrix files when added to the acoustic model folder
added a GUI to help you train your acoustic models (type 'amt' in ILA's input field) (tutorial soon)
the pre-rec recognizer (settings->ILA speech engine->Sphinx-4 offline (rec)) works much better now in grammar-free mode (settings->use grammar->red(off)).
Note: the LiveSpeechRecognizer (Sphinx-4 offline (live)) only works reliably with grammar turned on. I'm trying to find the problem!
to make grammar-free mode usable I've added two simple language models for 'en' and 'de'
new setting that allows mixing grammar and non-grammar modes (Settings->use grammar on ILA question). It means ILA will switch back to grammar mode when you specifically tell her to, e.g. in an 'open parameter' command
'open parameter' commands have been improved to filter user-specified words to prevent things like "play some music of of Jimi Hendrix" (tutorial available soon)
added a button for the audio sample rate to the settings (if you want to use 8kHz acoustic models) and fixed a bug that prevented switching to anything other than 16kHz
when adding new commands to the grammar and ILA's memory (teachit_xy.txt), ILA now checks whether the command already exists and replaces the old one instead of adding a 'dead' command to the end of the files
added a bunch of icons to the windows shown in the taskbar (Windows) and an updated manifest file (for the Windows start screen)
many more or less visible tweaks to the UI and, as usual, bug fixing (yes, it's still a beta ;-) ), e.g. fixing problems with unsupported translucency and buttons in the Mac version
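The "replace instead of append" behaviour for taught commands can be sketched with an insertion-ordered map keyed by the key sentence. This is only an illustration under assumptions: the class `CommandMemory` is made up, and it treats the first semicolon-separated field of a line as the key sentence, compared case-insensitively.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CommandMemory {
    // LinkedHashMap keeps the original position of a key when its value is replaced.
    private final Map<String, String> byKeySentence = new LinkedHashMap<>();

    /** Stores a command line; returns true if an older command with the same key sentence was replaced. */
    boolean teach(String commandLine) {
        String key = commandLine.split(";", 2)[0].trim().toLowerCase();
        return byKeySentence.put(key, commandLine) != null;
    }

    /** Lines in the order they would be written back to the memory file. */
    List<String> lines() {
        return new ArrayList<>(byKeySentence.values());
    }

    public static void main(String[] args) {
        CommandMemory memory = new CommandMemory();
        memory.teach("open my mail;customs;openlink;http://mail.example.org");
        memory.teach("what time is it;time");
        // Same key sentence again: the old entry is overwritten in place.
        boolean replaced = memory.teach("open my mail;customs;openlink;http://webmail.example.org");
        System.out.println("replaced: " + replaced);
        memory.lines().forEach(System.out::println);
    }
}
```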
Last edit: Florian 2015-01-29
I've posted a note about ILA on our website; it looks pretty slick. I hope you keep up the pace of updates and integrate things like keyphrase and the PTM model for better responsiveness.
Let us know if we can help.
thanks a lot for this nice note on the main page! :-D
I'm definitely motivated to push Sphinx (and my abilities ^^) to the limit, so expect to see further updates soon :-) The next thing on the list is integrating MaryTTS to be completely independent of Google (almost done).
About keyphrase recognition: right now I'm using the LiveSpeechRecognizer with a small grammar to listen for "Hey ILA" (settings->Hey ILA (button)). It works nicely, except that I need to tune the OutOfGrammar probability a bit more, but it's quite resource-intensive. Is there actually a simpler way? It seems there is a KeyWordSpotting class I didn't know of ^^?
The PTM model itself works fine (although accuracy is much lower for me); only the MLLR adaptation fails because of the way the matrix gets loaded (decoder.adaptation.Transform). I'll have a closer look at this again.
One more thing, maybe. In the beginning, what gave me a major headache was interrupting the LiveSpeechRecognizer and restarting it right afterwards. I'm using several workarounds, like crashing the recognizer and streaming an abort command, but especially when the recognizer had not yet worked on a single utterance (because everything before was blocked as noSpeech) I always got an allocation error that prevented me from restarting the recognizer. I had to introduce a "wait for the first utterance" method and reduce the speechThreshold temporarily, which sometimes makes the abort process really laggy. Usually it's not a big problem because it happens rarely, but it would be really nice to have a simple and fast "stop Recognizer" command :-) or did I miss something? ^^
cu,
Florian
ILA has just become even more customizable, more reliable, faster and smarter!
Here is a list of what's new:
ILA has been updated to support the open source text-to-speech system MaryTTS. This basically means 2 things: ILA is now completely free from any cloud service (if you want) aaaand you can add new voices, yeah! :-)
Beta v3.3 introduces a new dynamic language model, something I really enjoy! It means even the grammar-free mode can now learn everything you teach ILA, and the program gets better at recognizing these commands every time. One major step away from grammar restrictions towards natural language recognition.
in addition, all the links/program names inside the 'Apps' folder are automatically part of the dynamic model (if the dictionary knows the words - you can check that in the START_debug mode)
thanks to tweaks done by the sphinx-4 team and some pre-loading, ILA is more responsive now and works a bit faster
ILA will now warn you when you want to teach her a new command that includes unknown words and ask you to add these words to the dictionary yourself
there is a welcome screen now that'll give you some basic info and tell you how to adapt ILA better to your hardware
speaker adaptation can greatly improve speech recognition accuracy. Unfortunately there was a bug in ILA where the previously adapted model was not loaded correctly, so in the end there might have been no improvement at all after a restart. Speaker adaptation has been fixed and greatly improved, and you will get more feedback about the result. In addition, adaptation now works for the en-us acoustic PTM model as well
many people asked me for the source code of ILA. As this is still a beta version and many things are changing quickly, I don't think it helps the project right now to release it completely, buuut there is an 'Addons' folder now with smaller parts of ILA available as code. The focus right now lies on "expansion" and "localization". I'll write more about that soon; for now please check the files "Addons/ILA_answers.jar", "Addons/ILA_addons.jar" and "Addons/ILA_welcome.jar". You can simply extract these 3 files with a ZIP program to get the source code (in addition to the Java class files) :-)
all text encoding has been changed to the UTF-8 standard. I hope that fixes all the problems with special characters and opens up new possibilities for other languages
as usual there is also a lot of tweaking, dictionary updates, command updates and bug fixes
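The unknown-word warning mentioned in the list boils down to a lookup against the pronunciation dictionary. A minimal sketch, assuming a CMU-style dictionary where each line starts with the word followed by its phonemes and alternative pronunciations are written as `word(2)`; the class `DictionaryCheck` and its method names are invented for this example and are not ILA's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DictionaryCheck {

    /** Collects the words (without phonemes) from CMU-style dictionary lines. */
    static Set<String> loadWords(List<String> dictionaryLines) {
        Set<String> words = new HashSet<>();
        for (String line : dictionaryLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String word = trimmed.split("\\s+", 2)[0].toLowerCase();
            // Strip the alternate-pronunciation marker, e.g. "read(2)" -> "read".
            int paren = word.indexOf('(');
            if (paren > 0) {
                word = word.substring(0, paren);
            }
            words.add(word);
        }
        return words;
    }

    /** Returns the words of a sentence that the dictionary does not contain, in spoken order. */
    static List<String> unknownWords(String sentence, Set<String> known) {
        List<String> unknown = new ArrayList<>();
        for (String w : sentence.toLowerCase().trim().split("\\s+")) {
            if (!w.isEmpty() && !known.contains(w)) {
                unknown.add(w);
            }
        }
        return unknown;
    }

    public static void main(String[] args) {
        Set<String> known = loadWords(Arrays.asList(
            "play P L EY", "some S AH M", "music M Y UW Z IH K"));
        // A misspelled word is reported so the user can add it to the dictionary.
        System.out.println(unknownWords("play some musik", known));
    }
}
```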
I hope you like the new version as much as I do :-D and I'd be happy to hear about your experiences with ILA! For more news and tutorials please visit the ILA homepage.
cu,
Florian
Last edit: Florian 2015-02-20
I made some more changes to the DynamicLanguageModel and changed some things in the Transform class to finally be able to adapt the PTM models. I'll upload the code here soon so you can have a look. Everything works fine right now; especially with the "wordPruningLookaheadSearchManager" I have the feeling everything is much faster :-D
There is one problem though: I tried the new 8kHz PTM model and I can't get it working at all. I'll try again and post the error message.
ILA has been updated to version Beta 3.5 :-) Some new features include:
integration of the pocketsphinx command line tool (it's not particularly fast because it has to run through the whole initialization process on every call, but man, it uses so little memory :-) ). It supports grammar switching, can use the same dynamic language model as sphinx-4, and there is a config file you can use to add parameters. The accuracy is unfortunately rather low compared to sphinx-4 (settings problem?) and I can't run the keyphrase spotter as a Java process in a reliable way :-(
bug fixing for Linux sound problems and other OS-related problems
finally fixed saving and loading of the speaker adaptation (for PTM models as well)
and much more :-) read all the changes since 3.3 here
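To show why the command line integration pays the initialization cost on every call, here is a rough sketch of launching `pocketsphinx_continuous` as a child process for one recorded file. All paths are placeholders and the class `PocketsphinxCall` is hypothetical; the flags `-infile`, `-hmm`, `-dict`, `-lm` and `-jsgf` are the ones I know from pocketsphinx_continuous, so please check the `-help` output of your build before relying on them.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class PocketsphinxCall {

    /** Builds the argument list; a grammar takes precedence over a language model. */
    static List<String> buildCommand(String wavFile, String acousticModelDir,
                                     String dictionary, String languageModel, String jsgfGrammar) {
        List<String> cmd = new ArrayList<>();
        cmd.add("pocketsphinx_continuous");
        cmd.add("-infile"); cmd.add(wavFile);
        cmd.add("-hmm");    cmd.add(acousticModelDir);
        cmd.add("-dict");   cmd.add(dictionary);
        if (jsgfGrammar != null) {
            cmd.add("-jsgf"); cmd.add(jsgfGrammar);
        } else if (languageModel != null) {
            cmd.add("-lm");   cmd.add(languageModel);
        }
        return cmd;
    }

    /** Starts a fresh process (models are loaded again each time) and returns its stdout. */
    static String run(List<String> cmd) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(cmd);
        // Let the log output go to our own stderr so the child cannot block on a full pipe.
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process p = pb.start();
        StringBuilder out = new StringBuilder();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
        p.waitFor();
        return out.toString();
    }

    public static void main(String[] args) {
        // Only prints the command; call run(cmd) once the placeholder paths point to real files.
        List<String> cmd = buildCommand("utterance.wav", "models/en-us",
                "models/cmudict-en-us.dict", null, "grammars/commands.gram");
        System.out.println(String.join(" ", cmd));
    }
}
```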
cu,
Florian
Last edit: Florian 2015-03-07
Hello everybody,
https://sites.google.com/site/ilavoiceassistant/
Dear Florian, thanks, this looks like an amazing project!
I'll post something about it on the front page, probably in the coming days. Please keep us updated about the project's progress.
Latest German models are available here:
http://goofy.zamia.org/voxforge/de/readme.txt
Did you try them?
Hi Nickolay, sounds great!
Welcome to 2015 and welcome to ILA Beta 3.0 :-)
Thank you for the update, Florian.
Hi Nickolay,
ILA Beta v3.3 is here :-)
A new update for ILA is available :-) Beta v3.6 comes with full support for Pocketsphinx and many content updates like improved reminders. Here you can read all the details:
https://sites.google.com/site/ilavoiceassistant/news-and-comments
ILA has been updated with a reworked add-on system, better process handling, better responsiveness for sphinx-4, an updated Google API and a lot more 'freedom' :-) Check out the full list of changes:
ILA news update
Last edit: Florian 2015-04-25
It's time again for an update :-) ILA now has version number 3.8 and comes with a carefully updated look ^^, context support for commands (making it possible to use the same key sentence for different actions) and improved add-ons. See the whole story here:
https://sourceforge.net/projects/ila-voice-assistant/