From: johnyjj2 <joh...@gm...> - 2009-12-21 20:28:04
Hello!

=================== GENERAL INFORMATION ABOUT WHAT I'D LIKE TO DO =============================

I'd like to create an application in which:
1. The user calls a special number from a mobile phone.
2. The call connects him/her to a server. (The user chooses whether to use DTMF or ASR, but for now let's focus only on ASR.)
3. He/she speaks twelve digits, then says "details", speaks information about the details, says "next" and repeats the whole sequence, until he/she says "finish". From time to time the server may say something, not based on TTS but on prerecorded mp3/wav files. (The server has to check the checksum of those twelve digits every time, according to some simple mathematical algorithm, and inform the user whether the number was correct or not!)
4. The server saves the recognition results in a database.

=================== WHAT I ALREADY HAVE DONE ==================================================

I created an acoustic model for my language in SphinxTrain. My language is not supported by VoxForge (which covers about eight languages, including English, Spanish, Russian, French and so on), so I have to use my own acoustic model for speech recognition. I also have a .java source code file with .

=================== WHAT I'D LIKE YOU TO HELP ME WITH =========================================

I'd like you to tell me where and how I should specify my algorithm. The question "where" concerns the ninth step of the "Steps which I'd like to follow, connected with Zanzibar and Cairo" section. The question "how" is also connected with formal grammars, because I think it may be much easier to base everything on a very simple grammar with only one group of words, which simply contains all the words, and keep the main algorithm in source code rather than in the formal grammar. I would also be grateful for help with integrating Sphinx with Asterisk. My doubts about the integration are listed in the "Steps which I'd like to follow, connected with Zanzibar and Cairo" section. What I'd like is not answers to those minor questions but help with following the fifth step (of the general steps below). The most important issue is marked with asterisks below. I listed the minor questions only to show you what is and is not clear to me after reading the on-line documentation of Cairo and Zanzibar.

=================== GENERAL STEPS WHICH I'D LIKE TO FOLLOW ===================================

(This is just the general idea; I'd like you to help me with the fifth point.) I thought about doing the following:
1. Install Twinkle in order to emulate calling from a mobile phone to the server.
2. On the same computer, install Asterisk.
3. Configure a SIP trunk for Asterisk so that it can receive calls from Twinkle. What should this configuration file look like for Twinkle? (The sketch after step 6 shows my guess.)
4. Install Sphinx4 (I already have Sphinx4 on my computer) and finish creating the Sphinx4 application, which uses my acoustic model, my grammar and my algorithm.
5. Use Zanzibar and Cairo to integrate Sphinx4 with Asterisk (see the next section). This step looks the most difficult to me, especially because I'm not quite sure where and how to specify my algorithm.
6. Create an additional dialplan so that: a) the system asks the user, via DTMF, whether he/she wants to use ASR or DTMF in the main session. If he/she chooses ASR, further communication goes through Sphinx4; if he/she chooses DTMF, everything is based on DTMF. How can I do such a thing? (Again, see the sketch right after this step.)
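This is only my own guess at what the Asterisk side could look like, assuming Twinkle registers to Asterisk as a SIP peer called "twinkle". The peer name, context names, prompt file names, the x-application value and the "zanzibar-host" peer are placeholders I made up, not anything taken from the Cairo/Zanzibar documentation:

    ; sip.conf - a local softphone peer for testing
    [twinkle]
    type=friend
    host=dynamic
    secret=changeme
    context=ivr-entry          ; calls from Twinkle land in this dialplan context
    disallow=all
    allow=ulaw
    allow=alaw

    ; extensions.conf - let the caller pick DTMF or ASR with a key press
    [ivr-entry]
    exten => 100,1,Answer()
    exten => 100,n,Background(choose-mode)   ; prerecorded prompt: "press 1 for keypad, 2 for speech"
    exten => 100,n,WaitExten(5)

    exten => 1,1,Goto(dtmf-flow,s,1)         ; caller pressed 1 -> pure DTMF dialog
    exten => 2,1,Goto(asr-flow,s,1)          ; caller pressed 2 -> hand the call over to Zanzibar/Cairo

    [dtmf-flow]
    exten => s,1,Read(number,enter-number,12)  ; collect the twelve digits as DTMF after an example prompt
    exten => s,n,NoOp(Got ${number})
    exten => s,n,Hangup()

    [asr-flow]
    exten => s,1,SIPAddHeader(x-application: my-speech-app)  ; header name/value would have to follow the Zanzibar docs
    exten => s,n,Dial(SIP/zanzibar-host)                     ; route the audio to Zanzibar, which talks MRCPv2 to Cairo/Sphinx4

Whether the hand-off in [asr-flow] really works like this is exactly what I'm unsure about (see minor questions 11 and 16 below).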
(Still regarding step 6: previously I thought those would be two separate, independent systems - a midlet on the mobile phone, using HttpConnection and the POST method, plus Tomcat on the server. Now I guess that using DTMF in Asterisk may be a better solution.)
7. Test my application to make sure the whole system works fine.
8. Buy an account from a SIP provider with a PSTN number, and install Asterisk and Sphinx4 on the server (previously everything was tested on my computer). Possible problem: a too slow internet connection. Solution: I would have to use a Digium card or VoIP instead of SIP, or rent a server in a data centre. Another possible problem: Windows instead of Linux. I guess using Linux for Asterisk is a better idea; however, I hope there shouldn't be a big difference between following all of the mentioned steps on Linux and doing the same on Windows.
9. Record samples of the target users' voices. There is no good free acoustic model (several or hundreds of hours of recordings) available for Polish, so the only option left is a speaker-dependent system. I create a new acoustic model, taking the new speech samples into account.
10. The system starts working.
11. Create some kind of feedback loop to learn how to improve the system.
12. Create a web application that gives access to the collected data from anywhere on the internet.

=================== STEPS WHICH I'D LIKE TO FOLLOW, CONNECTED WITH ZANZIBAR AND CAIRO ========================

1. Install Cairo: http://www.speechforge.org/projects/cairo/install.html
2. Install Zanzibar: http://www.spokentech.org/openivr/install.html
3. Start the Cairo server: http://www.speechforge.org/projects/cairo/intro.html . Should I use bin/launch.sh (if there is an sh version, not only bat)? Or rather rserver.bat/sh, transmitter1.bat/sh, receiver1.bat/sh? I guess I need bin/launch.sh.
4. Start the openIVR server (the Zanzibar server): http://www.spokentech.org/openivr/intro.html . Should I use allinone.bat (or rather an sh version for Linux, if there is one)? Or rather rserver.bat, transmitter1.bat, receiver1.bat?
5. Configure the Zanzibar server: http://www.spokentech.org/openivr/intro.html . You say "no changes should be required if you are running zanzibar and cairo on the same machine", so I guess no changes are needed. However, shouldn't I change mySipAddress or the port? Don't I need to configure the Dialog Service by using ApplicationBySipHeaderService?
7. Softphone: http://www.speechforge.org/projects/cairo/intro.html ("Running the examples"). Temporarily configure Zanzibar for Twinkle (instead of Xlite, which I don't know). Later I guess I will need to undo those configuration changes, once I have access to a normal SIP provider account rather than Twinkle.
8. Follow the "Running in Asterisk mode - Dialplan integration" section of http://www.spokentech.org/openivr/intro.html .
9. Write my own speech application. The main question is here. Should I use VoiceXML or the Cairo client API for writing my application? Or specify it in the Asterisk dialplan? Or maybe in the runApplication method? Or in the Java source code of the Sphinx4 application? Or should I write the logic of my app in code based on the Parrot Speech Application code from http://www.spokentech.org/openivr/writing-speechlets.html ? At http://www.spokentech.org/openivr/aik.html I found that I may write the logic of my app either in VoiceXML or in Java apps (the mrcp4j API or the cairo-client API). *** In other words: where and how should I specify my algorithm? *** (Right after this list I sketched, in Java, how I imagine that logic could look.)
10. Follow http://www.spokentech.org/openivr/aik.html .
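To make the question concrete, this is roughly how I imagine the call logic in Java, independent of whether it ends up in a Zanzibar speechlet, a cairo-client program or behind VoiceXML. Nothing below comes from the Cairo or Zanzibar APIs: the onResult() entry point, the prompt file names and the checksum rule are all placeholders of mine.

    // DialogLogic.java - a library-independent sketch of the call flow.
    // onResult() would be called from wherever recognition results arrive
    // (a speechlet callback, a cairo-client listener, or VoiceXML glue).
    public class DialogLogic {

        private enum State { DIGITS, DETAILS, DONE }

        private State state = State.DIGITS;
        private final StringBuilder digits = new StringBuilder();
        private final StringBuilder details = new StringBuilder();

        // Feed one recognized word; returns the name of a prerecorded
        // prompt to play back, or null for no prompt.
        public String onResult(String word) {
            switch (state) {
                case DIGITS:
                    if ("details".equals(word)) {
                        if (digits.length() == 12 && checksumOk(digits.toString())) {
                            state = State.DETAILS;
                            return "number-accepted.wav";   // example file name
                        }
                        digits.setLength(0);                // wrong length or checksum: start over
                        return "number-rejected.wav";       // example file name
                    }
                    digits.append(word);                    // assuming digits arrive normalized to "0".."9"
                    return null;
                case DETAILS:
                    if ("next".equals(word) || "finish".equals(word)) {
                        saveRecord(digits.toString(), details.toString());
                        digits.setLength(0);
                        details.setLength(0);
                        if ("finish".equals(word)) {
                            state = State.DONE;
                            return "goodbye.wav";           // example file name
                        }
                        state = State.DIGITS;
                        return "next-number.wav";           // example file name
                    }
                    details.append(word).append(' ');
                    return null;
                default:
                    return null;
            }
        }

        // Placeholder checksum - the real rule for the twelve digits goes here.
        private boolean checksumOk(String d) {
            int sum = 0;
            for (char c : d.toCharArray()) sum += c - '0';
            return sum % 10 == 0;
        }

        // Placeholder persistence - in the real system this writes to the database.
        private void saveRecord(String number, String detailText) {
            System.out.println(number + " -> " + detailText);
        }
    }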
=================== MINOR QUESTIONS CONNECTED WITH ZANZIBAR AND CAIRO ============================================

1. http://www.speechforge.org/projects/cairo/intro.html -> Running the Demo MRCPv2 Clients -> Available Clients -> "Each client can be started by running the appropriate batch script located in the demo/bin directory of your Cairo installation" -> So, after all, I need to run a batch file to start a demo. Should I run the batch file directly?
2. http://www.speechforge.org/projects/cairo/intro.html -> Running the Cairo-client Demo -> Can I use only bargein and parrot as a basis for my code? Why are only demo-bargein and demo-parrot available in the cairo-client? At first I thought demo-standalone would be a more appropriate starting point for my application, but it looks like it is not available in the cairo-client.
3. http://www.speechforge.org/projects/cairo/dependencies.html -> Project Dependencies -> runtime -> WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz -> Can't I use my own acoustic model?!
4. http://www.speechforge.org/projects/cairo/dependencies.html -> Project Transitive Dependencies -> Do I need to install all of these dependencies myself (e.g. with "sudo apt-get install"), or are they included in Cairo?
5. http://www.speechforge.org/projects/cairo/dependencies.html -> Dependency Listings -> Commons Configuration -> "Tools to assist in the reading of configuration/preferences files in various formats" -> Would I benefit from using it?
6. http://www.speechforge.org/projects/cairo/dependencies.html -> Dependency Listings -> Codec -> "phonetic encoding utilities" -> I have already created my list of phonemes for SphinxTrain. Is this somehow connected with it?
7. http://www.speechforge.org/projects/cairo/dependencies.html -> Project Dependency Graph -> Dependency Listings -> Unnamed - sphinx:WSJ_8gau_13dCep_8kHz_31mel_200Hz_3500Hz:jar:1.0beta -> Why are both TIdigits and WSJ used?
8. http://www.spokentech.org/openivr/intro.html -> Running the Examples (in Demo Mode) -> Prerequisites -> "You will need a sip softphone like Xlite to access the demos" -> How can I use Twinkle instead? Is the configuration the same?
9. http://www.spokentech.org/openivr/intro.html -> Running the Examples (in Demo Mode) -> Prerequisites -> "you will require (preferably high quality) microphone" -> Why does it require a good quality microphone? I thought that in reality the quality wouldn't be good, but rather poor 8 kHz telephone speech, not the 16 kHz which is possible with microphones.
10. http://www.spokentech.org/openivr/intro.html -> Running the Examples (in Demo Mode) -> Prerequisites -> What's the difference between parrot and vxml-parrot? By the way, I'm more or less familiar with JSGF grammars. But wouldn't it be better for my application not to use grammars extensively, and instead put the algorithm in source code and use a very simple grammar which only contains a list of words, e.g. <utterance> = <list_of_words>? (I sketched such a grammar right after question 12.)
11. http://www.spokentech.org/openivr/intro.html -> Running in Asterisk Mode -> Dialplan Integration -> the three lines of code for x-channel and x-application -> They integrate Zanzibar with the Asterisk dialplan. Do I need to add them after all?
12. http://www.spokentech.org/openivr/intro.html -> Running in Asterisk Mode -> Dialplan Integration -> a) type=beanId -> I don't get it; b) type=className -> Does it mean I can use the MyApplication.java source code of Sphinx4 here?; c) type=vxml -> Can I accomplish what I need to do with vxml?
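For question 10, this is the kind of trivial grammar I have in mind: one public rule that just lists every word the caller may say, with all the sequencing handled in the application code. The grammar name is made up, and I put English digit words here only for readability (the real grammar would of course use the Polish words):

    #JSGF V1.0;

    grammar digitsAndKeywords;

    // one flat list of words; no sentence structure at all
    public <utterance> = zero | one | two | three | four | five | six |
                         seven | eight | nine | details | next | finish ;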
13. http://www.spokentech.org/openivr/intro.html -> Running in Asterisk Mode -> Asterisk Manager Interface (AMI) Configuration -> I think I don't need this. Am I right?
14. http://www.spokentech.org/openivr/architecture.html -> I will have an account from a SIP provider with a PSTN number. I see SIP and RTP here. What do I need to do with this RTP?
15. http://www.spokentech.org/openivr/writing-speechlets.html -> Parrot Speech Application -> Why are the three last functions empty?
16. http://www.spokentech.org/openivr/aik.html -> Asterisk Integration Kit -> Dialplan Integration -> What do I need to specify in my dialplan? I guess only the redirection of speech from Asterisk to Zanzibar. (The [asr-flow] context in the sketch near the beginning of this message shows my guess.) I also guess that the main part of my logic would live somewhere other than the dialplan, but I don't know exactly where.

Thanks very much in advance for your help!

Regards!