From: johny j. <joh...@gm...> - 2010-01-10 23:50:05
Thanks for your answer! Summing up the most crucial points from what I wrote below: is there any way to move the whole logic out of the vxml and grammar files into a .java file? If there is, could you give some example code showing this approach, please? Could you also write a bit more about the "analyze the raw results" and "select the name of a wav file" parts?

------------------------

You say you are not a vxml expert. Unfortunately, neither am I. I tried to find some kind of internet forum or mailing list for vxml users, but I couldn't. One of those "forums" didn't actually contain a forum, and the other was a forum for the 'Java VoiceXML Interpreter' application. Do you know of any vxml forum?

I found this great vxml example: http://www.w3.org/TR/voicexml20/#dml2.1.4 (the second code sample in the section, the one about credit card information). It uses digits.grxml (http://mail-archives.apache.org/mod_mbox/jakarta-taglibs-dev/200506.mbox/%3C14269939D726BB43BAD3D11BBE5B8B0C6FA361@ukflumail01.FLUENCYVOICE.LOCAL%3E). From this example it looks like I can ask the user to speak twelve digits, but it doesn't help with adding the control-sum calculation to the application. So I thought it might be a good idea to use some kind of inline script in vxml. I found http://www.w3.org/TR/voicexml20/#dml1.5.3, which uses a value returned by acct_info.vxml#basic, which in turn I found here: http://msdn.microsoft.com/en-us/library/bb857574.aspx . I also found http://www.w3.org/TR/voicexml20/#dml5.3.12, about using inline scripts. (Calculating the control sum is not the only thing that is hard to do from the vxml side; I also need to save the recognized digits to a database or a text file.) By the way, can I use ECMAScript in a vxml file with Cairo/Zanzibar?

I also found http://www.w3.org/TR/voicexml20/#dml2.3.4 (the part which says "Sorry, your credit card"), which validates the credit card information. It is somewhat similar to my application, which also needs to check something (a control sum) based on a sequence of digits.
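To make the control-sum part more concrete: I don't have a final checksum specification yet, so the sketch below uses the standard Luhn mod-10 check purely as a placeholder (my real scheme may well differ), together with the "save the digits to a text file" part I mentioned. The class and method names are all just my own made-up example:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class DigitUtils {

    /** Placeholder control sum: the standard Luhn mod-10 check.
        Swap in the real checksum once it is specified. */
    public static boolean controlSumOk(String digits) {
        int sum = 0;
        boolean doubleIt = false;
        for (int i = digits.length() - 1; i >= 0; i--) {
            int d = Character.digit(digits.charAt(i), 10);
            if (d < 0) return false;   // non-digit character -> reject
            if (doubleIt) {            // double every second digit from the right
                d *= 2;
                if (d > 9) d -= 9;
            }
            sum += d;
            doubleIt = !doubleIt;
        }
        return digits.length() > 0 && sum % 10 == 0;
    }

    /** Appends an accepted digit sequence as one line to a plain text file. */
    public static void saveDigits(Path file, String digits) throws IOException {
        Files.write(file,
                (digits + System.lineSeparator()).getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }
}
```

If ECMAScript in vxml turns out to be usable with Cairo/Zanzibar, the same mod-10 loop should translate into an inline `<script>` almost line for line.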
I just wonder how to use it in my case; the above still doesn't help much. Life would be much easier for me if I could simply specify a grammar of the form <list_of_words> = first_word | ... | last_word, then have a string variable in the source code containing the word that was actually recognized, and do everything in the Java source, without any vxml or grammar files. It wouldn't be an elegant solution, and the only difficulty would again be playing wav files as answers. Of course I would also need a loop which checks whether 'String r != ""' and lets me read the string and proceed with it if it is not null. That would involve some variables to store the current position. Is there any way to move the whole logic into a .java file? Doing that, together with being able to play wav files from inside it, would save a lot of the time and effort of trying to do the same things with vxml. It looks like the easiest way, if it is possible. If it is, could you give some example code showing this approach, please?

Anyway, let me show what I've got now:

<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2001/vxml http://www.w3.org/TR/voicexml20/vxml.xsd">
  <form id="get_sequences_of_digits">
    <block>Welcome to the application. You can speak twelve digits, then say next.
      If you make a mistake, you can simply say the word mistake
      /that's another difficulty, adding this feature/.
      After you say twelve digits, the control sum will be checked. If it is all right,
      you will be informed about it and allowed to say the next twelve digits.
      If it is not correct, you will be asked whether you would like to repeat the
      digits or accept them with an improper code. If you say fewer or more than
      twelve digits and then say next, you will also be asked to accept or reject
      the number. You can exit the application by saying exit.</block>
    <field name="twelve_digits">
      <grammar type="application/srgs+xml" src="/grammars/digits.grxml"/>
      <prompt bargein="false"><audio src="/audio/speak_twelve_digits.wav"/></prompt>
      <!-- from the example dialog it looks like the 'if' I erased from here wasn't used -->
      <filled>
        <!-- the code below is useless, I can't specify it properly -->
        <if cond="control_sum_is_ok != true">
          Control sum improper. Would you like to accept it anyway?
          <clear namelist="twelve_digits"/>
          <throw event="nomatch"/>
        <else/>
          Control sum is OK. You can proceed with a new set of digits.
          <clear namelist="twelve_digits"/>
          <throw event="nomatch"/>
        </if>
      </filled>
    </field>
  </form>
</vxml>

And the code of my version of digits.grxml:

<?xml version="1.0"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         xmlns:nuance="http://voicexml.nuance.com/grammar"
         version="1.0" mode="voice" xml:lang="en-US"
         tag-format="Nuance" root="digit">
  <!-- http://www.apache.org/licenses/LICENSE-2.0 -->
  <rule id="digit">
    <one-of>
      <item> zero <tag> return("0") </tag></item>
      <item> jeden <tag> return("1") </tag></item>
      <item> dwa <tag> return("2") </tag></item>
      <item> trzy <tag> return("3") </tag></item>
      <item> cztery <tag> return("4") </tag></item>
      <item> piec <tag> return("5") </tag></item>
      <item> szesc <tag> return("6") </tag></item>
      <item> siedem <tag> return("7") </tag></item>
      <item> osiem <tag> return("8") </tag></item>
      <item> dziewiec <tag> return("9") </tag></item>
    </one-of>
  </rule>
</grammar>

--------------------------

I still wonder why, in your first mail, you suggested using rserver, transmitter1, receiver1 and asteriskConnector rather than launch.sh, if launch.sh is supposed to run all four things.

> You can always try to parse and analyze the raw results
> In any case, the code that analyzes the results (either the raw results or the tags) can select the name of a wav file that will be passed into the next call to playAndRecognize(..)
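To make my question concrete, here is roughly the all-in-Java loop I imagine, building on that quoted suggestion. I don't know the real Cairo/Zanzibar client API, so the SpeechSession interface, the playAndRecognize signature and all the wav file names below are invented by me; only the grammar words and the mistake/next/exit commands come from my description above:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical client interface -- the real Cairo/Zanzibar API may differ;
    playAndRecognize(..) is only known to me from the quoted suggestion. */
interface SpeechSession {
    /** Plays a wav prompt, then recognizes one utterance; returns the raw result. */
    String playAndRecognize(String wavFile, String grammarUrl);
}

public class DigitDialog {

    // Fallback mapping for raw Polish words, in case the grammar
    // tags ("0".."9") are not returned by the recognizer.
    private static final Map<String, Character> WORDS = new HashMap<>();
    static {
        String[] w = {"zero", "jeden", "dwa", "trzy", "cztery",
                      "piec", "szesc", "siedem", "osiem", "dziewiec"};
        for (int i = 0; i < w.length; i++) WORDS.put(w[i], (char) ('0' + i));
    }

    /** Collects twelve digits; returns the accepted sequence, or null on "exit".
        All wav file names below are placeholders. */
    public static String collect(SpeechSession session) {
        StringBuilder digits = new StringBuilder();
        String prompt = "/audio/speak_twelve_digits.wav";
        while (true) {
            String r = session.playAndRecognize(prompt, "/grammars/digits.grxml");
            if ("exit".equals(r)) {
                return null;
            } else if ("mistake".equals(r)) {
                if (digits.length() > 0) digits.deleteCharAt(digits.length() - 1);
                prompt = "/audio/last_digit_deleted.wav";
            } else if ("next".equals(r)) {
                // Here the control sum would also be checked before accepting.
                if (digits.length() == 12) return digits.toString();
                prompt = "/audio/wrong_number_of_digits.wav";
            } else {
                Character d = WORDS.get(r);              // raw word, e.g. "dwa"
                if (d == null && r.length() == 1
                        && Character.isDigit(r.charAt(0))) {
                    d = r.charAt(0);                     // or a tag, e.g. "2"
                }
                if (d != null) digits.append(d);
                prompt = "/audio/say_next_digit.wav";
            }
        }
    }
}
```

With something like this, the checking, the database/text-file saving and the position tracking would all be ordinary Java, and at least the vxml file could go away entirely.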
May I ask for some further tips on how to implement it, please?

> Given a number like 1234, will your grammar allow for "one two three four" or "one thousand two hundred and thirty four" or "twelve thirtyfour"

I would only allow the ten single digits. You are fortunate, because good-quality English acoustic models already exist on VoxForge. I'm not that fortunate (I don't use English), and the only thing I can do is create a speaker-dependent model. I don't want to force users through the amount of training that a more sophisticated acoustic model would require.

> Sorry about the bug, can you let me know what you fixed, so I can check it in of others. Maybe send the fixed file?

I changed only one character in one short file (/zanzibar-0.1/bin/cairo/rserver.sh), from:

sh ..\launch.sh $CLASS -sipPort 5050 -sipTransport udp

to:

sh ../launch.sh $CLASS -sipPort 5050 -sipTransport udp

In the first case it told me that it cannot find ..launch.sh.

> Also do you remember which script had trouble? There should be no need to copy things around like that.

As far as I remember it was /zanzibar-0.1/bin/cairo/transmitter1.sh.

> if you can send the dialplan, I can take a look too?

It is just the original dialplan of a newly installed Asterisk. The only changes I have made are as follows:

sip.conf (http://www.spokentech.org/openivr/aik.html):

[Zanzibar]
type=peer
host=localhost   ; I wasn't sure what to write here
port=5090
dtmfmode=info
canreinvite=no

extensions.conf (Is [demo] the proper place to put it? Would I need to change it somehow in the future, when I will be using a SIP provider account with a PSTN number instead of the Twinkle softphone?):

exten => 1001,1,SIPAddHeader(x-channel:${CHANNEL})
exten => 1001,n,SIPAddHeader(x-application:basic|org.speechforge.apps.demos.Parrot)
exten => 1001,n,Dial(SIP/Zanzibar)

And now I connect from Twinkle to 1001@127.0.0.1. This time I hear "connecting" but it never reaches the "established" state.
In the future I will use a number from the SIP provider, which gave this example configuration: http://forum.ipfon.pl/index.php?topic=64

> I am interested in building models too. Have you had much success yet?

I created all the files required to build an acoustic model: a list of phonemes for my language, a list of words with their transcriptions, a language model, wav files with transcriptions for both training and testing, and so on. I also configured SphinxTrain, ran all of those perl scripts (there were some difficulties with that), created a model from those files, tested it with the decode.pl script and a test package (my wav files with their transcriptions), obtained good results for a speaker-dependent model, and packed it into a jar file. Now I'd like to test it in Cairo/Zanzibar instead of using the WSJ model. If you've got any questions about building models, just ask. If I know the answer, I'll share it for sure.

Let me also ask about the process of compiling. Would it be just like this:

1. I extract cairo-rtp-0.2-src.tar.bz2,
2. I create my application in src (java/.../demos -> myapp.java, resources -> grammar and wav files, voicexml -> vxml file),
3. I edit the configuration in src/resources/config/sphinx-config.xml,
4. I type the "ant" command in the zanzibar-0.1 directory from a terminal.

I hope these changes in the xml would be enough.

The linguist part:

<property name="acousticModel" value="pl1"/>

or maybe I should leave the default value <property name="acousticModel" value="acousticModel"/>.

The dictionary part:

value="/home/mainaccount/acoustic/pl1/etc/pl1.dic"/>
<property name="fillerPath" value="/home/mainaccount/acoustic/pl1/etc/pl1.filler"/>

The acoustic model part (it is because of these model loaders that I cannot use the loaders from WSJ; they suggested I use the ones from sphinx3, as shown in the 'transcriber' demo application; I hope it will work like this):
<!-- I took the whole section from the transcriber config.xml and changed tidigits to pl1 -->
<component name="pl1" type="edu.cmu.sphinx.model.acoustic.pl1.Model">
    <property name="loader" value="sphinx3Loader"/>
    <property name="unitManager" value="unitManager"/>
</component>
<component name="sphinx3Loader" type="edu.cmu.sphinx.model.acoustic.pl1.ModelLoader">
    <property name="logMath" value="logMath"/>
    <property name="unitManager" value="unitManager"/>
</component>

Thanks for sharing your precious time :-)

Regards!
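P.S. Because it is easy to leave a dangling name while renaming tidigits to pl1 (for example a component still called tidigits while a property already points at pl1), I hacked together a tiny stdlib-only sanity check for such references. This is only a throwaway helper of my own, nothing from Cairo or Sphinx, and which property names are actually component references is my guess:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ConfigSanityCheck {

    /** Returns values of the given reference-style properties (e.g. "loader")
        that do not match any <component name="..."> entry in the config. */
    public static Set<String> danglingRefs(String configXml, Set<String> refProps) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            configXml.getBytes(StandardCharsets.UTF_8)));
            Set<String> components = new HashSet<>();
            NodeList comps = doc.getElementsByTagName("component");
            for (int i = 0; i < comps.getLength(); i++) {
                components.add(((Element) comps.item(i)).getAttribute("name"));
            }
            Set<String> dangling = new LinkedHashSet<>();
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element p = (Element) props.item(i);
                if (refProps.contains(p.getAttribute("name"))
                        && !components.contains(p.getAttribute("value"))) {
                    dangling.add(p.getAttribute("value"));
                }
            }
            return dangling;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

Running it over sphinx-config.xml with refProps = {"loader", "unitManager", "logMath"} before typing "ant" would at least catch typos in the component wiring.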