Re: [Cairo-user] using other acoustic models in Cairo/Zanzibar

On Mon, Dec 21, 2009 at 12:27 PM, johnyjj2 <joh...@gm...> wrote:

>
> Hello!
>
> =================== GENERAL INFORMATION ABOUT WHAT I'D LIKE TO DO
> =============================
>
> I'd like to create such an application that:
> 1. User calls special number from mobile phone.
> 2. It connects him/her to server. (The user chooses if he/she wants to use
> DTMF or ASR but now let's focus only on ASR).
> 3. He/she speaks twelve digits, then says "details", speaks information about
> the details, says "next" and repeats the same, until he/she says "finish".
> From time to time the server may say something, not based on TTS, but based on
> prerecorded mp3/wav files. (The server has to check the checksum of those
> twelve digits every time according to some kind of simple mathematical
> algorithm and inform the user if the number was correct or not)!
> 4. The server saves results of recognition in the database.
>
> =================== WHAT I ALREADY HAVE DONE
> ==================================================
>
> I created acoustic model for my language in SphinxTrain. My language is not
> supported in VoxForge (which has got about eight languages, including
> English, Spanish, Russian, French and so on). So I have to use my acoustic
> model for speech recognition.
>
> I also have got .java source code file with .
>
> =================== WHAT I'D LIKE YOU TO HELP ME WITH
> =========================================
>
> I'd like you to tell me where and how I should specify my algorithm.
>
> The question "where" is about the ninth step of the "Steps which I'd like to
> follow, connected with Zanzibar and Cairo" section.
>
> The question "how" is also connected with formal grammars because I think
> it may be much easier to base everything on a very simple grammar with only
> one group of words, which simply contains all the words, and to keep the
> main algorithm in source code rather than in the formal grammar.
>
> I would also be grateful for help with integrating Sphinx with Asterisk. My
> doubts about integrating are shown in the "Steps which I'd like to follow,
> connected with Zanzibar and Cairo" section.
>
> What I'd like is not the answer to those minor questions but help with
> following the fifth step. I indicated important issues with bold font. I
> gave those minor questions to show you what is or is not clear to me after
> reading the on-line documentation of Cairo and Zanzibar.
>
> =================== GENERAL STEPS WHICH I'D LIKE TO FOLLOW
> ===================================
>
> (It is just general idea, I'd like you to help me with fifth point). I
> thought about doing the following:
> 1. Install Twinkle in order to emulate calling from mobile phone to server.
> 2. On the same computer I install Asterisk.
> 3. I configure SIP trunk for Asterisk so that it would be able to receive
> calls from Twinkle. What should this configuration file look like for Twinkle?
>
I have not used Twinkle, but I have used Xlite to do something similar.
This should be a straightforward SIP configuration in Twinkle and Asterisk.

> 4. I install Sphinx4 (I already have got Sphinx4 on my computer). I finish
> creating application for Sphinx4, which uses my acoustic model, my grammar
> and my algorithm.
> 5. I use Zanzibar and Cairo to integrate Sphinx4 with Asterisk. (Look at
> the next section). This step looks like the most difficult for me,
> especially because I'm not quite sure where and how to specify my algorithm.
>
Sphinx4 is bundled in Cairo, so that part of the integration is done. If
you have your own modified version of Sphinx4, you will have to replace the
Sphinx jars in the Cairo lib directory. Depending on what you customized in
Sphinx4, you may have to change Cairo and rebuild. Is it just the
acoustic models?

Zanzibar handles the integration with Asterisk as well as selecting the
application/algorithm to run once a call is connected.

Your algorithm can be specified in VoiceXML or in Java inside Zanzibar.

Or, if you use just DTMF, you could probably put your algorithm in the
Asterisk dialplan.
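
If you go the Java route, the checksum step could look roughly like the sketch below. The thread never states the actual checksum rule, so the Luhn mod-10 test is used here purely as a stand-in. The class name TwelveDigitCheck and the sample numbers are made up for illustration, and nothing in it touches the Zanzibar or Cairo APIs.

```java
// Hypothetical helper: validates a twelve-digit number with the Luhn mod-10
// test. Luhn is an ASSUMPTION standing in for the unspecified checksum rule.
public final class TwelveDigitCheck {

    private TwelveDigitCheck() {}

    /** Returns true if the input is exactly twelve decimal digits that pass the Luhn test. */
    public static boolean isValid(String digits) {
        if (digits == null || digits.length() != 12) {
            return false;
        }
        int sum = 0;
        // Walk from the rightmost digit; double every second digit.
        for (int i = 0; i < digits.length(); i++) {
            char c = digits.charAt(digits.length() - 1 - i);
            if (c < '0' || c > '9') {
                return false; // reject anything that is not a decimal digit
            }
            int d = c - '0';
            if (i % 2 == 1) {
                d *= 2;
                if (d > 9) {
                    d -= 9; // same as adding the two digits of the product
                }
            }
            sum += d;
        }
        return sum % 10 == 0;
    }

    public static void main(String[] args) {
        System.out.println(isValid("123456789015")); // true
        System.out.println(isValid("123456789016")); // false
    }
}
```

The same method could be called whether the digits come from ASR or from DTMF, so the check only has to be written once.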

> 6. I create additional dialplan so that: a) the system would ask the user
> with the use of DTMF if he/she wants to use ASR or DTMF in main session. If
> he/she chooses ASR, the further communication would be with Sphinx4. If
> he/she chooses DTMF, everything would be based on DTMF. How can I do such a
> thing? Previously I thought that those would be two different, independent
> systems (midlet on mobile phone, using httpconnection, post method and
> Tomcat on server). Now I guess perhaps using DTMF in Asterisk may be better
> solution.
>

You have options. You can use the Asterisk dialplan for DTMF. Or you can use
the Java application in Zanzibar to handle DTMF too. VoiceXML should also
handle DTMF, but that is not fully tested in Zanzibar. Zanzibar only supports
DTMF over SIP INFO messages, which is good enough for most needs.

The midlet solution would probably work too.

> 7. I test my application to ensure myself that the whole system works fine.
> 8. I buy account from SIP provider with PSTN number. I install Asterisk and
> Sphinx4 on server (previously it was tested on my computer). Possible
> problems: too slow internet connection. Solution: I will have to use Digium
> card or VoIP instead of SIP, or rent server in Data Centre. Other possible
> problem: Windows instead of Linux. I guess using Linux for Asterisk is
> a better idea; however, I hope there shouldn't be a big difference between
> following all of the mentioned steps in Linux and doing the same in Windows.
>

I have only run Asterisk on Linux. SIP/RTP over the internet has worked pretty
well for me, but I have not done high-volume calls. The slow upload speeds
of many internet plans could be a problem. Running in a server room should
solve that.

> 9. I record samples of destination users' voices. There is no good (several
> or hundreds of hours of recordings) free acoustic model available for the
> Polish language, so the only option left is a speaker-dependent system. I
> create a new acoustic model, taking into account the new speech samples.
>

Cairo does not have a recorder resource yet.  I started one, but it is not
complete.  You can record on asterisk as a workaround.

> 10. System starts working.
> 11. I create some kind of feedback to obtain the knowledge how to improve
> the system.
> 12. I create web application to enable access to obtained data, through
> system of information available everywhere in the internet.
>
> =================== STEPS WHICH I'D LIKE TO FOLLOW, CONNECTED WITH ZANZIBAR
> AND CAIRO ========================
>
> 1. Install Cairo: http://www.speechforge.org/projects/cairo/install.html
> 2. Install Zanzibar: http://www.spokentech.org/openivr/install.html
> 3. Start the Cairo server:
> http://www.speechforge.org/projects/cairo/intro.html . Should I use
> bin/launch.sh (if there is sh, not only bat)? Or rather rserver.bat/sh,
> transmitter1.bat/sh, receiver1.bat/sh? I guess I need bin/launch.sh.
>
use rserver, transmitter1 and receiver1

> 4. Start the openIVR Server (Zanzibar server):
> http://www.spokentech.org/openivr/intro.html . Should I use allinone.bat
> (or
> rather sh for Linux, if there is any sh)? Or rather rserver.bat,
> transmitter1.bat, receiver1.bat?
>
If you use allinone, you don't need to run rserver, transmitter and receiver.
Alternatively, if you ran the three Cairo processes, you should run
asteriskConnector.sh too.

> 5. Configure Zanzibar server: http://www.spokentech.org/openivr/intro.html .
> You say "no changes should be required if you are running zanzibar and
> cairo on the same machine" so I guess no changes are needed. However,
> shouldn't I change mySipAddress or port? Don't I need to configure the
> Dialog Service by using ApplicationBySipHeaderService?
>
mySipAddress and port can be changed or stay the same; they just need to
match the port and address you use in Asterisk when you transfer a
call. Yes, you probably want to change from the applicationbynumber service to
ApplicationBySipHeaderService.

> 7. Softphone: http://www.speechforge.org/projects/cairo/intro.html ("Running
> the examples"). Temporarily configure Zanzibar for Twinkle (instead of
> Xlite, which I don't know). Later I guess I will need to undo the changes in
> configuration, after having access to a normal SIP provider account, rather
> than Twinkle.
>
No changes should be needed in Zanzibar. The config I show for Xlite will
connect directly to Zanzibar from the SIP phone for easy testing without
requiring Asterisk. You could also configure the SIP phone to call
Asterisk first.

> 8. Follow http://www.spokentech.org/openivr/intro.html "Running in
> Asterisk
> mode - Dialplan integration" section.
> 9. Write my own speech application. The main question is here. Should I use
> vxml or the Cairo client API for writing my application? Or specify it in
> the dialplan of Asterisk? Or maybe in the runApplication method? Or in the
> java source code of the Sphinx4 application? Or should I write the logic of
> my app in code based on this Parrot Speech Application code from
> http://www.spokentech.org/openivr/writing-speechlets.html ? In
> http://www.spokentech.org/openivr/aik.html I found that I may write the
> logic of my app either in voicexml or in java apps (mrcp4j api or
> cairo-client api).
> In other words - where and how to specify my algorithm?
>
All of those options are available to you. It depends on the algorithm you
want to run. If an Asterisk dialplan will work for you, then you don't need
Zanzibar and Cairo. If you are familiar with VoiceXML, maybe that is a good
choice, since it is a standard. The Parrot application approach is good if you
are most comfortable writing applications in Java.

> 10. Follow http://www.spokentech.org/openivr/aik.html .
>
> =================== MINOR QUESTIONS CONNECTED WITH ZANZIBAR AND CAIRO
> ============================================
>
> 1. http://www.speechforge.org/projects/cairo/intro.html -> Running the
> Demo
> MRCPv2 Clients -> Available Clients -> "Each client can be started by
> running the appropriate batch script located in the demo/bin directory of
> your Cairo installation" -> after all: I need to run batch to start demo.
> Should I run batch directly?
> 2. http://www.speechforge.org/projects/cairo/intro.html -> Running the
> Cairo-client Demo -> Can I use only bargein and parrot to write my code?
> Why
> only demo-bargein and demo-parrot are available in the cairo-client? At
> first I thought demo-standalone would be a more proper starting point for my
> application but it looks like it is not available in the cairo-client.
> 3. http://www.speechforge.org/projects/cairo/dependencies.html -> Project
> Dependencies -> runtime -> WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz -> Can't
> I
> use my own acoustic model?!?!
>
Yes.  You will need to update the sphinx-config.xml

> 4. http://www.speechforge.org/projects/cairo/dependencies.html -> Project
> Transitive Dependencies -> Do I need to install all of these dependencies
> by "sudo apt-get install" or are they included in Cairo?
>
They are all included.  No additional downloads required other than java and
jmf (and run jsapi.sh)

> 5. http://www.speechforge.org/projects/cairo/dependencies.html ->
> Dependency
> Listings -> Commons Configuration -> "Tools to assist in the reading of
> configuration/preferences files in various formats" -> Would I benefit from
> using it?
>
Probably not. It's a Java lib used internally for the config.xml files.

> 6. http://www.speechforge.org/projects/cairo/dependencies.html ->
> Dependency
> Listings -> Codec -> "phonetic encoding utilities" -> I already created my
> list of phonemes for SphinxTrain. Is it somehow connected with it?
>
No connection.

> 7. http://www.speechforge.org/projects/cairo/dependencies.html -> Project
> Dependency Graph -> Dependency Listings -> Unnamed -
> sphinx:WSJ_8gau_13dCep_8kHz_31mel_200Hz_3500Hz:jar:1.0beta -> Why both
> TIdigits and WSJ are used?
>
For junit tests only.

> 8. http://www.spokentech.org/openivr/intro.html -> Running the Examples
> (in
> Demo Mode) -> Prerequisites -> "You will need a sip softphone like Xlite to
> access the demos" -> How to use Twinkle instead? Is the configuration the
> same?
>
Should be no problem.  Config should be logically the same.

> 9. http://www.spokentech.org/openivr/intro.html -> Running the Examples
> (in
> Demo Mode) -> Prerequisites -> "you will require (preferably high quality)
> microphone" -> Why does it require good quality microphone? I thought in
> reality there wouldn't be good quality, but rather poor 8kHz telephone
> speech, not 16kHz which is possible for microphones?
>
yeah, you are right.  I should change that.

> 10. http://www.spokentech.org/openivr/intro.html -> Running the Examples
> (in
> Demo Mode) -> Prerequisites -> What's the difference between parrot and
> vxml-parrot? By the way, I'm more or less familiar with JSGF grammars. But
> wouldn't it be better for my application not to use grammars extensively,
> but rather to create algorithm in source code, and use very simple grammar
> which only contains list of words, like e.g. <utterance>=<list_of_words>?
>
Is there a way to use Sphinx4 with such a simple vocabulary? Such a simple
"grammar" would be nice for most applications.
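
If the grammar is reduced to a flat word list, the sequencing (digits, "details", free text, then "next" or "finish") can be enforced in plain Java on the recognized words. Below is a rough sketch, assuming the recognizer hands back the utterance as a list of lowercase English words. The class DictationParser, the Entry type and the English vocabulary are all hypothetical, and no Sphinx4 or Cairo calls are involved.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: turns a flat list of recognized words into records.
public final class DictationParser {

    /** One dictated record: the twelve-digit number and its free-form details. */
    public static final class Entry {
        public final String number;
        public final String details;

        Entry(String number, String details) {
            this.number = number;
            this.details = details;
        }
    }

    // ASSUMPTION: English digit words; a real system would use its own vocabulary.
    private static final Map<String, Character> DIGITS = new HashMap<>();
    static {
        String[] words = {"zero", "one", "two", "three", "four",
                          "five", "six", "seven", "eight", "nine"};
        for (int i = 0; i < words.length; i++) {
            DIGITS.put(words[i], (char) ('0' + i));
        }
    }

    private DictationParser() {}

    /** Parses words until "finish"; throws if the expected word order is violated. */
    public static List<Entry> parse(List<String> words) {
        List<Entry> entries = new ArrayList<>();
        StringBuilder number = new StringBuilder();
        StringBuilder details = new StringBuilder();
        boolean inDetails = false;

        for (String w : words) {
            if (!inDetails) {
                Character d = DIGITS.get(w);
                if (d != null) {
                    number.append(d.charValue());
                } else if (w.equals("details")) {
                    if (number.length() != 12) {
                        throw new IllegalArgumentException(
                            "expected 12 digits, got " + number.length());
                    }
                    inDetails = true;
                } else if (w.equals("finish") && number.length() == 0) {
                    return entries; // "finish" right after "next"
                } else {
                    throw new IllegalArgumentException("unexpected word: " + w);
                }
            } else if (w.equals("next") || w.equals("finish")) {
                entries.add(new Entry(number.toString(), details.toString()));
                number.setLength(0);
                details.setLength(0);
                inDetails = false;
                if (w.equals("finish")) {
                    return entries;
                }
            } else {
                if (details.length() > 0) {
                    details.append(' ');
                }
                details.append(w);
            }
        }
        return entries; // input ended without "finish": return what was completed
    }
}
```

With this split the grammar only has to list the vocabulary, and rules such as "exactly twelve digits before details" live in code where they are easy to change and test.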

> 11. http://www.spokentech.org/openivr/intro.html -> Running in Asterisk
> Mode
> -> Dialplan Integration -> three lines of code for x-channel and
> x-application -> It is to integrate Zanzibar with Asterisk dialplan. Do I
> need to add it after all?
>
Not sure I understand the question.

> 12. http://www.spokentech.org/openivr/intro.html -> Running in Asterisk
> Mode
> -> Dialplan Integration -> a) type=beanId -> I don't get it, b)
> type=className -> Does it mean I can use MyApplication.java source code of
> Sphinx4 here?, c) type=vxml -> Can I accomplish what I need to do with
> vxml?
>
a) It will run the Java program specified in the XML file by the beanId
(for example, Jukebox).
b) Not sure what you mean by "source code of Sphinx4".
c) Maybe. Not sure what your application is exactly.

> 13. http://www.spokentech.org/openivr/intro.html -> Running in Asterisk
> Mode
> -> Asterisk Manager Interface (AMI) Configuration -> I think I don't need
> it. Am I right?
>
I think you are right.

> 14. http://www.spokentech.org/openivr/architecture.html -> I will have an
> account from a SIP provider with a PSTN number. I see SIP and RTP here. What
> do I need to do with this RTP?
>
In VoIP, RTP is used for audio streaming and SIP for signaling.

> 15. http://www.spokentech.org/openivr/writing-speechlets.html -> Parrot
> Speech Application -> Why are those three last functions empty?
>
I am using blocking calls, so no need to do anything with callbacks.

> 16. http://www.spokentech.org/openivr/aik.html -> Asterisk Integration Kit
> -> Dialplan Integration -> What do I need to specify in my dialplan? I
> guess only speech redirection from Asterisk to Zanzibar. I also guess the
> main part of my logic would be somewhere else than in the dialplan, but I
> don't know exactly where.
>
Yes, if you choose to use Zanzibar/Cairo. You may get by with just Asterisk
and have all the logic in the dialplan. The main question I think you should
answer is "Do you need speech recognition for this app?" If you are just
collecting audio for model building, maybe not.

>
> Thanks very much for your help in advance!
> Regards!