From: johnyjj2 <joh...@gm...> - 2009-12-21 20:28:04
Hello!

=================== GENERAL INFORMATION ABOUT WHAT I'D LIKE TO DO =============================

I'd like to create an application in which:
1. The user calls a special number from a mobile phone.
2. The call connects him/her to a server. (The user chooses whether to use DTMF or ASR, but for now let's focus only on ASR.)
3. He/she speaks twelve digits, then says "details", speaks information about the details, says "next" and repeats the whole sequence, until he/she says "finish". From time to time the server may say something, not based on TTS but on prerecorded mp3/wav files. (The server has to check the checksum of those twelve digits every time, according to some simple mathematical algorithm, and inform the user whether the number was correct or not!)
4. The server saves the recognition results in a database.

=================== WHAT I ALREADY HAVE DONE ==================================================

I created an acoustic model for my language in SphinxTrain. My language is not supported by VoxForge (which covers about eight languages, including English, Spanish, Russian, French and so on), so I have to use my own acoustic model for speech recognition. I also have a .java source code file with .

=================== WHAT I'D LIKE YOU TO HELP ME WITH =========================================

I'd like you to tell me where and how I should specify my algorithm. The question "where" concerns the ninth step of the "Steps which I'd like to follow, connected with Zanzibar and Cairo" section. The question "how" is also connected with formal grammars, because I think it may be much easier to base everything on a very simple grammar with only one group of words, which simply contains all the words, and keep the main algorithm in source code rather than in the formal grammar. I would also be grateful for help with integrating Sphinx with Asterisk. My doubts about the integration are listed in the "Steps which I'd like to follow, connected with Zanzibar and Cairo" section. What I'd like is not answers to those minor questions but help with following the fifth step (of the general steps below). The most important issue is marked with asterisks below. I listed the minor questions only to show you what is and is not clear to me after reading the on-line documentation of Cairo and Zanzibar.

=================== GENERAL STEPS WHICH I'D LIKE TO FOLLOW ===================================

(This is just the general idea; I'd like you to help me with the fifth point.) I thought about doing the following:
1. Install Twinkle in order to emulate calling from a mobile phone to the server.
2. On the same computer, install Asterisk.
3. Configure a SIP trunk for Asterisk so that it can receive calls from Twinkle. What should this configuration file look like for Twinkle? (The sketch after step 6 shows my guess.)
4. Install Sphinx4 (I already have Sphinx4 on my computer) and finish creating the Sphinx4 application, which uses my acoustic model, my grammar and my algorithm.
5. Use Zanzibar and Cairo to integrate Sphinx4 with Asterisk (see the next section). This step looks the most difficult to me, especially because I'm not quite sure where and how to specify my algorithm.
6. Create an additional dialplan so that: a) the system asks the user, via DTMF, whether he/she wants to use ASR or DTMF in the main session. If he/she chooses ASR, further communication goes through Sphinx4; if he/she chooses DTMF, everything is based on DTMF. How can I do such a thing? (Again, see the sketch right after this step.)
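This is only my own guess at what the Asterisk side could look like, assuming Twinkle registers to Asterisk as a SIP peer called "twinkle". The peer name, context names, prompt file names, the x-application value and the "zanzibar-host" peer are placeholders I made up, not anything taken from the Cairo/Zanzibar documentation:

    ; sip.conf - a local softphone peer for testing
    [twinkle]
    type=friend
    host=dynamic
    secret=changeme
    context=ivr-entry          ; calls from Twinkle land in this dialplan context
    disallow=all
    allow=ulaw
    allow=alaw

    ; extensions.conf - let the caller pick DTMF or ASR with a key press
    [ivr-entry]
    exten => 100,1,Answer()
    exten => 100,n,Background(choose-mode)   ; prerecorded prompt: "press 1 for keypad, 2 for speech"
    exten => 100,n,WaitExten(5)

    exten => 1,1,Goto(dtmf-flow,s,1)         ; caller pressed 1 -> pure DTMF dialog
    exten => 2,1,Goto(asr-flow,s,1)          ; caller pressed 2 -> hand the call over to Zanzibar/Cairo

    [dtmf-flow]
    exten => s,1,Read(number,enter-number,12)  ; collect the twelve digits as DTMF after an example prompt
    exten => s,n,NoOp(Got ${number})
    exten => s,n,Hangup()

    [asr-flow]
    exten => s,1,SIPAddHeader(x-application: my-speech-app)  ; header name/value would have to follow the Zanzibar docs
    exten => s,n,Dial(SIP/zanzibar-host)                     ; route the audio to Zanzibar, which talks MRCPv2 to Cairo/Sphinx4

Whether the hand-off in [asr-flow] really works like this is exactly what I'm unsure about (see minor questions 11 and 16 below).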
(Still regarding step 6: previously I thought those would be two separate, independent systems - a midlet on the mobile phone, using HttpConnection and the POST method, plus Tomcat on the server. Now I guess that using DTMF in Asterisk may be a better solution.)
7. Test my application to make sure the whole system works fine.
8. Buy an account from a SIP provider with a PSTN number, and install Asterisk and Sphinx4 on the server (previously everything was tested on my computer). Possible problem: a too slow internet connection. Solution: I would have to use a Digium card or VoIP instead of SIP, or rent a server in a data centre. Another possible problem: Windows instead of Linux. I guess using Linux for Asterisk is a better idea; however, I hope there shouldn't be a big difference between following all of the mentioned steps on Linux and doing the same on Windows.
9. Record samples of the target users' voices. There is no good free acoustic model (several or hundreds of hours of recordings) available for Polish, so the only option left is a speaker-dependent system. I create a new acoustic model, taking the new speech samples into account.
10. The system starts working.
11. Create some kind of feedback loop to learn how to improve the system.
12. Create a web application that gives access to the collected data from anywhere on the internet.

=================== STEPS WHICH I'D LIKE TO FOLLOW, CONNECTED WITH ZANZIBAR AND CAIRO ========================

1. Install Cairo: http://www.speechforge.org/projects/cairo/install.html
2. Install Zanzibar: http://www.spokentech.org/openivr/install.html
3. Start the Cairo server: http://www.speechforge.org/projects/cairo/intro.html . Should I use bin/launch.sh (if there is an sh version, not only bat)? Or rather rserver.bat/sh, transmitter1.bat/sh, receiver1.bat/sh? I guess I need bin/launch.sh.
4. Start the openIVR server (the Zanzibar server): http://www.spokentech.org/openivr/intro.html . Should I use allinone.bat (or rather an sh version for Linux, if there is one)? Or rather rserver.bat, transmitter1.bat, receiver1.bat?
5. Configure the Zanzibar server: http://www.spokentech.org/openivr/intro.html . You say "no changes should be required if you are running zanzibar and cairo on the same machine", so I guess no changes are needed. However, shouldn't I change mySipAddress or the port? Don't I need to configure the Dialog Service by using ApplicationBySipHeaderService?
7. Softphone: http://www.speechforge.org/projects/cairo/intro.html ("Running the examples"). Temporarily configure Zanzibar for Twinkle (instead of Xlite, which I don't know). Later I guess I will need to undo those configuration changes, once I have access to a normal SIP provider account rather than Twinkle.
8. Follow the "Running in Asterisk mode - Dialplan integration" section of http://www.spokentech.org/openivr/intro.html .
9. Write my own speech application. The main question is here. Should I use VoiceXML or the Cairo client API for writing my application? Or specify it in the Asterisk dialplan? Or maybe in the runApplication method? Or in the Java source code of the Sphinx4 application? Or should I write the logic of my app in code based on the Parrot Speech Application code from http://www.spokentech.org/openivr/writing-speechlets.html ? At http://www.spokentech.org/openivr/aik.html I found that I may write the logic of my app either in VoiceXML or in Java apps (the mrcp4j API or the cairo-client API). *** In other words: where and how should I specify my algorithm? *** (Right after this list I sketched, in Java, how I imagine that logic could look.)
10. Follow http://www.spokentech.org/openivr/aik.html .
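To make the question concrete, this is roughly how I imagine the call logic in Java, independent of whether it ends up in a Zanzibar speechlet, a cairo-client program or behind VoiceXML. Nothing below comes from the Cairo or Zanzibar APIs: the onResult() entry point, the prompt file names and the checksum rule are all placeholders of mine.

    // DialogLogic.java - a library-independent sketch of the call flow.
    // onResult() would be called from wherever recognition results arrive
    // (a speechlet callback, a cairo-client listener, or VoiceXML glue).
    public class DialogLogic {

        private enum State { DIGITS, DETAILS, DONE }

        private State state = State.DIGITS;
        private final StringBuilder digits = new StringBuilder();
        private final StringBuilder details = new StringBuilder();

        // Feed one recognized word; returns the name of a prerecorded
        // prompt to play back, or null for no prompt.
        public String onResult(String word) {
            switch (state) {
                case DIGITS:
                    if ("details".equals(word)) {
                        if (digits.length() == 12 && checksumOk(digits.toString())) {
                            state = State.DETAILS;
                            return "number-accepted.wav";   // example file name
                        }
                        digits.setLength(0);                // wrong length or checksum: start over
                        return "number-rejected.wav";       // example file name
                    }
                    digits.append(word);                    // assuming digits arrive normalized to "0".."9"
                    return null;
                case DETAILS:
                    if ("next".equals(word) || "finish".equals(word)) {
                        saveRecord(digits.toString(), details.toString());
                        digits.setLength(0);
                        details.setLength(0);
                        if ("finish".equals(word)) {
                            state = State.DONE;
                            return "goodbye.wav";           // example file name
                        }
                        state = State.DIGITS;
                        return "next-number.wav";           // example file name
                    }
                    details.append(word).append(' ');
                    return null;
                default:
                    return null;
            }
        }

        // Placeholder checksum - the real rule for the twelve digits goes here.
        private boolean checksumOk(String d) {
            int sum = 0;
            for (char c : d.toCharArray()) sum += c - '0';
            return sum % 10 == 0;
        }

        // Placeholder persistence - in the real system this writes to the database.
        private void saveRecord(String number, String detailText) {
            System.out.println(number + " -> " + detailText);
        }
    }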
=================== MINOR QUESTIONS CONNECTED WITH ZANZIBAR AND CAIRO ============================================

1. http://www.speechforge.org/projects/cairo/intro.html -> Running the Demo MRCPv2 Clients -> Available Clients -> "Each client can be started by running the appropriate batch script located in the demo/bin directory of your Cairo installation" -> So, after all, I need to run a batch file to start a demo. Should I run the batch file directly?
2. http://www.speechforge.org/projects/cairo/intro.html -> Running the Cairo-client Demo -> Can I use only bargein and parrot as a basis for my code? Why are only demo-bargein and demo-parrot available in the cairo-client? At first I thought demo-standalone would be a more appropriate starting point for my application, but it looks like it is not available in the cairo-client.
3. http://www.speechforge.org/projects/cairo/dependencies.html -> Project Dependencies -> runtime -> WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz -> Can't I use my own acoustic model?!
4. http://www.speechforge.org/projects/cairo/dependencies.html -> Project Transitive Dependencies -> Do I need to install all of these dependencies myself (e.g. with "sudo apt-get install"), or are they included in Cairo?
5. http://www.speechforge.org/projects/cairo/dependencies.html -> Dependency Listings -> Commons Configuration -> "Tools to assist in the reading of configuration/preferences files in various formats" -> Would I benefit from using it?
6. http://www.speechforge.org/projects/cairo/dependencies.html -> Dependency Listings -> Codec -> "phonetic encoding utilities" -> I have already created my list of phonemes for SphinxTrain. Is this somehow connected with it?
7. http://www.speechforge.org/projects/cairo/dependencies.html -> Project Dependency Graph -> Dependency Listings -> Unnamed - sphinx:WSJ_8gau_13dCep_8kHz_31mel_200Hz_3500Hz:jar:1.0beta -> Why are both TIdigits and WSJ used?
8. http://www.spokentech.org/openivr/intro.html -> Running the Examples (in Demo Mode) -> Prerequisites -> "You will need a sip softphone like Xlite to access the demos" -> How can I use Twinkle instead? Is the configuration the same?
9. http://www.spokentech.org/openivr/intro.html -> Running the Examples (in Demo Mode) -> Prerequisites -> "you will require (preferably high quality) microphone" -> Why does it require a good quality microphone? I thought that in reality the quality wouldn't be good, but rather poor 8 kHz telephone speech, not the 16 kHz which is possible with microphones.
10. http://www.spokentech.org/openivr/intro.html -> Running the Examples (in Demo Mode) -> Prerequisites -> What's the difference between parrot and vxml-parrot? By the way, I'm more or less familiar with JSGF grammars. But wouldn't it be better for my application not to use grammars extensively, and instead put the algorithm in source code and use a very simple grammar which only contains a list of words, e.g. <utterance> = <list_of_words>? (I sketched such a grammar right after question 12.)
11. http://www.spokentech.org/openivr/intro.html -> Running in Asterisk Mode -> Dialplan Integration -> the three lines of code for x-channel and x-application -> They integrate Zanzibar with the Asterisk dialplan. Do I need to add them after all?
12. http://www.spokentech.org/openivr/intro.html -> Running in Asterisk Mode -> Dialplan Integration -> a) type=beanId -> I don't get it; b) type=className -> Does it mean I can use the MyApplication.java source code of Sphinx4 here?; c) type=vxml -> Can I accomplish what I need to do with vxml?
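For question 10, this is the kind of trivial grammar I have in mind: one public rule that just lists every word the caller may say, with all the sequencing handled in the application code. The grammar name is made up, and I put English digit words here only for readability (the real grammar would of course use the Polish words):

    #JSGF V1.0;

    grammar digitsAndKeywords;

    // one flat list of words; no sentence structure at all
    public <utterance> = zero | one | two | three | four | five | six |
                         seven | eight | nine | details | next | finish ;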
13. http://www.spokentech.org/openivr/intro.html -> Running in Asterisk Mode -> Asterisk Manager Interface (AMI) Configuration -> I think I don't need this. Am I right?
14. http://www.spokentech.org/openivr/architecture.html -> I will have an account from a SIP provider with a PSTN number. I see SIP and RTP here. What do I need to do with this RTP?
15. http://www.spokentech.org/openivr/writing-speechlets.html -> Parrot Speech Application -> Why are the three last functions empty?
16. http://www.spokentech.org/openivr/aik.html -> Asterisk Integration Kit -> Dialplan Integration -> What do I need to specify in my dialplan? I guess only the redirection of speech from Asterisk to Zanzibar. (The [asr-flow] context in the sketch near the beginning of this message shows my guess.) I also guess that the main part of my logic would live somewhere other than the dialplan, but I don't know exactly where.

Thanks very much in advance for your help!

Regards!