File                    Date        Author             Commit
bin                     2017-02-08  robert mundkowsky  [55f5e4] first draft
docs                    2017-02-08  robert mundkowsky  [55f5e4] first draft
etc                     2017-02-08  robert mundkowsky  [55f5e4] first draft
lib                     2017-02-08  robert mundkowsky  [55f5e4] first draft
src                     2017-02-08  robert mundkowsky  [55f5e4] first draft
.classpath              2017-02-08  robert mundkowsky  [55f5e4] first draft
.project                2017-02-08  robert mundkowsky  [55f5e4] first draft
LICENSE.txt             2017-02-08  robert mundkowsky  [55f5e4] first draft
NOTICE.txt              2017-02-08  robert mundkowsky  [55f5e4] first draft
README.txt              2017-02-08  robert mundkowsky  [55f5e4] first draft
ReleaseNotes-v0.5.txt   2017-02-08  robert mundkowsky  [55f5e4] first draft
build.xml               2017-02-08  robert mundkowsky  [55f5e4] first draft

Read Me

Getting Started
---------------

Download
--------
Speechcloud CLI requires Java Runtime Environment (JRE) 5.0 or higher, which can be downloaded here:
http://java.sun.com/javase/downloads/
If you have not already done so, set your JAVA_HOME environment variable to point to the
installed location of your JRE/JDK.

For Linux installations, you also need to set the SPEECHCLOUD_HOME environment variable to point to the
location of your speechcloud installation.


Command Line Interface (CLI)
----------------------------
The command line programs are useful for script-based recognition and synthesis. They also serve as sample code and a
sample project for using the speechcloud-client API.


Running the Command Line Programs
----------------------------------

There are 3 programs:

* Recognizer     -Recognizer accepts an audio file and an optional grammar as parameters and returns
                  the recognition results.
* Synthesizer    -Synthesizer accepts a text string and returns an audio stream.  By default it streams
                  the audio to standard out.  It provides options to direct the stream to a file or to the
                  system speaker.
* MicRecognizer  -MicRecognizer gets the audio stream from the microphone and sends it to the recognizer.
                  It returns the recognition results.

Command Usage
-------------

 

Recognizer.bat(.sh) [options] <audioFile> <grammar file url>
Options:
   -help "prints help message"
   -service url "url is the location of the cloud server"
   -endpoint "do endpointing on the server (default=no)"
   -lm "use the language model (not the grammar) (defaults to using the grammar)"
   -batch "use batch processing on the server (for the CMN stage)"



Synthesizer.bat(.sh) [options] <text string to be synthesized>
Options:
   -help "prints help message"
   -service "location of the speechcloud server"
   -voice "voice name on the server"*
   -format "Format (wav, mp3 or au) of audio file (defaults to wav)"
   -sampleRate "sample rate in Hertz (defaults to 8000)"
   -endian "Endian (big or little) (defaults to big)"
   -sampleSize "Sample size in bits (defaults to 16)"
   -encoding "Audio encoding (defaults to PCM)"


MicRecognizer.bat(.sh) [options] <grammar file url>
Options:
   -help "prints help message"
   -service "location of the speechcloud server"
   -mode "Endpointing Mode (s4 or normal) (defaults to Normal)"
   -stream "Stream Mode (feature or audio) (defaults to Audio)"
   -lm "Use the language model (not the grammar) (defaults to using the grammar)"



*Voices installed on the spokentech server at the time this was written:
 hmm-jmk
 jmk-arctic
 hmm-slt
 slt-arctic
 hmm-bdl
 bdl-arctic


The Java API
-------------

To get started with the Java API, download the CLI and use its source code as a set of examples.
You can also use the CLI project as a starting point: it includes all the required jars and an
Ant script.

To decode an audio file, use code like this:

   //set up the parameters for the recognize call

   String grammarUrl = "file:///";   //grammar file URL (the path is left incomplete in the original)
   boolean lmFlag = false;
   boolean doEndpointing = false;
   boolean batch = false;
   String audioFileName = "c:/audio/audofile.wav";

   //set up your http recognizer and point it to the speech cloud server
   String service = "http://www.spokentech.com/speechcloud/SpeechUploadServlet";
   HttpRecognizer recog = new HttpRecognizer();
   recog.setService(service);

   //Do the recognition (will block until result is determined)
   RecognitionResult r = recog.recognize(audioFileName, grammarUrl, lmFlag, doEndpointing, batch);
   System.out.println("Result: "+r.getText());


If you have a stream rather than a file, you can use the recognize method that takes an AudioInputStream rather than
a File.  Depending on the nature of your stream, you may need to take care of endpointing.  Setting the endpointing
flag performs endpointing on the server.

   AudioInputStream stream = getStreamFromSomewhere(...);

   //Do the recognition (will block until result is determined)
   RecognitionResult r = recog.recognize(stream, grammarUrl, lmFlag, doEndpointing, batch);
   System.out.println("Result: "+r.getText());
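
The snippet above leaves getStreamFromSomewhere open.  As one possible source, the sketch below wraps raw PCM bytes
held in memory in an AudioInputStream using only JDK classes.  The class name PcmStreams is hypothetical, and the
8000 Hz, 16-bit, mono, big-endian format is an assumption based on the defaults listed elsewhere in this README;
it is not taken from the speechcloud client.

```java
import java.io.ByteArrayInputStream;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;

// Hypothetical helper, not part of the speechcloud client.
final class PcmStreams {

    // Wraps raw PCM bytes in an AudioInputStream with the given format.
    static AudioInputStream fromPcm(byte[] pcm, AudioFormat format) {
        // The stream length is expressed in frames, not bytes.
        long frames = pcm.length / format.getFrameSize();
        return new AudioInputStream(new ByteArrayInputStream(pcm), format, frames);
    }

    public static void main(String[] args) {
        // Assumed format: 8000 Hz, 16-bit, mono, signed, big-endian.
        AudioFormat format = new AudioFormat(8000f, 16, 1, true, true);
        // One second of silence: 8000 frames of 2 bytes each.
        AudioInputStream stream = fromPcm(new byte[16000], format);
        System.out.println(stream.getFormat() + ", frames=" + stream.getFrameLength());
    }
}
```

A stream built this way has a known length.  For a live source of unknown length, endpointing matters more, as
noted above.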




To decode audio from your microphone using the client-side endpointer, you can use the same
HttpRecognizer object, but call the method that takes an endpointing stream.  In this case, create a
MicS4EndPointingInputStream.  Once set up and initialized, it starts the HTTP request when speech is detected and
streams the audio as an HTTP attachment.  It closes the stream when it detects the end of speech.



   AudioFormat desiredFormat = new AudioFormat(8000, ...);   //remaining arguments are missing from the original
   String mimeType = ...;                                    //value is missing from the original

   //create and set up the endpointing stream
   MicS4EndPointingInputStream epStream = new MicS4EndPointingInputStream(desiredFormat, mimeType);
   epStream.setupStream();

   //set up the http recognizer
   HttpRecognizer recog = new HttpRecognizer();
   recog.setService(service);

   //Start recognition.  Blocks until a result is available
   RecognitionResult r = null;
   try {
       r = recog.recognize(grammarUrl, epStream, lmFlag, timeout);
   } catch (InstantiationException e) {
       e.printStackTrace();
   } catch (IOException e) {
       e.printStackTrace();
   }

   //print the result text
   if (r == null) {
       System.out.println("null result");
   } else {
       System.out.println("result: "+r.getText());
   }



You can do endpointing on the client for streams and files in a similar manner: use
StreamS4EndPointingInputStream or FileS4EndPointingInputStream
in place of the MicS4EndPointingInputStream in the example above.


This code uses the cloud synthesizer to create audio for a text string.


   //Set up the http synthesizer
   String service = "http://www.spokentech.net/speechcloud/SpeechDownloadServlet";
   HttpSynthesizer synth = new HttpSynthesizer();
   synth.setService(service);

   //create the input parameters
   AudioFormat format = new AudioFormat(sampleRate, sampleSizeInBits, channels, signed, bigEndian);
   String text = "this is only a test";
   String voice = "jmk-arctic";
   String outFileName = "this-is.wav";

   //synthesize!  (wav is the output format argument; it is not defined in this excerpt)
   OutputStream stream = synth.synthesize(text, format, wav, voice);
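
The AudioFormat arguments in the snippet above (sampleRate, sampleSizeInBits, channels, signed, bigEndian) are
not given values.  A minimal sketch of one plausible choice follows, using the defaults documented for the
Synthesizer command (8000 Hz, 16 bits, big endian, PCM).  The class name SynthesisFormats is hypothetical, and
the single channel and the signed samples are assumptions.

```java
import javax.sound.sampled.AudioFormat;

// Hypothetical helper, not part of the speechcloud client.
final class SynthesisFormats {

    // Builds a linear PCM format from the documented command line defaults.
    static AudioFormat defaultFormat() {
        float sampleRate = 8000f;     // -sampleRate default
        int sampleSizeInBits = 16;    // -sampleSize default
        int channels = 1;             // assumption: mono
        boolean signed = true;        // assumption: signed PCM
        boolean bigEndian = true;     // -endian default
        return new AudioFormat(sampleRate, sampleSizeInBits, channels, signed, bigEndian);
    }

    public static void main(String[] args) {
        AudioFormat format = defaultFormat();
        System.out.println(format.getEncoding() + " " + format.getSampleRate() + " Hz, "
                + format.getSampleSizeInBits() + " bit, bigEndian=" + format.isBigEndian());
    }
}
```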


The HTTP Interface
------------------

There is both a basic HTTP API and a SOAP Web Services API.  The basic HTTP API uses a
POST request with attachments.  Parameters are passed as form fields.

For the recognizer, the audio and grammars are passed as attachments.  

Audio is returned by the synthesizer in the HTTP response.

HTTP Recognizer Request Fields and Attachments
----------------------------------------------

                HTTP Request Form Fields
                ------------------------
Name              Value                   Default
--------------    --------------------    ----------
lmFlag            true|false              false
continuousFlag    true|false              false
dataMode          Audio|Feature           Audio
doEndpointing     true|false              false
sampleRate        Integer (Hz)            8000
big-Endian        true|false              true
bytesPerValue     Integer                 2
Encoding          AudioFormat.Encoding    PCM_SIGNED


                HTTP Request File Fields
                ------------------------
Name       Value                       Mime Type           Default
-------    ------------------------    ----------------    --------------------------
Audio      Audio file to be decoded    audio/x-wav         None, required
                                       audio/s4-audio
                                       audio/s4-feature
Grammar    JSGF grammar                Plain/text          Required (in grammar mode)
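
To make the request layout concrete, here is a minimal sketch that assembles a multipart/form-data body carrying
form fields plus the Audio and Grammar attachments, using only JDK classes.  The class name RecognizerRequestBody,
the boundary string and the file names are hypothetical.  That the servlet expects exactly this multipart layout
and these part names is an assumption drawn from the tables above, and text/plain is assumed for the grammar
(the table lists it as Plain/text).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the speechcloud client.
final class RecognizerRequestBody {

    // Hypothetical boundary; any string that does not occur in the payload works.
    static final String BOUNDARY = "speechcloud-example-boundary";

    // Builds the body: one part per form field, then the audio and grammar attachments.
    static byte[] build(Map<String, String> fields, byte[] audio, String grammar) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            write(out, "--" + BOUNDARY + "\r\n"
                    + "Content-Disposition: form-data; name=\"" + field.getKey() + "\"\r\n\r\n"
                    + field.getValue() + "\r\n");
        }
        writeFile(out, "Audio", "audio.wav", "audio/x-wav", audio);
        writeFile(out, "Grammar", "example.gram", "text/plain",
                grammar.getBytes(StandardCharsets.UTF_8));
        write(out, "--" + BOUNDARY + "--\r\n");   // closing boundary
        return out.toByteArray();
    }

    private static void writeFile(ByteArrayOutputStream out, String name, String fileName,
                                  String mimeType, byte[] content) {
        write(out, "--" + BOUNDARY + "\r\n"
                + "Content-Disposition: form-data; name=\"" + name + "\"; filename=\""
                + fileName + "\"\r\n"
                + "Content-Type: " + mimeType + "\r\n\r\n");
        out.write(content, 0, content.length);
        write(out, "\r\n");
    }

    private static void write(ByteArrayOutputStream out, String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        out.write(bytes, 0, bytes.length);
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("lmFlag", "false");
        fields.put("doEndpointing", "false");
        fields.put("sampleRate", "8000");
        byte[] body = build(fields, new byte[0], "#JSGF V1.0;\ngrammar example;\n");
        System.out.println("Content-Type: multipart/form-data; boundary=" + BOUNDARY);
        System.out.println(body.length + " bytes");
    }
}
```

The resulting bytes would be sent as the body of a POST to the recognizer servlet, with the Content-Type header
printed by main.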


HTTP Recognizer Response
------------------------
Mimetype: plain/text
Body: the raw result followed by semantic tags.
Example:   this is the raw result <TAG:VALUE>



HTTP Synthesizer Request Parameters
-----------------------------------

                HTTP Request Form Fields
                ------------------------
Name             Value                           Default
-------------    ----------------------------    -----------
Text             Plain text to be synthesized    Required
voice            Voice name on server
mime             audio/x-wav                     audio/x-wav
                 audio/mpeg
                 audio/x-au
sampleRate       Integer (Hz)                    8000
big-Endian       true|false                      true
bytesPerValue    Integer                         2
Encoding         AudioFormat.Encoding            PCM_SIGNED
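
As an illustration of the fields above, the following sketch encodes a set of synthesizer form fields as an
application/x-www-form-urlencoded string using JDK classes (Java 10 or later for this URLEncoder overload).
The class name SynthesizerForm is hypothetical.  Whether the servlet accepts URL-encoded fields rather than
multipart fields is an assumption; the README does not say.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the speechcloud client.
final class SynthesizerForm {

    // Fills in the documented defaults around the required text and the voice name.
    static Map<String, String> fields(String text, String voice) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("Text", text);
        fields.put("voice", voice);
        fields.put("mime", "audio/x-wav");
        fields.put("sampleRate", "8000");
        fields.put("big-Endian", "true");
        fields.put("bytesPerValue", "2");
        fields.put("Encoding", "PCM_SIGNED");
        return fields;
    }

    // Joins the fields as name=value pairs separated by '&', URL-encoding both sides.
    static String encode(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode(fields("this is a test", "jmk-arctic")));
    }
}
```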


HTTP Synthesizer Response
-------------------------
Body:      Audio file
Mimetype:  audio/x-wav, audio/mpeg, or audio/x-au


SOAP API
--------
Uses JAX-WS.  Not complete.



JSGF Grammars and Extracting Semantic Meaning
----------------------------------------------
You can extract semantic meaning from the results by setting up your grammars with tags.  The example grammar
has a single public rule <main> whose tag takes one of 3 possible values {WEATHER, SPORTS, STOCKS}.  Given the
utterance "Look up sports", the result would contain the following string.

look up sports<main:SPORTS>

If you are using the Java client API, the results are returned in a RecognitionResult object, which has
convenience methods to get a list of RuleMatches.  Each RuleMatch has a getRule and a getTag method,
corresponding to <RULE:TAG>.  In the example above, the list would have length 1 and its RuleMatch element
would have rule=main and tag=SPORTS.
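
If you call the HTTP interface directly, you receive the plain result string and have to split it yourself.
A minimal sketch of that split follows.  The class name ResultParser is hypothetical and the code is not the
client library's RuleMatch implementation; it assumes tags always appear as <RULE:TAG> and that neither part
contains angle brackets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the speechcloud client.
final class ResultParser {

    // Matches one <RULE:TAG> pair; group 1 is the rule, group 2 is the tag.
    private static final Pattern TAG = Pattern.compile("<([^:<>]+):([^<>]+)>");

    // Returns the recognized words with all <RULE:TAG> pairs removed.
    static String rawText(String result) {
        return TAG.matcher(result).replaceAll("").trim();
    }

    // Returns each pair as a two-element array {rule, tag}, in order of appearance.
    static List<String[]> tags(String result) {
        List<String[]> matches = new ArrayList<>();
        Matcher m = TAG.matcher(result);
        while (m.find()) {
            matches.add(new String[] { m.group(1), m.group(2) });
        }
        return matches;
    }

    public static void main(String[] args) {
        String result = "look up sports<main:SPORTS>";
        System.out.println("text: " + rawText(result));
        for (String[] match : tags(result)) {
            System.out.println("rule=" + match[0] + " tag=" + match[1]);
        }
    }
}
```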


Grammar Example
---------------

#JSGF V1.0;

/**
 * JSGF Grammar for demo examples
 */
grammar example;
public <main> = [ <pre> ] ( <weather> {WEATHER} | <sports>  {SPORTS} | <stocks> {STOCKS} ) ;
<pre> = ( I would like [ to hear ] ) | ( hear ) | ( [ please ] get [ me ] ) | ( look up );
<weather> = [ the ] weather;
<sports> = sports [ news ];
<stocks> = ( [ a ] stock ( quote | quotes ) ) | stocks;