Read Me
Getting Started
---------------
Download
--------
Speechcloud CLI requires the Java Runtime Environment (JRE) 5.0 or higher, which can be downloaded here:
http://java.sun.com/javase/downloads/
If you have not already, you will need to set your JAVA_HOME environment variable to point to the
installed location of your JRE/JDK.
For Linux installations, you also need to set the SPEECHCLOUD_HOME environment variable to point to the
location of your speechcloud installation.
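For example, on Linux with bash (the install paths here are hypothetical; use your own locations):

export JAVA_HOME=/usr/lib/jvm/java
export SPEECHCLOUD_HOME=/opt/speechcloud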
Command Line Interface (CLI)
----------------------------
The command line programs are useful for script-based recognition and synthesis. They also provide sample code and a
sample project for using the speechcloud-client API.
Running the Command Line Programs
----------------------------------
There are three programs:
* Recognizer - accepts an audio file as a parameter and an optional grammar, and returns
the recognition results.
* Synthesizer - accepts a text string and returns an audio stream. By default it streams
the audio to standard out. It provides options to direct the stream to a file or to the system speaker.
* MicRecognizer - gets the audio stream from the microphone and sends it to the recognizer.
It returns the recognition results.
Command Usage
-------------
Recognizer.bat(.sh) [options] <audioFile> <grammar file url>
Options:
-help "prints help message"
-service url "url is location of cloud server"
-endpoint Do endpoint on the server (default=no)"
-lm "Use the language model (not the grammar) (defaults to using the grammar)"
-batch "use batch processing on the server (for CMN stage))"
Synthesizer.bat(.sh) [options] <text string to be synthesized>
Options:
-help "prints help message"
-service "location of speechcloud server
-voice "voice name on the server"*
-format "Format (wav, mp3 or au) of audio file (defaults to wav)"
-sampleRate "sample rate in Hertz (defaults to 8000)"
-endian "Endian (big or little) (defaults to big)"
-sampleSize "Sample size in bits (defaults to 16)"
-encoding "Audio encoding (defaults to PCM)"
MicRecognizer.bat(.sh) [options] <grammar file url>
Options:
-help "prints help message"
-service "location of the speechcloud server
-mode "Endpointing Mode (s4 or normal) (defaults to Normal)"
-stream "Stream Mode (feature or audio) (defaults to Audio)")
-lm "Use the language model (not the grammar) (defaults to using the grammar)"
*Voices installed on the spokentech server at the time this was written:
hmm-jmk
jmk-arctic
hmm-slt
slt-arctic
hmm-bdl
bdl-arctic
The Java API
-------------
To get started with the Java API, it is recommended that you download the CLI and take a look at
its source code as examples. You can also use the CLI project as a starting point; it has all
the required jars and an ant build script.
To decode an audio file, simply use this code:
//setup the parameters to the recognize call
String grammarUrl = "file:///";   //point this at your grammar file
boolean lmflg = false;
boolean doEndpointing = false;
boolean batch = false;
String audioFileName = "c:/audio/audiofile.wav";
//set up your http recognizer and point it to the speech cloud server
String service = "http://www.spokentech.com/speechcloud/SpeechUploadServlet";
HttpRecognizer recog = new HttpRecognizer();
recog.setService(service);
//do the recognition (will block until the result is determined)
RecognitionResult r = recog.recognize(audioFileName, grammarUrl, lmflg, doEndpointing, batch);
System.out.println("Result: " + r.getText());
If you have a stream rather than a file, you can use the recognize method that takes an AudioInputStream rather
than a File. Note that you may need to take care of endpointing depending on the nature of your stream; the
endpointing flag will do endpointing on the server.
AudioInputStream stream = getStreamFromSomewhere(...);
//Do the recognition (will block until result is determined)
RecognitionResult r = recog.recognize(stream, grammarUrl, lmflg, doEndpointing, batch);
System.out.println("Result: "+r.getText());
To decode audio from your microphone and use the client-side endpointer, you can use the same
HttpRecognizer object, but use the method that takes an endpointing stream. In this case create a MicS4EndPointingInputStream.
Once set up and initialized, it will start the http request when speech is detected and stream the audio as
an http attachment. It will close the stream when it detects the end of speech.
//8000 Hz, 16-bit, mono, signed, big-endian; the values after the sample rate are
//assumed from the defaults listed above
AudioFormat desiredFormat = new AudioFormat(8000, 16, 1, true, true);
String mimeType = "audio/x-wav";   //assumed; use the mime type your server expects
//create and set up the endpointing stream
MicS4EndPointingInputStream epStream = new MicS4EndPointingInputStream(desiredFormat, mimeType);
epStream.setupStream();
//setup the http recognizer
HttpRecognizer recog = new HttpRecognizer();
recog.setService(service);
//Start recognition. Blocks until a result
RecognitionResult r = null;
long timeout = 10000;   //how long to wait for a result (the value and units here are assumptions)
try {
    r = recog.recognize(grammarUrl, epStream, lmflg, timeout);
} catch (InstantiationException e) {
    //the endpointing stream could not be created
    e.printStackTrace();
} catch (IOException e) {
    //the audio stream or the http connection failed
    e.printStackTrace();
}
//print the results text
if (r == null) {
System.out.println("null result");
} else {
System.out.println("result: "+r.getText());
}
You can do endpointing on the client for streams and files in a similar manner:
use StreamS4EndPointingInputStream or FileS4EndPointingInputStream
in place of the MicS4EndPointingInputStream in the example above.
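For instance, a rough sketch for decoding a file with client-side endpointing (the constructor arguments
and the setFile call are hypothetical, assumed to mirror the microphone version; check the CLI source for
the real signatures):

//assumed to take the same format and mime-type arguments as the mic version
FileS4EndPointingInputStream epStream = new FileS4EndPointingInputStream(desiredFormat, mimeType);
epStream.setFile(new File("c:/audio/audiofile.wav"));   //hypothetical setter for the source file
epStream.setupStream();
//recognize exactly as in the microphone example above
RecognitionResult r = recog.recognize(grammarUrl, epStream, lmflg, timeout);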
This code uses the cloud synthesizer to create audio for a text string:
//setup the http synthesizer
String service = "http://www.spokentech.net/speechcloud/SpeechDownloadServlet";
HttpSynthesizer synth = new HttpSynthesizer();
synth.setService(service);
//create the input parameters (8000 Hz, 16-bit, mono, signed, big-endian, per the defaults above)
AudioFormat format = new AudioFormat(8000, 16, 1, true, true);
String text = "this is only a test";
String voice = "jmk-arctic";
String fileFormat = "wav";            //wav, mp3 or au (assumed to match the CLI -format values)
String outFileName = "this-is.wav";   //a file you could copy the returned audio to
//synthesize!
OutputStream stream = synth.synthesize(text, format, fileFormat, voice);
The HTTP Interface
------------------
There are both a basic HTTP API and a SOAP web services API. The basic HTTP API does a
POST with attachments; parameters are passed as form fields.
For the recognizer, the audio and grammars are passed as attachments.
Audio is returned by the synthesizer in the HTTP response.
HTTP Recognizer Request Fields and attachments.
HTTP Request Form Fields
------------------------
Name             Value                  Default
--------------   --------------------   ----------
lmFlag           true|false             false
continuousFlag   true|false             false
dataMode         Audio|Feature          Audio
doEndpointing    true|false             false
sampleRate       Integer (Hz)           8000
big-Endian       true|false             true
bytesPerValue    Integer                2
Encoding         AudioFormat.Encoding   PCM_SIGNED
HTTP request File Fields
------------------------
Name      Value           Mime Type          Default
-------   -------------   ----------------   --------------
Audio     Audio file to   audio/x-wav        None, required
          be decoded      audio/s4-audio
                          audio/s4-feature
Grammar   JSGF grammar    text/plain         Required (in
                                             grammar mode)
HTTP Recognizer Response
------------------------
Mimetype: text/plain
Raw result and semantic tags.
Example: this is the raw result <TAG:VALUE>
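To illustrate the raw interface, here is a minimal sketch of a recognizer request built with Apache
HttpClient's multipart support (this is illustration, not speechcloud code; the file paths are
hypothetical, and the field and attachment names follow the tables above):

import java.io.File;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.ContentType;
import org.apache.http.entity.mime.MultipartEntityBuilder;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class RawRecognizerRequest {
    public static void main(String[] args) throws Exception {
        CloseableHttpClient client = HttpClients.createDefault();
        HttpPost post = new HttpPost("http://www.spokentech.com/speechcloud/SpeechUploadServlet");
        post.setEntity(MultipartEntityBuilder.create()
                //form fields from the table above, shown with their defaults
                .addTextBody("lmFlag", "false")
                .addTextBody("doEndpointing", "false")
                .addTextBody("sampleRate", "8000")
                //the audio and the grammar travel as attachments
                .addBinaryBody("Audio", new File("c:/audio/audiofile.wav"),
                        ContentType.create("audio/x-wav"), "audiofile.wav")
                .addBinaryBody("Grammar", new File("c:/grammars/example.gram"),
                        ContentType.create("text/plain"), "example.gram")
                .build());
        CloseableHttpResponse response = client.execute(post);
        //the response body is plain text: the raw result plus any semantic tags
        System.out.println(EntityUtils.toString(response.getEntity()));
    }
}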
HTTP Synthesizer Request Parameters
-----------------------------------
HTTP Request Form Fields
-------------------------
Name            Value                  Default
-------------   --------------------   -----------
Text            Plain text to be       Required
                synthesized
voice           Voice name on the
                server
mime            audio/x-wav            audio/x-wav
                audio/mpeg
                audio/x-au
sampleRate      Integer (Hz)           8000
big-Endian      true|false             true
bytesPerValue   Integer                2
Encoding        AudioFormat.Encoding   PCM_SIGNED
HTTP Synthesizer Response
-------------------------
Body: Audio File
Mimetype: audio/x-wav,
audio/mpeg,
audio/x-au
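As a sketch of the raw synthesizer call (assuming the servlet also accepts a standard URL-encoded
form POST; the field names follow the table above and the output file name is hypothetical):

import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

public class RawSynthesizerRequest {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://www.spokentech.net/speechcloud/SpeechDownloadServlet");
        String body = "Text=" + URLEncoder.encode("this is only a test", "UTF-8")
                + "&voice=" + URLEncoder.encode("jmk-arctic", "UTF-8")
                + "&mime=" + URLEncoder.encode("audio/x-wav", "UTF-8");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        conn.getOutputStream().write(body.getBytes("UTF-8"));
        //the response body is the synthesized audio; copy it to a file
        InputStream in = conn.getInputStream();
        OutputStream out = new FileOutputStream("test.wav");
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) > 0) {
            out.write(buf, 0, n);
        }
        out.close();
        in.close();
    }
}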
SOAP API
--------
The SOAP API uses JAX-WS and is not yet complete.
JSGF Grammars and Extracting Semantic Meaning
----------------------------------------------
You can extract semantic meaning from the results by setting up your grammars with tags. The example grammar
below has a single public rule, <main>, whose tag takes one of three values {WEATHER, SPORTS, STOCKS}. Given
the utterance "Look up sports", the result would contain the following string.
look up sports<main:SPORTS>
If you are using the Java client API, the results are returned in the RecognitionResult object, which has
convenient methods to get a list of RuleMatches, where a RuleMatch has a getRule and a getTag method <RULE:TAG>.
So in the example above the list would be of length 1, and the RuleMatch element would have rule=main and tag=SPORTS.
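For instance, continuing the recognizer example above (the getRuleMatches accessor name is an
assumption; check RecognitionResult for the exact method):

RecognitionResult r = recog.recognize(audioFileName, grammarUrl, lmflg, doEndpointing, batch);
//each RuleMatch pairs a rule name with the tag value it matched, e.g. main and SPORTS
for (RuleMatch m : r.getRuleMatches()) {   //assumed accessor
    System.out.println(m.getRule() + " = " + m.getTag());
}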
Grammar Example
---------------
#JSGF V1.0;
/**
* JSGF Grammar for demo examples
*/
grammar example;
public <main> = [ <pre> ] ( <weather> {WEATHER} | <sports> {SPORTS} | <stocks> {STOCKS} ) ;
<pre> = ( I would like [ to hear ] ) | ( hear ) | ( [ please ] get [ me ] ) | ( look up );
<weather> = [ the ] weather;
<sports> = sports [ news ];
<stocks> = ( [ a ] stock ( quote | quotes ) ) | stocks;