This section will make you familiar with InproTK's speech input component. First, the interface for receiving (or simulating) incremental spoken input will be explained, before we dive into the details of incremental input evaluation and the code that optimizes iSR behaviour.
You will have to use some of the programs that come with InproTK (either from the command line or from Eclipse), and you should look at and try to understand some of the programming code.
InproTK's entry point to iSR is inpro.apps.SimpleReco, a command-line application for starting up InproTK using a default or custom configuration. SimpleReco bundles various setup tasks and variations for speech recognition, such as the audio source, language modelling, and output selection.
First, start up SimpleReco without any options; this will give you some usage information. To do this, you can either:

- open inpro.apps.SimpleReco in Eclipse and select Run as->Java Application, or
- change into the bin/ directory of your InproTK installation and run java inpro.apps.SimpleReco. You may have to fiddle with your Java classpath to make this work.

Now, let's give SimpleReco something to recognize. There's an audio file in res/DE_1234.wav which contains some spoken digits (you can guess which digits, but you should rather listen to the file). Can you find out what the command should look like? (Hint: a filesystem URL is not just a path specification, but starts with file:, see Wikipedia.)
You certainly noticed that there was no incremental output, just a final recognition result. This is because no IU module listened for the incremental output.
Add the switch -L for LabelWriter output, and/or -C for the current hypothesis viewer (what does it do?).

You may have noticed that results are not always correct (with DE_1234.wav, LabelWriter first recognizes ja, ei, ein, before settling on eins). There are built-in techniques for optimizing incremental results. Optimization trades the timeliness of results (that is, when a word is first recognized) against the overall quality of results (how often words that later turn out to be wrong are passed on to a listening module). The best method for this is smoothing, which takes as a parameter a smoothing factor that roughly equates to how long a delay is introduced for new results. (Google has a marginally better method, see McGraw & Gruenstein, Interspeech 2012.)
Switch on smoothing with factor 7 by adding the switch -Is 7. What smoothing factor do you need to get rid of any mis-recognitions in the beginning of DE_1234.wav?
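If you want to check yourself, the full call might look roughly like the following. The switch for reading audio from a file is an assumption here (-F); the actual option name is listed in SimpleReco's usage output:

```
java inpro.apps.SimpleReco -F file:res/DE_1234.wav -L -Is 7
```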
For testing, it is often convenient to have a textual interface instead of true ASR, especially because speech recognition errors make repeatable testing very difficult.
inpro.apps.SimpleText can be used to supply text as if it were coming from incremental speech recognition, either interactively (default) or with text from a file or the command line. Do play around with it a little bit. What happens if you use the -L switch? In particular: do you have an idea what happens behind the scenes? You may also want to try out the non-interactive modes.
When a new result is generated, InproTK sends the partial hypothesis as a sequence of IUs (compare the lecture last Friday, see also inpro.incremental.unit.IU). InproTK sends both the full list as well as a list of edits since the last update (see inpro.incremental.unit.EditMessage and EditType); we call this a dual representation of incremental hypotheses. Depending on the task, the receiving module can use either representation, whichever is more suitable.
To simplify producing both representations, each IUModule (see inpro.incremental.IUModule) contains a RightBuffer object (see the encapsulated definition in IUModule) which can be provided with either representation and automatically constructs the missing one. IUModules do not contain left buffers (see the important concepts below). Instead, input is passed on in the call to leftBufferUpdate(List<IU>, List<EditMessage<IU>>).
Thus, the workflow of a typical IU module is to:

- receive the input IUs and/or edits in the call to leftBufferUpdate(),
- work out what has changed since the previous call,
- perform its module-specific processing on the changed input, and
- place the resulting output (as IUs or as edits) into its right buffer, which notifies the listening modules.
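A skeletal module following these steps might look roughly as follows. This is a sketch only: the exact generics of leftBufferUpdate and the name of the right-buffer setter (setBuffer) are assumptions, so compare with inpro.incremental.IUModule for the real signatures.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

import inpro.incremental.IUModule;
import inpro.incremental.unit.EditMessage;
import inpro.incremental.unit.IU;

/** Sketch of a typical IU module: consume IUs, produce new output IUs. */
public class MyProcessor extends IUModule {

	@Override
	protected void leftBufferUpdate(Collection<? extends IU> ius,
			List<? extends EditMessage<? extends IU>> edits) {
		// steps 1 and 2: inspect the input and the edits since the last update
		List<IU> output = new ArrayList<IU>();
		for (IU iu : ius) {
			// step 3: module-specific processing would go here;
			// this sketch simply passes the input through unchanged
			output.add(iu);
		}
		// step 4: hand the new output to the right buffer, which constructs
		// the missing (edit-based) representation and notifies listeners
		rightBuffer.setBuffer(output);
	}
}
```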
Can you identify these four steps in inpro.incremental.processor.Tagger? What exactly is the processing step of the Tagger? (Bonus: improve the tagger by integrating the results from Beuck et al.'s NODALIDA 2011 paper.)
There is also a simpler form of IU module, one which does not contain a right buffer. Such modules are defined by inpro.incremental.PushBuffer, which IUModule builds on. Most modules that do not need to produce any output (we call them sinks) are simply PushBuffers.
Take a close look at inpro.incremental.sink.LabelWriter (this is the module that is triggered by SimpleReco and SimpleText with the -L switch).
Can you write your own class -- similarly to LabelWriter -- which reacts to a certain combination of IUs (e.g. to eins eins zwei) and performs some action (notify the fire department -- please only simulate this step)?
Alternatively, write an IU sink that gives more detailed output about the data that it receives: at every call, it should print all the IUs and all the edits that it has received. You could also follow the IUs' grounded-in links and print any IUs that you find there (do this recursively, or use IU's deepToString() method).
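A sketch for the first of these exercises might look like the following. The method signatures are assumptions based on the description above (PushBuffer's update call is assumed to be named hypChange, in line with the hypChangeListeners configuration mentioned below, and IUs are assumed to expose their text via toPayLoad()); compare with LabelWriter for the real interface.

```java
import java.util.Collection;
import java.util.List;

import inpro.incremental.PushBuffer;
import inpro.incremental.unit.EditMessage;
import inpro.incremental.unit.IU;

/**
 * Sketch of an IU sink (modelled on LabelWriter) that watches for the
 * word sequence "eins eins zwei" and then simulates notifying the fire
 * department. Method and accessor names are assumptions.
 */
public class FireAlarmSink implements PushBuffer {

	@Override
	public void hypChange(Collection<? extends IU> ius,
			List<? extends EditMessage<? extends IU>> edits) {
		// build the currently hypothesized word sequence
		StringBuilder sb = new StringBuilder();
		for (IU iu : ius) {
			sb.append(iu.toPayLoad()).append(" ");
		}
		// react once the trigger phrase appears in the hypothesis
		if (sb.toString().contains("eins eins zwei")) {
			System.err.println("ALARM: would notify the fire department now (simulation only)");
		}
	}

	@Override
	public void reset() {
		// nothing to reset in this stateless sink
	}
}
```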
To switch recognition from German to English, change the configuration referenced in inpro/apps/config.xml from ./sphinx-de.xml to ../../demo/inpro/apps/sphinx-en.xml. You could also use the -c configuration option of InproTK to load the configuration in demo/inpro/apps/config-en.xml, which already includes sphinx-en.xml. In addition, you need to define a good default language model, or a grammar to recognize from. Can you set up Sphinx and InproTK for other languages, like French?

You may also want to look at inpro.incremental.source.IUDocument, which is the source of IUs when working with SimpleText.

In the theoretical model on which InproTK is built, information is kept as small pieces of information: incremental units. As their name indicates, these are the units that are produced and/or consumed by a processing module; their size depends on the type of information they contain: phonemes are smaller than words, ideas are larger.
Incremental units are produced and consumed by incremental modules. Incremental modules are connected to form an acyclic graph (most often just a pipeline). Connections can be set up from code, or are specified in iu-config.xml, where the successors of a module are listed as hypChangeListeners.
In David's abstract general model (AgMo) of incremental processing, IU modules have both a left buffer and a right buffer. In InproTK, they have only a right buffer, and the information that would be on the left buffer in AgMo is passed on in the call to leftBufferUpdate(List<IU>, List<EditMessage<IU>>).
There are three places for the different types of incremental modules in InproTK:

- inpro.incremental.processor contains full-fledged incremental modules, such as the RMRS parsing and semantics modules, which have IUs as input and IUs of a different type as output,
- inpro.incremental.source contains modules that produce IUs out of thin air (well, vibrations of thin air go into the iSR module, which turns them into word IUs),
- inpro.incremental.sink contains modules that consume IUs but do not generate new ones. These are primarily debugging and logging modules.

Incremental units are connected with each other via two types of links:

- same-level links, which connect an IU to its predecessor of the same kind (e.g. a word to the preceding word), and
- grounded-in links, which connect an IU to the IUs it is based on (e.g. a word to the phonemes it spans).
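For example, a debugging sink could follow grounded-in links downward roughly like this. This is a sketch; groundedIn() is the accessor name assumed here, and IU's deepToString() method does something very similar for you:

```java
import inpro.incremental.unit.IU;

/** Sketch: recursively print an IU and everything it is grounded in. */
public class GroundedInPrinter {

	public static void printDeep(IU iu, String indent) {
		System.out.println(indent + iu);
		if (iu.groundedIn() != null) {
			for (IU grounding : iu.groundedIn()) {
				// follow the grounded-in links one level down
				printDeep(grounding, indent + "  ");
			}
		}
	}
}
```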
Quite importantly, InproTK (at the current stage) only partially automates the setting/unsetting/changing of links. In most cases, you (or the module that you implement) should check very carefully whether all links that you set up actually point where they should.
Examining the IU network is not always easy, although inpro.incremental.sinks like IUNetworkToDOT and IUNetworkJGraphX exist (but do not currently work as expected -- your help is greatly appreciated).
InproTK's incremental speech recognition is based on Sphinx-4, which is a (pretty much) state-of-the-art speech recognizer written in Java.