Kaldi / Discussion / Developers: Streaming online decoding

Streaming online decoding

Forum: Developers

Creator: Yoav Lasovsky

Created: 2015-03-09

Updated: 2015-03-19

Yoav Lasovsky - 2015-03-09

Hi Dan,

The sample script you have for online DNN decoding handles only single-utterance decoding; it does not allow a long, continuous audio stream to be decoded.
I saw that the online decoder's API supports long streaming audio by allowing calls to finalize decoding and re-initialize decoding between utterances that are part of a continuous stream.

My question is: did you not implement such a sample because of a lack of time, or because the API is not tested?
Is there something I should know before implementing it myself?

Thanks!

Yoav Lasovsky

Last edit: Yoav Lasovsky 2015-03-09

- Daniel Povey - 2015-03-09
  
  > Hi Dan,
  >
  > The sample script you have for online DNN decoding handles only
  > single-utterance decoding; it does not allow a long, continuous audio
  > stream to be decoded.
  > I saw that the online decoder's API supports long streaming audio by
  > allowing calls to finalize decoding and re-initialize decoding between
  > utterances that are part of a continuous stream.
  
  This was intended for multiple utterances of the same speaker, but would
  work for the scenario you mention also.
  
  > My question is: did you not implement such a sample because of a lack of
  > time, or because the API is not tested?
  > Is there something I should know before implementing it myself?
  
  The need never really came up. If you're talking about decoders like
  online2-wav-nnet2-latgen-threaded or -faster, what you say could certainly
  be done. For instance, you could use the existing endpointing code that's
  there; and when an endpoint is detected, you could output a lattice and
  then re-start decoding the same wav file from the point where you were.
  Incidentally, I have some changes to that (-threaded) decoder that I intend
  to commit soon, mostly in the internal code (not the main()), but they
  won't affect what you are doing. It's to enable down-weighting of silence
  in the iVector extraction (we found this was important in highly mismatched
  conditions), and it changes the number of threads from 3 to 2 for
  simplicity.
  
  Dan
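
The approach Dan describes (decode until an endpoint is detected, output the result, then restart decoding from the current position in the stream) can be sketched roughly as follows. This is only an illustration in Python, not Kaldi code: `ToyDecoder`, its method names, and the amplitude-based endpoint rule are hypothetical stand-ins for a single-utterance online decoder and its endpointing logic, and the "result" is a chunk count rather than a lattice.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


@dataclass
class ToyDecoder:
    """Hypothetical stand-in for a single-utterance online decoder."""
    silence_threshold: float = 0.1   # assumed: peak amplitude below this is silence
    max_silent_chunks: int = 2       # assumed: trailing silence that ends an utterance
    speech_chunks: int = 0
    trailing_silence: int = 0

    def accept_chunk(self, chunk: List[float]) -> None:
        """Consume one chunk of audio samples and update the toy state."""
        peak = max((abs(s) for s in chunk), default=0.0)
        if peak >= self.silence_threshold:
            self.speech_chunks += 1
            self.trailing_silence = 0
        elif self.speech_chunks > 0:
            # Silence only counts once some speech has been seen.
            self.trailing_silence += 1

    def endpoint_detected(self) -> bool:
        return (self.speech_chunks > 0
                and self.trailing_silence >= self.max_silent_chunks)

    def finalize(self) -> int:
        """Return the utterance result (here: number of speech chunks)."""
        return self.speech_chunks


def decode_stream(chunks: Iterable[List[float]],
                  make_decoder: Callable[[], ToyDecoder] = ToyDecoder
                  ) -> Iterator[int]:
    """Yield one result per utterance found in a continuous chunk stream."""
    decoder = make_decoder()
    for chunk in chunks:
        decoder.accept_chunk(chunk)
        if decoder.endpoint_detected():
            yield decoder.finalize()
            # Start a fresh decoder; the loop keeps our place in the stream.
            decoder = make_decoder()
    # Flush a last utterance that was cut off by the end of the stream.
    if decoder.speech_chunks > 0:
        yield decoder.finalize()


speech, silence = [0.5, -0.4], [0.0, 0.01]
stream = [speech, speech, silence, silence, silence,
          speech, silence, silence, speech]
results = list(decode_stream(stream))
print(results)  # [2, 1, 1]
```

The loop never rewinds the input: after an endpoint, a new decoder simply picks up at the next chunk, which corresponds to restarting decoding "from the point where you were". Any speaker-level state that should carry over between utterances is left out of this sketch.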
  
- Tanel Alumäe - 2015-03-19
  
  Yoav,
  
  Take a look at https://github.com/alumae/kaldi-gstreamer-server. It can decode a continuous stream using online DNN models, outputs partial and final hypotheses via a web-based API, and does endpointing.
  