Streaming online decoding

Developers
2015-03-09
2015-03-19
  • Yoav Lasovsky

    Yoav Lasovsky - 2015-03-09

    Hi Dan,

    The sample script you have for online DNN decoding handles single-utterance decoding only; it does not allow a long continuous audio stream to be decoded.
    I saw that the API of the online decoder supports long streaming audio by allowing calls to finalize decoding and init decoding between utterances that are part of a continuous stream.

    My question is: did you not implement such a sample because of a lack of time, or because the API is not tested?
    Is there something I should know before implementing it myself?

    Thanks!

    Yoav Lasovsky

     

    Last edit: Yoav Lasovsky 2015-03-09
    • Daniel Povey

      Daniel Povey - 2015-03-09

      Hi Dan,

      The sample script you have for online DNN decoding handles single-utterance
      decoding only; it does not allow a long continuous audio stream to be decoded.
      I saw that the API of the online decoder supports long streaming audio by
      allowing calls to finalize decoding and init decoding between utterances
      that are part of a continuous stream.

      This was intended for multiple utterances of the same speaker, but would
      work for the scenario you mention also.

      My question is: did you not implement such a sample because of a lack of
      time, or because the API is not tested?
      Is there something I should know before implementing it myself?

      The need never really came up. If you're talking about decoders like
      online2-wav-nnet2-latgen-threaded or online2-wav-nnet2-latgen-faster, what
      you say could certainly be done. For instance, you could use the existing
      endpointing code that's there; when an endpoint is detected, you could
      output a lattice and then restart decoding the same wav file from the
      point where you were.
      Incidentally, I have some changes to that (-threaded) decoder that I intend
      to commit soon, mostly in the internal code (not the main()), but they
      won't affect what you are doing. It's to enable down-weighting of silence
      in the iVector extraction (we found this was important in highly mismatched
      conditions), and it changes the number of threads from 3 to 2 for
      simplicity.
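
      The control flow described above (decode until an endpoint fires, emit
      a result, then restart from the same position in the stream) can be
      sketched roughly as follows. This is only an illustration, not Kaldi
      code: ToyDecoder, its methods, and the energy-based endpoint rule are
      hypothetical stand-ins for a real online decoder and its endpointing
      logic, and the "result" is just a chunk range rather than a lattice.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple


@dataclass
class ToyDecoder:
    """Hypothetical stand-in for an online decoder; NOT Kaldi's API."""
    silence_threshold: float = 0.1      # assumed mean-energy cutoff for silence
    trailing_silence_chunks: int = 2    # assumed endpoint rule: N silent chunks
    _speech_seen: bool = field(default=False, init=False)
    _silent_run: int = field(default=0, init=False)

    def init_decoding(self) -> None:
        # Reset per-utterance state; a real decoder would reset its search here.
        self._speech_seen = False
        self._silent_run = 0

    def accept_chunk(self, chunk: List[float]) -> None:
        # Classify the chunk as speech or silence by mean energy.
        energy = sum(x * x for x in chunk) / max(len(chunk), 1)
        if energy > self.silence_threshold:
            self._speech_seen = True
            self._silent_run = 0
        else:
            self._silent_run += 1

    def endpoint_detected(self) -> bool:
        # Endpoint: some speech followed by enough trailing silence.
        return self._speech_seen and self._silent_run >= self.trailing_silence_chunks

    def has_speech(self) -> bool:
        return self._speech_seen


def decode_stream(chunks: Iterable[List[float]],
                  decoder: ToyDecoder) -> List[Tuple[int, int]]:
    """Split a continuous stream into utterances as half-open chunk ranges."""
    segments: List[Tuple[int, int]] = []
    decoder.init_decoding()
    start = 0   # index of the first chunk of the current utterance
    end = 0     # number of chunks consumed so far
    for chunk in chunks:
        decoder.accept_chunk(chunk)
        end += 1
        if decoder.endpoint_detected():
            # A real system would finalize and write out a lattice here.
            segments.append((start, end))
            # Restart decoding from the current position in the same stream.
            decoder.init_decoding()
            start = end
    # Flush a trailing utterance that never reached an endpoint.
    if decoder.has_speech():
        segments.append((start, end))
    return segments
```

      Calling decode_stream() on a list of sample chunks returns one
      (start, end) pair per detected utterance; in a real setup each pair
      would correspond to one finalized lattice.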

      Dan

    • Tanel Alumäe

      Tanel Alumäe - 2015-03-19

      Yoav,

      Take a look at https://github.com/alumae/kaldi-gstreamer-server. It can decode a continuous stream using online DNN models, outputs partial and final hypotheses via a web-based API, and does endpointing.