Getting phone posteriors with Kaldi

Help
Horia Cucu
2014-08-11
2014-08-12
  • Horia Cucu

    Horia Cucu - 2014-08-11

    Hi all,

    First of all I must mention that this is my first contact with Kaldi. I have some experience with other speech recognition toolkits (HTK, Sphinx) and used them for small and large vocabulary ASR tasks.

    I haven't installed anything yet and I'm not quite sure where to begin, but my goal for now is to create phone posterior features for a speech database using Kaldi.

    Can you give me some guidelines on how to begin?

    Thanks,
    Horia

     
    • Daniel Povey

      Daniel Povey - 2014-08-11

      Can you say what you intend to use these posterior features for?
      Dan

       
      • Horia Cucu

        Horia Cucu - 2014-08-12

        I want to use them for spoken term detection. My experiments cover two
        scenarios:
        a) Spoken term detection based on phone posterior features
        b) Spoken term detection based on the actual phones (the 1-best hypothesis
        string of phones)

        Horia

         
        • Daniel Povey

          Daniel Povey - 2014-08-12

          It would probably be better to generate a lattice, possibly a phone-level
          lattice, and do keyword search on the lattice. We already have
          keyword-search stuff in Kaldi that was used for the BABEL project (see
          egs/babel/s5b), but the setup is kind of complicated. There is also an
          example script for keyword search in the WSJ example, but I don't know how
          recently it has been tested. I don't know if that WSJ example script
          handles words not in the vocabulary (probably not).

          To generate a phone-level lattice you could either:
          a) convert a word lattice to a phone lattice using lattice-align-phones
          with --replace-output-symbols=true (but this will only contain phone
          sequences that correspond to actual word sequences), or
          b) generate a language model at the phone level and create a decoding
          graph from it.
          The latter approach is probably only practical if you have a system
          without word-position-dependent phones (--position-dependent-phones false
          to prepare_lang.sh), and I'm afraid a script doesn't currently exist for
          it, at least in the checked-in code, although it should be doable.

          If you really want phone-posterior features, not from a lattice, one way
          to do it is to:
          1. train a neural net to get the posteriors of context-dependent states;
          2. evaluate the neural net using nnet-forward or nnet-compute (nnet1 vs
          nnet2 setup);
          3. convert to pdf-level posteriors using logprob-to-post or prob-to-post;
          4. convert to phone-level posteriors using post-to-phone-post.

          Guoguo may want to add more regarding the keyword search.
          Dan
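
Editor's note: the last step Dan describes, going from pdf-level to phone-level posteriors, amounts to summing, for every frame, the posteriors of all pdfs that belong to the same phone. The sketch below only illustrates that idea with NumPy; it is not how Kaldi's post-to-phone-post is implemented. The function name, the dense-matrix layout, and the pdf_to_phone mapping are all assumptions made for the example, including the simplification that each pdf belongs to exactly one phone (in a real tree, pdfs can be shared).

```python
import numpy as np


def pdf_to_phone_posteriors(pdf_post, pdf_to_phone, num_phones):
    """Sum per-frame pdf posteriors over the pdfs assigned to each phone.

    pdf_post:     (num_frames, num_pdfs) array; each row is a posterior
                  distribution over pdfs for one frame.
    pdf_to_phone: length-num_pdfs integer sequence; pdf_to_phone[j] is the
                  phone index of pdf j (hypothetical mapping, assumed
                  one phone per pdf).
    num_phones:   number of distinct phones.
    """
    pdf_post = np.asarray(pdf_post, dtype=float)
    pdf_to_phone = np.asarray(pdf_to_phone, dtype=int)
    num_pdfs = pdf_post.shape[1]
    if pdf_to_phone.shape != (num_pdfs,):
        raise ValueError("pdf_to_phone must have one entry per pdf")

    # One-hot membership matrix: row j has a 1 in the column of pdf j's phone.
    membership = np.zeros((num_pdfs, num_phones))
    membership[np.arange(num_pdfs), pdf_to_phone] = 1.0

    # Multiplying by the membership matrix adds up the pdf posteriors
    # that fall under each phone, frame by frame.
    return pdf_post @ membership


if __name__ == "__main__":
    # Toy input: 2 frames, 4 pdfs; pdfs 0 and 1 belong to phone 0.
    toy_post = [[0.1, 0.2, 0.3, 0.4],
                [0.5, 0.25, 0.25, 0.0]]
    print(pdf_to_phone_posteriors(toy_post, [0, 0, 1, 2], num_phones=3))
```

Because every pdf contributes to exactly one phone in this sketch, each output row still sums to 1, so the result can be used directly as a per-frame phone posterior feature vector.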
