
Word boundary detection.

Help
2008-03-27
2012-09-22
  • Sergei Snegov

    Sergei Snegov - 2008-03-27

    Hi,
    I have a wav file of an audio book and I have its text.
    I need to find the location of each word in the audio file, that is, where each word starts and ends, so that I can split the wav file into many small wav files containing one word each. This is not a speech recognition task per se (because I know the text in advance), but it looks related.
    How can I use SPHINX to do this?
    Thanks a lot in advance.
    Any help is greatly appreciated.

     
    • Nickolay V. Shmyrev

      With sphinx3_align you can align audio to its transcript and get word boundaries.

       
    • Sergei Snegov

      Sergei Snegov - 2008-03-28

      Thanks a lot Nickolay.
      I am so glad that you have this program.

      Now I have a second problem:
      I have found a lot of web pages that mention using sphinx3_align, forced alignment, or Viterbi alignment, but none of them explain how to install or use it. Could you please give me a hint where to start?

       
      • Nickolay V. Shmyrev

        > But no mention of how to install or use it.

        It's in the sphinx3 package:

        https://sourceforge.net/project/showfiles.php?group_id=1904&package_id=68406

        You will also need an acoustic model for your language and a dictionary.

        > Could you please give me a hint where to start?

        Start with tests and demos from the package.

         
    • Sergei Snegov

      Sergei Snegov - 2008-03-29

      Thanks a lot again Nickolay,
      Using your hints I have found the file sphinx3/src/tests/regression/test-align.sh,
      and also sphinx3/model/lm/an4/pittsburgh.littleendian.raw, which looks like the input audio file for the test-align.sh script.
      The problem is that I can't find an output file that contains word boundary information. Should I pass an additional command-line argument or change the script somehow to enable it?

       
      • Nickolay V. Shmyrev

        It creates the file pittsburgh.littleendian.wdseg with word segments. Times are measured in frames (one frame is 0.01 s).
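
Editorial sketch, not part of the original reply: once the word segments are known, the frame indices can be turned into seconds and used to cut the wav file into one file per word. The Python below is a minimal illustration using only the standard `wave` module. The function names, the `(word, start_frame, end_frame)` tuple layout, and the treatment of the end frame as inclusive are assumptions made for this example; the layout of the .wdseg file is not described in the thread, so reading it is left out and should be checked against the actual output.

```python
import wave

FRAME_SECONDS = 0.01  # one alignment frame, as stated in the reply above


def frames_to_seconds(frame_index, frame_seconds=FRAME_SECONDS):
    """Convert an alignment frame index to a time offset in seconds."""
    return frame_index * frame_seconds


def split_wav(wav_path, segments, out_prefix):
    """Write one wav file per segment and return the list of paths.

    segments: iterable of (word, start_frame, end_frame) tuples.
    ASSUMPTION: end_frame is inclusive; adjust if your data differs.
    """
    written = []
    with wave.open(wav_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        total = src.getnframes()
        for i, (word, start, end) in enumerate(segments):
            # Map alignment frames to audio sample positions.
            first = min(round(frames_to_seconds(start) * rate), total)
            last = min(round(frames_to_seconds(end + 1) * rate), total)
            src.setpos(first)
            data = src.readframes(max(0, last - first))
            # Keep only alphanumeric characters so the word is a safe file name.
            safe = "".join(c for c in word if c.isalnum()) or "seg"
            out_path = f"{out_prefix}_{i:04d}_{safe}.wav"
            with wave.open(out_path, "wb") as dst:
                dst.setparams(params)  # header frame count is fixed on close
                dst.writeframes(data)
            written.append(out_path)
    return written
```

For example, `split_wav("book.wav", [("hello", 0, 49), ("world", 50, 99)], "out/word")` would write two half-second files, assuming the hypothetical `book.wav` is at least one second long and the `out` directory exists.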

         
    • Sergei Snegov

      Sergei Snegov - 2008-03-31

      Thanks a lot Nickolay.
      That's exactly what I was looking for.

       
