pocketsphinx text alignment issue of inserting optional silence in between words

Help
2016-09-26
2017-01-22
  • nayan kalita

    nayan kalita - 2016-09-26

    Hi Everyone,

    I am using pocketsphinx's ps_alignment to align a text sentence to a wave file and get the duration and spectral score of each word. I add the words to the alignment with the function ps_alignment_add_word().

    My problem: suppose the sentence to align is "How are you", and the speaker leaves a long silence between the words ARE and YOU, i.e. "How are .................. you".

    In sphinx3, forced alignment by default inserts a SIL phone between the words ARE and YOU, so the sphinx3 output after text alignment is:

    How are SIL you

    But in pocketsphinx the SIL phone is not inserted, so the alignment is incorrect.

    1) How can I solve this issue?
    2) Can an optional silence be added between words in text alignment?

     
    • Nickolay V. Shmyrev

      You can check word times to insert additional silence tags in decoding result.
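One possible reading of this suggestion is to post-process the word segments and add a silence entry wherever two consecutive words are separated by a gap. The sketch below assumes the segments are available as `(start, end, label)` tuples in seconds; the function name, the `min_gap` threshold, and the `<sil>` label are illustrative choices, not part of the pocketsphinx API.

```python
from typing import List, Tuple

# (start_seconds, end_seconds, label); a hypothetical representation of one segment
Segment = Tuple[float, float, str]


def insert_silence_tags(segments: List[Segment],
                        min_gap: float = 0.2,
                        tag: str = "<sil>") -> List[Segment]:
    """Return a copy of `segments` with a silence segment added in every
    gap of at least `min_gap` seconds between consecutive words."""
    result: List[Segment] = []
    for segment in segments:
        if result:
            previous_end = result[-1][1]
            gap = segment[0] - previous_end
            if gap >= min_gap:
                # Fill the gap with an explicit silence segment.
                result.append((previous_end, segment[0], tag))
        result.append(segment)
    return result


if __name__ == "__main__":
    words = [(1.2125, 1.79, "WINTER"), (2.725, 4.26, "IS")]
    for start, end, label in insert_silence_tags(words):
        print(start, end, label)
```

This only helps when the word boundaries themselves are trustworthy; if the aligner has already stretched a word across the pause, no gap is left to detect.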

       
  • nayan kalita

    nayan kalita - 2016-09-26

    Hi Nickolay,

    Thanks for your reply.

    Maybe I was not clear in describing the problem. Below is a detailed example of the issue.

    The sentence I tried to align is: Winter is cold here

    Word time intervals for the sentence in the wave file (manually marked using WaveSurfer):

    0.0000000 1.2000000 <s>
    1.2125000 1.7900000 WINTER
    1.8050000 2.7000000 <sil>
    2.7250000 4.2600000 IS
    4.2700000 5.0150000 <sil>
    5.0375000 6.6400000 COLD
    6.6500000 6.9000000 <sil>
    6.9075000 7.3500000 HERE
    7.3600000 8.2500000 </s>
    

    Sphinx3 forced alignment output:

    0 1.16 <s>
    1.17 1.86 WINTER
    1.87 2.7 <sil>
    2.71 4.26 IS
    4.27 5 <sil>
    5.01 6.64 COLD
    6.65 6.84 <sil>
    6.85 7.35 HERE
    7.36 8.25 </s>
    

    Pocketsphinx text alignment output:

    0  2.72 <s>
    2.73 2.87 WINTER
    2.80 4.25 IS
    4.26 6.57 COLD
    6.58 7.36 HERE
    7.37 8.21 </s>
    

    As you can see in the above example, sphinx3 forced alignment inserts <sil> wherever there is silence, and the word alignment accuracy is good.

    With pocketsphinx, however, the word alignment is bad: because no silence phone is inserted, words are aligned to the silent parts of the wave file.

    My question is:

    With an FSG we can have optional silence before and after each word, and the decoder inserts SIL only where silence is actually present in the audio. The same is available in sphinx3 forced alignment.

    Can the same thing be implemented using ps_alignment?
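To put a number on the mismatch, the word start times from the three listings above can be compared directly. This is only a small sketch: the values are copied from the listings, and the function name and the choice of mean absolute start-time error as the measure are assumptions made for illustration.

```python
from typing import Dict

# Word start times in seconds, copied from the three listings in this post.
manual = {"WINTER": 1.2125, "IS": 2.725, "COLD": 5.0375, "HERE": 6.9075}
sphinx3 = {"WINTER": 1.17, "IS": 2.71, "COLD": 5.01, "HERE": 6.85}
pocketsphinx = {"WINTER": 2.73, "IS": 2.80, "COLD": 4.26, "HERE": 6.58}


def mean_abs_start_error(reference: Dict[str, float],
                         hypothesis: Dict[str, float]) -> float:
    """Average absolute difference between word start times, in seconds."""
    total = sum(abs(hypothesis[word] - reference[word]) for word in reference)
    return total / len(reference)


if __name__ == "__main__":
    print("sphinx3:", mean_abs_start_error(manual, sphinx3))
    print("pocketsphinx:", mean_abs_start_error(manual, pocketsphinx))
```

On these four words the sphinx3 start times are off by about 0.04 s on average, against roughly 0.67 s for the pocketsphinx alignment.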

     

    Last edit: Nickolay V. Shmyrev 2016-09-26
    • Nickolay V. Shmyrev

      Alignment is designed for cases where you need phone times. If you just need word times, you can build an FSG and recognize with the FSG grammar; it will insert optional silence.
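A minimal sketch of building such a grammar: the helper below writes a linear, one-path FSG for a sentence in the Sphinx FSG text format. The helper name and the grammar name are hypothetical, the keywords (`FSG_BEGIN`, `NUM_STATES`, `START_STATE`, `FINAL_STATE`, `TRANSITION`, `FSG_END`) are the format as I understand it and should be checked against the sphinxbase documentation, and the words must be spelled as they appear in the pronunciation dictionary in use.

```python
from typing import List


def build_linear_fsg(words: List[str], name: str = "align") -> str:
    """Return FSG text with one state per word boundary and a single
    path through the words, each transition with probability 1.0."""
    lines = [
        f"FSG_BEGIN {name}",
        f"NUM_STATES {len(words) + 1}",
        "START_STATE 0",
        f"FINAL_STATE {len(words)}",
    ]
    for index, word in enumerate(words):
        # TRANSITION <from_state> <to_state> <probability> <word>
        lines.append(f"TRANSITION {index} {index + 1} 1.0 {word}")
    lines.append("FSG_END")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_linear_fsg(["WINTER", "IS", "COLD", "HERE"]), end="")
```

No silence transitions are written explicitly here; the sketch relies on the decoder adding optional silence on its own, as described in the reply above.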

       

      Last edit: Nickolay V. Shmyrev 2016-09-26
  • nayan kalita

    nayan kalita - 2016-09-27

    Hi Nickolay,

    Thanks for the reply.

    I also need the phone times and phone spectral scores in addition to the word times and word spectral scores.

    1) An FSG built at the word level does not give the phone times and phone scores.

    Is it a good idea to first run the audio through the FSG recognizer to get the recognized text with optional silences inserted, and then give that recognized text as input to text alignment?

    For the example audio, the FSG recognizer gives SIL WINTER SIL IS SIL COLD SIL HERE SIL.

    So the input text to text alignment (the ps_alignment method) will be WINTER SIL IS SIL COLD SIL HERE.

    2) Is there any alternative procedure I can follow to get both word and phone times and spectral scores?

     
    • Nickolay V. Shmyrev

      I also need the phone times and phone spectral scores in addition to the word times and word spectral scores.

      You should have mentioned that in the original question.

      Is it a good idea to first run the audio through the FSG recognizer to get the recognized text with optional silences inserted, and then give that recognized text as input to text alignment?

      Yes you can do that.
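The hand-off between the two passes can be sketched as follows: take the FSG hypothesis string, drop the leading and trailing silence, and keep the silences between words, giving the word list that would then be added to the alignment one word at a time with ps_alignment_add_word(). The function name is hypothetical, and the silence label `SIL` follows the example in this thread; it has to match whatever silence word the decoder and dictionary actually use.

```python
from typing import List


def hyp_to_alignment_words(hypothesis: str, sil: str = "SIL") -> List[str]:
    """Turn an FSG hypothesis such as 'SIL WINTER SIL IS SIL' into the
    word sequence for text alignment, keeping only word-internal silences."""
    tokens = hypothesis.split()
    # Drop silence at the start and end of the utterance.
    while tokens and tokens[0] == sil:
        tokens.pop(0)
    while tokens and tokens[-1] == sil:
        tokens.pop()
    # Collapse runs of consecutive silences into a single one.
    words: List[str] = []
    for token in tokens:
        if token == sil and words and words[-1] == sil:
            continue
        words.append(token)
    return words


if __name__ == "__main__":
    print(hyp_to_alignment_words("SIL WINTER SIL IS SIL COLD SIL HERE SIL"))
```

Leading and trailing silence are dropped on the assumption that the aligner already accounts for utterance-initial and utterance-final silence through <s> and </s>, as in the listings earlier in the thread.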

       
  • nayan kalita

    nayan kalita - 2016-09-28

    Hi Nickolay,

    Thanks for the reply.

    Sorry, I failed to point out that I also need phone times.

    I will follow those steps: first the FSG, then text alignment.

    Thanks.

     
    • James Salsman

      James Salsman - 2017-01-21

      Kalita, I think this is because by default pocketsphinx adds invisible optional silence to every FSG state. See lines 234-236 and 90-111 of http://cmusphinx.sourceforge.net/doc/pocketsphinx/fsg__search_8c_source.html

      Try your FSGs with explicit silence with the '-fsgusefiller no' switch and see if that works, please?

      Can you get time alignments and acoustic scores using '-backtrace yes' ? I am just getting back into PocketSphinx after giving up on it in 2010 for reasons that I think have been addressed.

       

      Last edit: James Salsman 2017-01-21
