CMU Sphinx / Forums / Help: pocketsphinx text alignment issue of inserting optional silence in between words

nayan kalita - 2016-09-26

Hi Everyone,

I am using pocketsphinx's ps_alignment to align a text/sentence to a wave file and get the duration and spectral score of each word. I add the words to the alignment with ps_alignment_add_word().

My problem is this: suppose the text to align is "How are you", and the speaker leaves a long silence between the words ARE and YOU, i.e. "How are .................. you".

In sphinx3 force-alignment, a SIL phone is inserted between the words ARE and YOU by default, i.e. the sphinx3 output after text alignment is:

How are SIL you

In pocketsphinx, however, no SIL phone is inserted, so the alignment is incorrect.

1) How can I solve this issue?
2) Is it possible to add an optional silence between words in text alignment?

- Nickolay V. Shmyrev - 2016-09-26
  
  You can check word times to insert additional silence tags in decoding result.
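A minimal sketch of that suggestion, assuming the decoding result is available as (start, end, word) tuples in seconds: a `<sil>` entry is added wherever two consecutive words are separated by at least a threshold. The helper name `insert_silence` and the 0.1 s threshold are illustrative assumptions, not part of pocketsphinx.

```python
# Hypothetical helper: insert "<sil>" entries into a word-level decoding
# result wherever the gap between consecutive words reaches a threshold.
# Segments are (start_sec, end_sec, word); the 0.1 s default is arbitrary.
from typing import List, Tuple

Segment = Tuple[float, float, str]


def insert_silence(segments: List[Segment], min_gap: float = 0.1) -> List[Segment]:
    """Return a new list with ("<sil>") segments filling gaps >= min_gap."""
    result: List[Segment] = []
    for seg in segments:
        if result:
            prev_end = result[-1][1]
            if seg[0] - prev_end >= min_gap:
                # The silence spans exactly the gap between the two words.
                result.append((prev_end, seg[0], "<sil>"))
        result.append(seg)
    return result


if __name__ == "__main__":
    # Illustrative times only.
    words = [(1.20, 1.79, "WINTER"), (2.72, 4.26, "IS"), (5.03, 6.64, "COLD")]
    for start, end, word in insert_silence(words):
        print(f"{start:.2f} {end:.2f} {word}")
```

This only helps when the decoder leaves gaps between words. In the pocketsphinx output quoted later in this thread, consecutive entries abut (or even overlap), so there would be nothing to fill.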
  

nayan kalita - 2016-09-26

Hi Nickolay,

Thanks for your reply.

Maybe I was not clear in describing the problem. Below is a detailed example of the issue.

The sentence I am trying to align is: Winter is cold here

Word time intervals (start, end, word) for the sentence in the wave file, marked manually using WaveSurfer:

0.0000000 1.2000000 <s>
1.2125000 1.7900000 WINTER
1.8050000 2.7000000 <sil>
2.7250000 4.2600000 IS
4.2700000 5.0150000 <sil>
5.0375000 6.6400000 COLD
6.6500000 6.9000000 <sil>
6.9075000 7.3500000 HERE
7.3600000 8.2500000 </s>

sphinx3 force-alignment output:

0 1.16 <s>
1.17 1.86 WINTER
1.87 2.7 <sil>
2.71 4.26 IS
4.27 5 <sil>
5.01 6.64 COLD
6.65 6.84 <sil>
6.85 7.35 HERE
7.36 8.25 </s>

pocketsphinx text alignment (ps_alignment) output:

0 2.72 <s>
2.73 2.87 WINTER
2.80 4.25 IS
4.26 6.57 COLD
6.58 7.36 HERE
7.37 8.21 </s>

As you can see in the example above, sphinx3 force-alignment inserts <sil> wherever there is silence, and the word alignments are accurate.

With pocketsphinx, however, the word alignments are poor: because no silence phone is inserted, words are aligned to the wrong parts of the wave file, including the silent parts.
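To put numbers on the difference, the sketch below parses the three segment lists from this post (times copied as posted) and compares each word's start time against the manual WaveSurfer marks, ignoring <s>, </s> and <sil>. The function names `parse_segments` and `start_errors` are hypothetical.

```python
# Sketch: compare word start times from two aligners against manual marks.
# The strings are the flattened "start end word" triples posted above.
from statistics import mean
from typing import List, Tuple

Segment = Tuple[float, float, str]
NON_WORDS = {"<s>", "</s>", "<sil>", "SIL"}

MANUAL = ("0.0000000 1.2000000 <s> 1.2125000 1.7900000 WINTER "
          "1.8050000 2.7000000 <sil> 2.7250000 4.2600000 IS "
          "4.2700000 5.0150000 <sil> 5.0375000 6.6400000 COLD "
          "6.6500000 6.9000000 <sil> 6.9075000 7.3500000 HERE "
          "7.3600000 8.2500000 </s>")
SPHINX3 = ("0 1.16 <s> 1.17 1.86 WINTER 1.87 2.7 <sil> 2.71 4.26 IS "
           "4.27 5 <sil> 5.01 6.64 COLD 6.65 6.84 <sil> 6.85 7.35 HERE "
           "7.36 8.25 </s>")
POCKETSPHINX = ("0 2.72 <s> 2.73 2.87 WINTER 2.80 4.25 IS 4.26 6.57 COLD "
                "6.58 7.36 HERE 7.37 8.21 </s>")


def parse_segments(text: str) -> List[Segment]:
    """Split a flat 'start end word start end word ...' string into triples."""
    tokens = text.split()
    if len(tokens) % 3 != 0:
        raise ValueError("expected start/end/word triples")
    return [(float(tokens[i]), float(tokens[i + 1]), tokens[i + 2])
            for i in range(0, len(tokens), 3)]


def start_errors(ref: List[Segment], hyp: List[Segment]) -> List[Tuple[str, float]]:
    """Per-word (word, hyp_start - ref_start), ignoring silence and <s>/</s>."""
    ref_words = [s for s in ref if s[2] not in NON_WORDS]
    hyp_words = [s for s in hyp if s[2] not in NON_WORDS]
    if [w for _, _, w in ref_words] != [w for _, _, w in hyp_words]:
        raise ValueError("word sequences differ")
    return [(r[2], h[0] - r[0]) for r, h in zip(ref_words, hyp_words)]


if __name__ == "__main__":
    manual = parse_segments(MANUAL)
    for name, text in (("sphinx3", SPHINX3), ("pocketsphinx", POCKETSPHINX)):
        errors = start_errors(manual, parse_segments(text))
        avg = mean(abs(e) for _, e in errors)
        detail = ", ".join(f"{w} {e:+.3f}" for w, e in errors)
        print(f"{name}: mean |start error| {avg:.3f} s ({detail})")
```

On this data the mean absolute start error is about 0.036 s for sphinx3 and about 0.674 s for pocketsphinx; with pocketsphinx, WINTER starts about 1.5 s late because <s> covers the audio up to 2.72 s.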

My question is :

With an FSG we can have optional silence before and after words; the decoder inserts SIL only where the audio actually contains silence. The same is available in sphinx3 force-alignment.

Could the same thing be implemented using ps_alignment?

Last edit: Nickolay V. Shmyrev 2016-09-26
- Nickolay V. Shmyrev - 2016-09-26
  
  Alignment is designed for the case where you need phone times. If you just need word times, you can build an FSG and recognize with the FSG grammar; it will insert optional silence.
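For a single known sentence the grammar is just the word sequence. A minimal sketch that writes it as JSGF, which pocketsphinx can load with its -jsgf option (an FSG file can be given with -fsg instead). The grammar and rule names are arbitrary, and the words are assumed to be spelled exactly as in the pronunciation dictionary.

```python
# Sketch: build a one-sentence JSGF grammar for FSG-based word alignment.
# Grammar/rule names are hypothetical; words must match the dictionary.
from typing import Sequence


def sentence_jsgf(words: Sequence[str], grammar_name: str = "align") -> str:
    """Return a JSGF grammar whose only public rule is the word sequence."""
    if not words:
        raise ValueError("need at least one word")
    return (
        "#JSGF V1.0;\n\n"
        f"grammar {grammar_name};\n\n"
        f"public <sentence> = {' '.join(words)};\n"
    )


if __name__ == "__main__":
    print(sentence_jsgf("WINTER IS COLD HERE".split()), end="")
```

With default settings the FSG search adds the optional silence itself, so the grammar contains no silence tokens.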
  
  Last edit: Nickolay V. Shmyrev 2016-09-26
  

nayan kalita - 2016-09-27

Hi Nickolay,

Thanks for the reply.

I also need the phone times and phone spectral scores in addition to the word times and word spectral scores.
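If the alignment provides phone-level entries (start frame, duration, score), word-level values can be rolled up from them. A sketch with hypothetical `PhoneEntry`/`WordEntry` types (not the actual ps_alignment structures), assuming 100 frames per second (pocketsphinx's usual -frate default) and log-domain acoustic scores, which add across phones.

```python
# Sketch: derive word times and scores from phone-level alignment entries.
# PhoneEntry/WordEntry are hypothetical types, not pocketsphinx structs.
from dataclasses import dataclass
from typing import List

FRAME_RATE = 100  # frames per second (assumed -frate default)


@dataclass
class PhoneEntry:
    word_index: int   # index of the word this phone belongs to
    phone: str
    start_frame: int
    n_frames: int
    score: int        # assumed log-domain acoustic score


@dataclass
class WordEntry:
    word: str
    start: float      # seconds
    end: float        # seconds (exclusive frame boundary)
    score: int


def words_from_phones(words: List[str], phones: List[PhoneEntry]) -> List[WordEntry]:
    """Word span = first phone start to last phone end; score = sum of phones."""
    result: List[WordEntry] = []
    for i, word in enumerate(words):
        own = [p for p in phones if p.word_index == i]
        if not own:
            raise ValueError(f"no phones for word {i} ({word})")
        first = min(own, key=lambda p: p.start_frame)
        last = max(own, key=lambda p: p.start_frame + p.n_frames)
        result.append(WordEntry(
            word=word,
            start=first.start_frame / FRAME_RATE,
            end=(last.start_frame + last.n_frames) / FRAME_RATE,
            score=sum(p.score for p in own),
        ))
    return result


if __name__ == "__main__":
    # Made-up phone entries for the word "IS" (illustration only).
    phones = [PhoneEntry(0, "IH", 271, 5, -100), PhoneEntry(0, "Z", 276, 10, -250)]
    for w in words_from_phones(["IS"], phones):
        print(f"{w.word} {w.start:.2f}-{w.end:.2f} score {w.score}")
```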

1) An FSG built at the word level does not give phone times and phone scores.

Is it a good idea to first run the audio through the FSG recognizer to get the recognized text with optional silence inserted, and then give that recognized text as input to text alignment?

For the example audio, the FSG recognizer gives: SIL WINTER SIL IS SIL COLD SIL HERE SIL

So the input text to text alignment (the ps_alignment method) will be: WINTER SIL IS SIL COLD SIL HERE

2) Is there any other procedure I can follow to get both word and phone times and spectral scores?
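The two-pass idea in 1) can be glued together as below: take the FSG hypothesis, drop leading and trailing silence (as in the example above), collapse runs of silence, and pass the rest to text alignment. A sketch; the silence spellings SIL and <sil> are assumptions, and every token, including the silence token, must exist in the dictionary used for ps_alignment.

```python
# Sketch: turn an FSG hypothesis into the word list for text alignment.
# The silence spellings are assumptions; match your (filler) dictionary.
from typing import List

SILENCE = {"SIL", "<sil>"}


def alignment_input(hypothesis: str, sil_token: str = "SIL") -> List[str]:
    tokens = hypothesis.split()
    # Drop leading and trailing silence.
    while tokens and tokens[0] in SILENCE:
        tokens.pop(0)
    while tokens and tokens[-1] in SILENCE:
        tokens.pop()
    result: List[str] = []
    for tok in tokens:
        if tok in SILENCE:
            if result and result[-1] == sil_token:
                continue  # collapse consecutive silences into one
            result.append(sil_token)
        else:
            result.append(tok)
    return result


if __name__ == "__main__":
    print(" ".join(alignment_input("SIL WINTER SIL IS SIL COLD SIL HERE SIL")))
```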

- Nickolay V. Shmyrev - 2016-09-27
  
  > I also need the Phone times and phone spectral score in addition to word times and word spectral score.
  
  You should have mentioned that in the original question.
  
  > Is it a good idea to first run the audio with the FSG recognizer and get the recognized text with optional silence insertion, then give the recognized text as input to text alignment?
  
  Yes you can do that.
  

nayan kalita - 2016-09-28

Hi Nickolay,

Thanks for the reply.

Sorry, I forgot to mention that I also need phone times.

I will follow these steps: first FSG recognition, then text alignment.

Thanks.

- James Salsman - 2017-01-21
  
  Kalita, I think this is because by default pocketsphinx adds invisible optional silence to every FSG state. See lines 234-236 and 90-111 of http://cmusphinx.sourceforge.net/doc/pocketsphinx/fsg__search_8c_source.html
  
  Could you try your FSGs with explicit silence and the '-fsgusefiller no' switch, and see if that works?
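A sketch of what such an FSG could look like for one sentence: an explicit, optional <sil> between consecutive words, written in the Sphinx FSG text format (FSG_BEGIN, NUM_STATES, START_STATE, FINAL_STATE, TRANSITION, FSG_END). It avoids null transitions by letting each word leave both the state after the previous word and the state after its optional <sil>. Assumptions: <sil> is present in the filler dictionary, and the silence probability of 0.5 is arbitrary.

```python
# Sketch: Sphinx-format FSG for one sentence with an explicit, optional
# <sil> between consecutive words (intended for -fsgusefiller no).
# Assumes "<sil>" is in the filler dictionary; p_sil = 0.5 is arbitrary.
from typing import List, Sequence, Tuple


def optional_silence_fsg(words: Sequence[str], name: str = "align",
                         sil: str = "<sil>", p_sil: float = 0.5) -> str:
    if not words:
        raise ValueError("need at least one word")
    lines: List[str] = []
    # States from which the next word may be emitted, with arc probabilities.
    sources: List[Tuple[int, float]] = [(0, 1.0)]
    next_state = 1
    final_state = 0
    for k, word in enumerate(words):
        target = next_state
        next_state += 1
        for state, prob in sources:
            lines.append(f"TRANSITION {state} {target} {prob:g} {word}")
        final_state = target
        if k < len(words) - 1:
            # Optional silence: either take <sil> or go straight to the next word.
            sil_state = next_state
            next_state += 1
            lines.append(f"TRANSITION {target} {sil_state} {p_sil:g} {sil}")
            sources = [(target, 1.0 - p_sil), (sil_state, 1.0)]
    header = [f"FSG_BEGIN {name}", f"NUM_STATES {next_state}",
              "START_STATE 0", f"FINAL_STATE {final_state}"]
    return "\n".join(header + lines + ["FSG_END"]) + "\n"


if __name__ == "__main__":
    print(optional_silence_fsg("WINTER IS COLD HERE".split()), end="")
```

Only between-word silence is modelled; leading and trailing silence are not covered by this sketch.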
  
  Can you get time alignments and acoustic scores using '-backtrace yes'? I am just getting back into PocketSphinx after giving up on it in 2010, for reasons that I think have since been addressed.
  
  Last edit: James Salsman 2017-01-21
  
