Pocketsphinx: best way to compare 2 phrase speeches

Forum: Help

Creator: Eugene Popovich

Created: 2016-03-01

Updated: 2016-03-02

Eugene Popovich - 2016-03-01

Hello

I need to compare two audio files, or two microphone recording sessions, with each other (same voice, same accent). They will contain short phrases, and I need to detect whether the second recording is similar to the first one. I've tried using dictionaries containing the phrase text, and I've tried phoneme comparison, but neither gives enough accuracy.

Example:

User says: This is test phrase
Then he says: Something else

The comparison result should show very low similarity.

Another example:

User says: This is test phrase
Then he says: This is test phrase

The comparison result should give a high value.

Is it possible to achieve this using the pocketsphinx library on Android? Can anybody point me in the right direction?

P.S.: When I try phoneme recognition, it may give me very different phonemes each time for the same phrase, for example:

SIL Z IH S IH S UW B OW D ER S SIL
SIL Z IH S UW S NG P OW Z SIL D ER TH SIL
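
To put a number on how far apart two such phoneme hypotheses are, one option is a token-level edit distance. The following is only a minimal sketch: the function names `edit_distance` and `phoneme_similarity` and the choice to ignore `SIL` tokens are assumptions made for illustration, not part of pocketsphinx.

```python
def edit_distance(a, b):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete tok_a
                            curr[j - 1] + 1,      # insert tok_b
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def phoneme_similarity(hyp_a, hyp_b, ignore=("SIL",)):
    """Return a similarity in [0, 1]; 1.0 means identical phoneme strings.

    Tokens listed in `ignore` (silence by default, an assumption) are dropped
    before comparing.
    """
    a = [p for p in hyp_a.split() if p not in ignore]
    b = [p for p in hyp_b.split() if p not in ignore]
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


first = "SIL Z IH S IH S UW B OW D ER S SIL"
second = "SIL Z IH S UW S NG P OW Z SIL D ER TH SIL"
print(phoneme_similarity(first, second))
```

A score like this only reflects how unstable the recognized phoneme strings are; it does not make the recognition itself more reliable.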

- Nickolay V. Shmyrev - 2016-03-01
  
  Is it possible to achieve this using the pocketsphinx library on Android?
  
  No
  
  Can anybody point me in the right direction?
  
  Search for dynamic time warping (DTW).
  
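For readers unfamiliar with dynamic time warping, here is a minimal sketch of the classic dynamic-programming formulation. It assumes each recording has already been turned into a list of equal-length feature frames (for example cepstral vectors); the names `dtw_distance` and `euclidean`, and the normalization by the summed sequence lengths, are choices made for this illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two feature frames of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def dtw_distance(seq_a, seq_b, dist=euclidean):
    """Return (total, normalized) DTW cost between two frame sequences.

    total      : accumulated frame distance along the cheapest warping path
    normalized : total divided by len(seq_a) + len(seq_b), so that longer
                 recordings are not penalized just for being longer
    """
    n, m = len(seq_a), len(seq_b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one frame")
    inf = float("inf")
    # cost[i][j] = cheapest cost of aligning the first i frames of seq_a
    # with the first j frames of seq_b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # seq_b frame repeated
                                 cost[i][j - 1],      # seq_a frame repeated
                                 cost[i - 1][j - 1])  # both sequences advance
    total = cost[n][m]
    return total, total / (n + m)


# Toy two-dimensional "frames"; real input would be cepstral vectors.
phrase_a = [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
phrase_b = [[0.0, 1.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
total, normalized = dtw_distance(phrase_a, phrase_b)
print(total, normalized)
```

A smaller normalized cost suggests a closer match. Any threshold separating "same phrase" from "different phrase" would have to be tuned on real recordings.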

Eugene Popovich - 2016-03-01

Thank you for the answer. I found your more detailed answer here: https://sourceforge.net/p/cmusphinx/discussion/help/thread/d4ca2b80/#8d7f. Is there any way to extract those MFC coefficients using pocketsphinx on Android?

- Nickolay V. Shmyrev - 2016-03-01
  
  You'd better use managed Java code for that.
  
  - Eugene Popovich - 2016-03-02
    
    Thank you for the help. I've run some tests using DTW and FastDTW on MFC coefficients extracted with the sphinx_fe tool. The DTW approach seems to work reasonably well for single words, but for whole phrases the warp distance can be very large. Maybe there is a way to split a phrase into words and then compare them word by word.
    
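For anyone repeating the experiment with features written by sphinx_fe, the sketch below shows one way to load such a file into a list of frames. It rests on two assumptions that should be checked against your own sphinx_fe settings: that the file starts with a 4-byte integer giving the number of 32-bit floats that follow, and that each frame holds 13 coefficients. The function name `read_sphinx_mfc` is made up for this example, and the byte order is guessed by checking which interpretation of the header matches the payload size.

```python
import os
import struct
import tempfile


def read_sphinx_mfc(path, ncep=13):
    """Read a cepstral file into a list of frames of `ncep` floats each.

    Assumed layout: int32 count of float32 values, then the values.
    """
    with open(path, "rb") as fh:
        header = fh.read(4)
        if len(header) != 4:
            raise ValueError("file too short to contain a header")
        payload = fh.read()
    # Try both byte orders and keep the one whose count matches the payload.
    for order in (">", "<"):
        (count,) = struct.unpack(order + "i", header)
        if count >= 0 and count * 4 == len(payload):
            break
    else:
        raise ValueError("header does not match the file size")
    if count % ncep != 0:
        raise ValueError("value count is not a multiple of ncep")
    values = struct.unpack("%s%df" % (order, count), payload)
    return [list(values[i:i + ncep]) for i in range(0, count, ncep)]


# Round-trip check on a synthetic file: two frames of 13 values, big-endian.
demo_values = [float(i) for i in range(26)]
with tempfile.NamedTemporaryFile(suffix=".mfc", delete=False) as tmp:
    tmp.write(struct.pack(">i", len(demo_values)))
    tmp.write(struct.pack(">26f", *demo_values))
frames = read_sphinx_mfc(tmp.name)
os.remove(tmp.name)
print(len(frames), len(frames[0]))
```

The resulting frames can be passed directly to a DTW routine that accepts lists of feature vectors.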
