Strategy of implementation.

Speech Recognition Toolkit

Brought to you by: air, arthchan2003, awb, bhiksha, and 5 others



Forum: Help

Creator: Cornebidouil

Created: 2017-07-19

Updated: 2017-07-24

Cornebidouil - 2017-07-19

Hi, I would like to try building speech recognition for short magic formulas, so I would like to know your opinion on how to do it.
I think I need to create my own acoustic model from my recordings. With this in mind, I have some questions:
_ I have a recording of all my formulas, but I cut it into clips of one to three words (one per formula), so each lasts less than one to two seconds. Do I need to regroup these recordings (to get files longer than 5 seconds)?
_ For testing, I planned to use my long recording files. Is there a length limit?

Edit: I forgot to add that I use French phonemes because my formulas are close to Latin, so maybe it's better to adapt a French acoustic model?

Thanks in advance !

Last edit: Cornebidouil 2017-07-20

- Nickolay V. Shmyrev - 2017-07-23
  
  I would first learn the concepts of keyword spotting, then build a test set to estimate the current false alarm rate and precision for your phrase.
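Estimating those numbers on a test set needs one rule for telling errors apart: a detection that matches no real occurrence of the keyword is a false alarm, and a real occurrence that no detection matches is a missed detection. A minimal Python sketch, assuming you have hand-labelled when the keyword is really spoken in the test audio; the (keyword, start, end) tuple format, the overlap rule and the 0.5 s tolerance are illustrative assumptions, not anything pocketsphinx provides.

```python
def overlaps(a_start, a_end, b_start, b_end, tolerance=0.0):
    """True if [a_start, a_end] and [b_start, b_end] overlap, allowing some slack."""
    return a_start <= b_end + tolerance and b_start <= a_end + tolerance


def score(detections, references, tolerance=0.5):
    """Return (hits, false_alarms, misses).

    detections, references: lists of (keyword, start_s, end_s). Each true
    occurrence can be matched by at most one detection.
    """
    matched = set()  # indices of reference occurrences already found
    hits = false_alarms = 0
    for word, start, end in detections:
        for i, (ref_word, ref_start, ref_end) in enumerate(references):
            if i not in matched and ref_word == word and overlaps(
                    start, end, ref_start, ref_end, tolerance):
                matched.add(i)
                hits += 1
                break
        else:  # no unmatched true occurrence explains this detection
            false_alarms += 1
    misses = len(references) - len(matched)
    return hits, false_alarms, misses


def summarize(hits, false_alarms, misses, audio_hours):
    """Precision, recall and false alarms per hour of test audio."""
    detected = hits + false_alarms
    actual = hits + misses
    return {
        "precision": hits / detected if detected else 0.0,
        "recall": hits / actual if actual else 0.0,
        "false_alarms_per_hour": false_alarms / audio_hours,
    }
```

Since -kws_threshold trades false alarms against missed detections, you would compute these numbers once for each threshold you try.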
  

Cornebidouil - 2017-07-24

Thanks for your reply.
I am creating my keyword list, so I need to estimate the threshold. How can I distinguish false alarms from missed detections? (I need to find the threshold where I get exactly one correct detection, right?)
Second question: what length of WAV file do I need to perform this determination?

Example of my results:

pocketsphinx_continuous.exe -infile model1\fr-FR\allsorts.wav -hmm model1\fr-FR\fr-FR -dict model1\fr-FR\spells2.dic -keyphrase Accio -kws_threshold 1e-20 -time yes -logfn my_log.log -beam 1e-200

Output:
Accio
Accio 1.510 1.730 0.958290

Does it mean that I have just one positive result? And that this result starts at 1.51 s and ends at 1.73 s, for a length of 0.958290?

Sorry for these beginner questions.

Last edit: Cornebidouil 2017-07-24
- Nickolay V. Shmyrev - 2017-07-24
  
  "Second question: what length of WAV file do I need to perform this determination?"
  
  The tutorial at https://cmusphinx.github.io/wiki/tutoriallm/#keyword-lists says:
  
  "Take a long recording with few occurrences of your keywords and some other sounds. You can take a movie sound or something else. The length of the audio should be approximately 1 hour."
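The same tutorial goes on to run keyword spotting on that long file at several thresholds and count false alarms and missed detections for each. That loop can be scripted; below is a Python sketch that assumes pocketsphinx_continuous is on the PATH and reuses only the flags and paths from the command posted in this thread (point -infile at the roughly one-hour file). The helper names and the threshold values are hypothetical. The decoder's log goes to stderr, which is captured here, so -logfn is left out and only the stdout result lines are parsed.

```python
import subprocess

# Flags and paths copied from the command in this thread (assumed setup);
# replace -infile with the ~1 hour test recording.
BASE_ARGS = [
    "pocketsphinx_continuous.exe",
    "-infile", r"model1\fr-FR\allsorts.wav",
    "-hmm", r"model1\fr-FR\fr-FR",
    "-dict", r"model1\fr-FR\spells2.dic",
    "-time", "yes",
]


def build_command(keyphrase, threshold):
    """Command line for one keyword-spotting run at the given threshold."""
    return BASE_ARGS + ["-keyphrase", keyphrase, "-kws_threshold", f"{threshold:g}"]


def parse_detections(stdout, keyphrase):
    """Collect (keyphrase, start_s, end_s) from the word-timing lines."""
    detections = []
    for line in stdout.splitlines():
        parts = line.rsplit(None, 3)  # keyphrase may itself contain spaces
        if len(parts) != 4 or parts[0] != keyphrase:
            continue
        try:
            start, end, _score = (float(p) for p in parts[1:])
        except ValueError:
            continue  # a bare hypothesis line such as "Accio Accio Accio Accio"
        detections.append((keyphrase, start, end))
    return detections


def sweep(keyphrase, thresholds):
    """Run the decoder once per threshold; map threshold -> detections."""
    results = {}
    for threshold in thresholds:
        proc = subprocess.run(build_command(keyphrase, threshold),
                              capture_output=True, text=True, check=True)
        results[threshold] = parse_detections(proc.stdout, keyphrase)
    return results
```

For example, sweep("Accio", [1e-40, 1e-30, 1e-20, 1e-10]) decodes the file four times; comparing each list of detections with the times where the keyword is really spoken gives the false alarms and misses per threshold.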
  
  "Does it mean that I have just one positive result?"
  
  Yes
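On the rest of that output line: the fields are the keyphrase, the start and end times in seconds, and a final number. As far as I know, with -time yes pocketsphinx_continuous prints a confidence (posterior probability) for the detection as that last field, not a length; the length is end minus start, here 1.730 - 1.510 = 0.220 s. A small Python parser under that assumption (the field layout is inferred from the output posted in this thread, and the function name is hypothetical):

```python
def parse_time_line(line):
    """Parse one word-timing line printed by pocketsphinx_continuous -time yes.

    Assumed layout: keyphrase, start (s), end (s), confidence score.
    rsplit keeps multi-word keyphrases intact.
    """
    keyphrase, start, end, score = line.rsplit(None, 3)
    start, end = float(start), float(end)
    return {
        "keyphrase": keyphrase,
        "start": start,
        "end": end,
        "duration": round(end - start, 3),  # length of the detection in seconds
        "score": float(score),
    }


print(parse_time_line("Accio 1.510 1.730 0.958290"))
# {'keyphrase': 'Accio', 'start': 1.51, 'end': 1.73, 'duration': 0.22, 'score': 0.95829}
```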
  

Cornebidouil - 2017-07-24

Thanks. Since these are words I invented, do I need to record myself for 1 hour? Even if the goal is to detect the word in a sample of a few seconds?

- Nickolay V. Shmyrev - 2017-07-24
  
  Looks like you have trouble reading. I already quoted it above: "You can take a movie sound or something else."
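In practice this means you do not have to record an hour of yourself: you can take an existing long recording (a movie soundtrack, for instance) and splice your short keyword recordings into it at known positions, which also gives you the true times needed to count misses and false alarms. A Python sketch using only the standard-library wave module; the function name, the 60 s spacing, and the requirement that every file share one format (for pocketsphinx usually 16 kHz, 16-bit mono) are assumptions for illustration.

```python
import wave


def build_test_file(background_path, clip_paths, out_path, gap_seconds=60.0):
    """Splice keyword clips into a long background recording (hypothetical helper).

    Writes background chunk, clip, background chunk, clip, ..., then the rest
    of the background. Returns [(clip_path, start_s, end_s)] giving where each
    clip ended up, i.e. the reference times for counting misses/false alarms.
    All files must share channel count, sample width and sample rate.
    """
    references = []
    with wave.open(background_path, "rb") as bg, wave.open(out_path, "wb") as out:
        params = bg.getparams()
        rate = params.framerate
        frame_bytes = params.sampwidth * params.nchannels
        out.setparams(params)  # frame count in the header is patched on close
        written = 0  # frames written to out so far
        for clip_path in clip_paths:
            chunk = bg.readframes(int(gap_seconds * rate))
            out.writeframes(chunk)
            written += len(chunk) // frame_bytes
            with wave.open(clip_path, "rb") as clip:
                if clip.getparams()[:3] != params[:3]:
                    raise ValueError(f"{clip_path}: format differs from background")
                n = clip.getnframes()
                out.writeframes(clip.readframes(n))
            references.append((clip_path, written / rate, (written + n) / rate))
            written += n
        out.writeframes(bg.readframes(bg.getnframes()))  # remainder of background
    return references
```

Splicing clean clips into other audio is easier than real use, so a threshold tuned this way may still need adjusting on live recordings.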
  

Cornebidouil - 2017-07-24

Holy shit, it's late, sorry...
