Hi, I would like to build speech recognition for short magic formulas, and I would like your opinion on how to do it.
I think I need to create my own acoustic model from my recordings. With that in mind, I have some questions:
I have one recording of all my formulas, but I cut it into clips of one to three words (one per formula), so each clip lasts less than one to two seconds. Do I need to regroup these clips (to get files longer than 5 seconds)?
For testing, I planned to use my long recording files. Is there a length limit?
Edit: I forgot to add that I use French phonemes because my formulas are close to Latin, so maybe it is better to adapt a French acoustic model?
Thanks in advance!
Last edit: Cornebidouil 2017-07-20
I would learn the concepts of keyword spotting first, then build a test set to estimate the current alarm rate and precision for your phrase.
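As a rough illustration of what such a test set gives you, here is a minimal sketch that turns raw counts into precision, recall, and a false alarm rate. The function name `kws_metrics` and the counts are made up for the example; they are not results from this thread.

```python
def kws_metrics(hits, false_alarms, misses, audio_hours):
    """Summarize keyword-spotting quality from raw counts on a test set."""
    detections = hits + false_alarms   # everything the spotter reported
    occurrences = hits + misses        # everything actually spoken
    return {
        "precision": hits / detections if detections else 0.0,
        "recall": hits / occurrences if occurrences else 0.0,
        "false_alarms_per_hour": false_alarms / audio_hours,
    }


# Illustrative counts only (assumed, not measured).
metrics = kws_metrics(hits=8, false_alarms=2, misses=2, audio_hours=1.0)
print(metrics)
```

A longer test recording makes the false alarm rate more meaningful, since false alarms are rare events.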
Thanks for your reply.
I am creating my keyword list, so I need to estimate the threshold. How can I distinguish false alarms from missed detections? (I need to find the limit where each keyword is detected exactly once, correct?)
Second question: what length of WAV file do I need to make this determination?
Example of my results:
Does this mean that I have just one positive result, and that this result starts at 1.51 s and ends at 1.73 s, with a length of 0.958290?
Sorry for these beginner questions.
Last edit: Cornebidouil 2017-07-24
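For readers following along, the keyword list discussed here is a plain text file with one phrase per line, followed by its detection threshold between slashes. The sketch below renders such a file from a dictionary; the phrases and threshold values are hypothetical placeholders, and real thresholds have to be tuned per phrase.

```python
def format_keyword_list(thresholds):
    """Render {phrase: threshold} as keyword-list lines: 'phrase /threshold/'."""
    return "".join(
        f"{phrase} /{threshold:g}/\n" for phrase, threshold in thresholds.items()
    )


# Hypothetical formulas with placeholder thresholds (assumptions, not tuned values).
keywords = {"lumen aeterna": 1e-20, "ventus": 1e-10}
keyword_list_text = format_keyword_list(keywords)
print(keyword_list_text, end="")
```

The resulting text can be saved to a file and passed to the recognizer as its keyword list, for example with the `-kws` option of `pocketsphinx_continuous`.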
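The tutorial https://cmusphinx.github.io/wiki/tutoriallm/#keyword-lists says:
"Take a long recording with few occurrences of your keywords and some other sounds. You can take a movie sound or something else. The length of the audio should be approximately 1 hour"
As for whether you have just one positive result: yes.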
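To tell false alarms from missed detections on such a recording, one possible approach is to note by hand where each keyword really occurs, then compare those times with what the spotter reports. The sketch below assumes start times in seconds and a hypothetical matching tolerance of 0.5 s; `score_detections` and the sample times are illustrative, not part of CMUSphinx.

```python
def score_detections(detections, references, tolerance=0.5):
    """Count hits, false alarms and misses for one keyword.

    detections: start times (seconds) reported by the keyword spotter
    references: hand-labelled start times (seconds) of real occurrences
    tolerance:  maximum distance (seconds) for a detection to count as a hit
    """
    unmatched = sorted(references)
    hits = false_alarms = 0
    for t in sorted(detections):
        # Take the first still-unmatched reference close enough to this detection.
        match = next((r for r in unmatched if abs(r - t) <= tolerance), None)
        if match is None:
            false_alarms += 1          # reported, but nothing was spoken there
        else:
            unmatched.remove(match)    # each real occurrence is matched once
            hits += 1
    misses = len(unmatched)            # spoken, but never reported
    return hits, false_alarms, misses


# Illustrative times only (assumed, not taken from the thread).
result = score_detections(
    detections=[12.1, 200.0, 310.0],
    references=[12.0, 95.5, 310.2],
)
print(result)  # (2, 1, 1)
```

Running the spotter at several thresholds and scoring each run this way shows the trade-off: a looser threshold tends to produce more false alarms, a stricter one more misses.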
Thanks. Since these are words that I invented, do I need to record myself for 1 hour? Even if the objective is to detect the word in a sample of a few seconds?
Looks like you have trouble reading. I already quoted it for you above: "You can take a movie sound or something else."
Holy shit, it's late, sorry...