Strategy of implementation.

Help
2017-07-19
2017-07-24
  • Cornebidouil

    Cornebidouil - 2017-07-19

    Hi, I would like to try building a speech recognizer for short magic formulas, and I would like your opinion on how to go about it.
    As I see it, I need to create my own acoustic model from my recordings. With that in mind, I have some questions:
    I have one recording of all my formulas, which I cut into clips of one to three words (one per formula), so each clip lasts under one to two seconds. Do I need to regroup these clips into files longer than 5 seconds?
    For testing, I planned to use my long recording files. Is there a length limit?

    Edit: I forgot to add that I use French phonemes because my formulas are close to Latin, so maybe it would be better to adapt a French acoustic model?

    Thanks in advance !

     

    Last edit: Cornebidouil 2017-07-20
    • Nickolay V. Shmyrev

      I would learn about concepts of keyword spotting first and build a test set to estimate the current alarm rate and precision for your phrase.
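
A minimal sketch of how such a test set could be scored, assuming you have hand-labelled the times where the phrase really occurs. The function name `score_detections`, the interval format, and the 0.25 s tolerance are illustrative assumptions, not part of pocketsphinx.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]  # (start_seconds, end_seconds)


def score_detections(
    reference: List[Interval],
    detections: List[Interval],
    tolerance: float = 0.25,  # assumed slack in seconds around each labelled occurrence
) -> Dict[str, float]:
    """Count hits, false alarms and missed detections for one keyphrase."""
    matched = set()  # indices of reference occurrences already claimed by a detection
    hits = 0
    false_alarms = 0
    for d_start, d_end in detections:
        match = None
        for i, (r_start, r_end) in enumerate(reference):
            if i in matched:
                continue
            # A detection counts as a hit if it overlaps the labelled interval,
            # widened by the tolerance on both sides.
            if d_start <= r_end + tolerance and d_end >= r_start - tolerance:
                match = i
                break
        if match is None:
            false_alarms += 1  # detector fired where nothing was labelled
        else:
            matched.add(match)
            hits += 1
    missed = len(reference) - len(matched)  # labelled occurrences never detected
    fired = hits + false_alarms
    precision = hits / fired if fired else 0.0
    return {
        "hits": hits,
        "false_alarms": false_alarms,
        "missed": missed,
        "precision": precision,
    }
```

A false alarm is a detection with no labelled occurrence nearby; a missed detection is a labelled occurrence that no detection claimed.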

       
  • Cornebidouil

    Cornebidouil - 2017-07-24

    Thanks for your reply,
    I am creating my keyword list, so I need to estimate the threshold. How can I distinguish false alarms from missed detections? (I need to find the limit at which I get a single detection, correct?)
    Second question: how long a WAV file do I need to perform this determination?

    Example of my results :

    pocketsphinx_continuous.exe -infile model1\fr-FR\allsorts.wav -hmm model1\fr-FR\fr-FR -dict  model1\fr-FR\spells2.dic -keyphrase Accio -kws_threshold 1e-20 -time yes -logfn my_log.log -beam 1e-200
    
    Accio
    Accio 1.510 1.730 0.958290
    

    Does this mean that I have just one positive result? And that this result starts at 1.51 s and ends at 1.73 s, with a length of 0.958290?

    Sorry for these beginner questions.

     

    Last edit: Cornebidouil 2017-07-24
    • Nickolay V. Shmyrev

      Second question: how long a WAV file do I need to perform this determination?

      The tutorial at https://cmusphinx.github.io/wiki/tutoriallm/#keyword-lists says:

      Take a long recording with few occurrences of your keywords and some other sounds. You can take a movie sound or something else. The length of the audio should be approximately 1 hour
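
Once the long recording has been run at several thresholds, picking one can be sketched as below. This assumes the goal is to minimise the total of false alarms and missed detections; `pick_threshold` and the shape of `counts` are hypothetical names for illustration, and the counts themselves have to come from your own runs.

```python
from typing import Dict, Tuple


def pick_threshold(counts: Dict[str, Tuple[int, int]]) -> str:
    """Return the threshold whose (false_alarms, missed) pair has the smallest sum.

    `counts` maps a -kws_threshold value, kept as a string such as "1e-20",
    to the (false_alarms, missed) pair counted for that run.
    """
    if not counts:
        raise ValueError("no threshold runs to compare")
    # sum(counts[t]) is false alarms plus missed detections for threshold t
    return min(counts, key=lambda t: sum(counts[t]))
```

If missing a keyword is worse for you than a spurious trigger (or the reverse), weight the two counts differently instead of summing them.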

      Does this mean that I have just one positive result?

      Yes
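
For counting results over a long file, the timed lines can be parsed as in this sketch. It assumes each timed line has the shape `phrase start end score` seen in the output above, and that the fourth field is a confidence-style score rather than a length (the detected span here is 1.730 - 1.510 = 0.22 s). `Detection` and `parse_detection` are illustrative names, not a pocketsphinx API.

```python
import re
from typing import NamedTuple, Optional


class Detection(NamedTuple):
    keyphrase: str
    start_s: float
    end_s: float
    score: float  # assumed to be a confidence-style value, not a duration

    @property
    def duration_s(self) -> float:
        return self.end_s - self.start_s


# phrase, then three decimal numbers separated by whitespace
LINE_RE = re.compile(
    r"^(?P<phrase>.+?)\s+(?P<start>\d+\.\d+)\s+(?P<end>\d+\.\d+)\s+(?P<score>\d+\.\d+)$"
)


def parse_detection(line: str) -> Optional[Detection]:
    """Parse one timed output line; return None for lines without timings."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return Detection(
        m["phrase"], float(m["start"]), float(m["end"]), float(m["score"])
    )
```

Lines such as the bare `Accio` hypothesis line do not match and are skipped, so each detection is counted once.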

       
  • Cornebidouil

    Cornebidouil - 2017-07-24

    Thanks. If these are words that I have invented, do I need to record myself for 1 hour? Even if the objective is to detect the word in a sample of a few seconds?

     
    • Nickolay V. Shmyrev

      Looks like you have trouble reading. I already quoted it to you above: "You can take a movie sound or something else."

       
  • Cornebidouil

    Cornebidouil - 2017-07-24

    Holy shit, it's late, sorry...

     
