Hi!
I have a 3-hour .wav file with many speakers talking in Hebrew. I need to count how many times the word "Yes" was said by one specific person (maybe use Vosk for speaker diarization?); all other words and sounds need to be ignored.
So, besides the installation steps, which I consider to be the following (correct me if I'm wrong):
Ubuntu 2020:
Install CMUSphinx as described here: https://cmusphinx.github.io/wiki/tutorialpocketsphinx/
Install the Python package pocketsphinx and all its dependencies (pyaudio and so on)
Train an acoustic model for this Hebrew word
(for one word that sounds reasonable, but by the way, is there an acoustic model, a phonetic dictionary, and everything else that pocketsphinx needs for Hebrew?):
I need 3 files for the Python script: a .lm.bin file with the language model?
a .dict file?
and an hmm file?
What is the proper way to do that?
Please refer me to a tutorial if one exists...
Thank you for your work there at CMU...
For short words like "yes", keyword spotting is impossible because the false alarm rate is too high. You have to implement a full LVCSR recognizer with speaker separation and then search in the results.
The CMUSphinx tutorial is here: https://cmusphinx.github.io/wiki/tutorial
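The "search in the results" step could look roughly like the sketch below. It is only an illustration, not part of CMUSphinx or Vosk: it assumes the recognizer already produced word-level timestamps and the speaker separation step already produced labeled speaker turns. The `Word` and `Turn` types, the function name `count_word_for_speaker`, the speaker labels "A" and "B", and the sample data are all hypothetical. The Hebrew word "כן" ("yes") is used as the target.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One recognized word with its time span in seconds (hypothetical type)."""
    text: str
    start: float
    end: float


@dataclass
class Turn:
    """One diarization segment attributed to a speaker (hypothetical type)."""
    speaker: str
    start: float
    end: float


def count_word_for_speaker(words, turns, target_word, target_speaker):
    """Count occurrences of target_word whose midpoint falls inside
    a turn belonging to target_speaker."""
    count = 0
    for w in words:
        if w.text != target_word:
            continue  # ignore every other word
        midpoint = (w.start + w.end) / 2.0
        if any(t.speaker == target_speaker and t.start <= midpoint < t.end
               for t in turns):
            count += 1
    return count


# Made-up sample data standing in for real recognizer and diarization output.
words = [
    Word("כן", 0.5, 0.8),
    Word("לא", 1.0, 1.3),
    Word("כן", 2.1, 2.4),
    Word("כן", 5.0, 5.3),
]
turns = [
    Turn("A", 0.0, 1.5),
    Turn("B", 1.5, 3.0),
    Turn("A", 4.5, 6.0),
]

result = count_word_for_speaker(words, turns, "כן", "A")
print(result)
```

Assigning a word to a speaker by its midpoint is just one simple choice; with overlapping speech a real pipeline would likely need something more careful.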
Thanks a lot for your answer, Nickolay.