Hello. I am trying to adapt an acoustic model, but the transcripts of my audio files contain words that are not in the existing CMU dictionary. I would like to extend the existing dictionary, but for that I have to create a text file containing all the words I want to add. My transcription file is huge, and there are a lot of words that are not in the default dictionary. How can I compare my transcription file with the existing dictionary? Is there a CMUSphinx tool that writes the words not found in the existing dictionary to a text file? Thanks.
You can write a script in Python.
I have no idea how to achieve this with Python. Is there a reference script I can take a look at?
Last edit: Aanand P 2020-06-16
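For reference, here is a minimal sketch of the kind of script suggested above. It is not an official CMUSphinx tool, and it rests on a few assumptions: the dictionary has one entry per line with the word first and its phones after it, alternate pronunciations are written as `word(2)`, and each transcript line looks like `<s> some words </s> (file_0001)`. The function names and the file names in the usage note are made up for illustration. Words are lowercased on both sides before comparing, so adjust that if your dictionary is case-sensitive.

```python
import re
from pathlib import Path

# Markup tokens assumed to appear in Sphinx-style transcripts; not real words.
SKIP_TOKENS = {"<s>", "</s>", "<sil>"}

# Alternate pronunciations are assumed to look like "word(2)", "word(3)", ...
ALT_SUFFIX = re.compile(r"\(\d+\)$")

# Utterance id assumed at the end of a transcript line, e.g. "(file_0001)".
UTT_ID = re.compile(r"(?:^|\s)\([^()\s]+\)\s*$")


def load_dictionary_words(dict_path):
    """Return the set of lowercased words found in the dictionary file."""
    words = set()
    with open(dict_path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            # Skip blank lines and ";;;" comment lines.
            if not parts or line.startswith(";;;"):
                continue
            # Drop the "(N)" suffix so "hello(2)" counts as "hello".
            words.add(ALT_SUFFIX.sub("", parts[0]).lower())
    return words


def find_missing_words(transcript_path, known_words):
    """Return a sorted list of transcript words that are not in known_words."""
    missing = set()
    with open(transcript_path, encoding="utf-8") as f:
        for line in f:
            # Remove the trailing utterance id before splitting into tokens.
            line = UTT_ID.sub("", line)
            for token in line.split():
                word = token.lower()
                if word in SKIP_TOKENS:
                    continue
                if word not in known_words:
                    missing.add(word)
    return sorted(missing)


def write_word_list(words, out_path):
    """Write one word per line to out_path."""
    Path(out_path).write_text("".join(w + "\n" for w in words), encoding="utf-8")
```

To use it, call `load_dictionary_words("cmudict-en-us.dict")`, pass the result to `find_missing_words("my.transcription", known)`, and save the list with `write_word_list(missing, "missing_words.txt")`. Replace those file names with your own paths. The resulting text file contains one missing word per line, which you can then use to produce pronunciations for the new dictionary entries.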