Generate a list of OOV words

Speech Recognition Toolkit

Brought to you by: air, arthchan2003, awb, bhiksha, and 5 others

Forum: Help

Creator: Aanand P

Created: 2020-06-15

Updated: 2020-06-16

Aanand P - 2020-06-15

Hello. I am trying to adapt an acoustic model, but unfortunately the transcripts of my audio files contain words that are not in the existing CMU dictionary. I would like to extend the existing dictionary, but to do that I first need a text file listing all the words I want to add. My transcription file is huge and contains a lot of words that are not in the default dictionary. How can I compare my transcription file with the existing dictionary? Is there a CMUSphinx tool that writes the words not found in the existing dictionary to a text file? Thanks

Nickolay V. Shmyrev - 2020-06-15

You can write a script in Python.
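
A minimal sketch of such a script follows. It is not a CMUSphinx tool. It assumes the dictionary is in CMU format (one word per line followed by its phones, with alternate pronunciations written like `read(2)`) and the transcription is in the SphinxTrain style `<s> words </s> (utterance_id)`. The script name `find_oov.py`, the `MARKERS` set, and the lowercasing of every word are assumptions; adjust them to your data. It writes the sorted list of words missing from the dictionary to an output file, one word per line.

```python
import re
import sys

# Sentence and silence markers used in SphinxTrain-style transcripts; they are
# not dictionary words. Assumption: extend this set (e.g. with filler words
# such as "++noise++") to match your own transcription conventions.
MARKERS = {"<s>", "</s>", "<sil>"}

# Utterance id in parentheses at the end of a transcript line, e.g. "(utt_001)".
UTT_ID = re.compile(r"\s*\([^()]*\)\s*$")
# Alternate-pronunciation suffix on dictionary entries, e.g. "read(2)".
ALT_PRON = re.compile(r"\(\d+\)$")


def load_dictionary_words(lines):
    """Return the set of lowercased words defined in CMU-style dictionary lines."""
    words = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;;"):
            continue  # skip blank lines and cmudict-style comments
        word = line.split()[0]
        words.add(ALT_PRON.sub("", word).lower())
    return words


def find_oov_words(lines, dict_words):
    """Return a sorted list of transcript words that are not in dict_words."""
    oov = set()
    for line in lines:
        text = UTT_ID.sub("", line)  # drop the trailing "(utterance_id)"
        for token in text.split():
            token = token.lower()
            if token not in MARKERS and token not in dict_words:
                oov.add(token)
    return sorted(oov)


def main(argv):
    """Command-line entry point: find_oov.py DICTIONARY TRANSCRIPTION OUTPUT."""
    if len(argv) != 4:
        print("usage: find_oov.py DICTIONARY TRANSCRIPTION OUTPUT", file=sys.stderr)
        return
    dict_path, transcript_path, out_path = argv[1:]
    with open(dict_path, encoding="utf-8") as f:
        dict_words = load_dictionary_words(f)
    with open(transcript_path, encoding="utf-8") as f:
        oov = find_oov_words(f, dict_words)
    with open(out_path, "w", encoding="utf-8") as out:
        out.writelines(word + "\n" for word in oov)
    print(f"{len(oov)} words not found in {dict_path}; written to {out_path}")


if __name__ == "__main__":
    main(sys.argv)
```

For example, `python find_oov.py cmudict-en-us.dict my_audio.transcription oov_words.txt` (file names are placeholders). The functions take iterables of lines rather than paths, so they can also be reused on data that is already in memory.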

Aanand P - 2020-06-16

I have no idea how to achieve this with Python. Is there a reference script I can look at?

Last edit: Aanand P 2020-06-16
