Hello,
I'm trying to build a speech recognition system only for Hebrew letters. The system should recognize a single letter at a time (1 out of 22 letters). Can you estimate how many recordings per letter (or how much recording time per letter) I need to train it and get good results?
Is there any Hebrew acoustic model that I can use? Thanks.
Current speech recognition systems can use speech-to-letter (character-level) models to decode the audio signal, together with a language model to detect legal sequences. Also look into wav2vec 2.0 models; these can generate a feature space on which a classifier can then be trained to recognize the tokens of a specific language. With either approach you will need at least some annotated speech to map the audio to symbols. One rule of thumb is ~50 instances per symbol, but this can vary according to the end task. Aiming for a roughly uniform distribution over symbols is a good idea.
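The ~50-per-symbol rule of thumb and the advice about a uniform distribution can be turned into a quick dataset check. This is a minimal sketch, assuming one text label (a Hebrew letter) per recorded clip; the function names, the default threshold of 50, and the one-second clip length used for the time estimate are illustrative assumptions, not part of any library.

```python
from collections import Counter

# The 22 Hebrew letters (final forms such as ך and ם are not counted separately).
HEBREW_LETTERS = list("אבגדהוזחטיכלמנסעפצקרשת")

# Rule of thumb from the reply above; the right number depends on the end task.
MIN_PER_LETTER = 50


def audit_dataset(labels, min_per_letter=MIN_PER_LETTER):
    """Count clips per letter and report how many more each short letter needs.

    Returns (counts, missing), where `missing` maps every letter with fewer than
    `min_per_letter` clips to the number of additional clips required.
    """
    counts = Counter(labels)
    unknown = set(counts) - set(HEBREW_LETTERS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    missing = {
        letter: min_per_letter - counts[letter]  # Counter returns 0 for absent keys
        for letter in HEBREW_LETTERS
        if counts[letter] < min_per_letter
    }
    return counts, missing


def imbalance_ratio(counts):
    """Largest per-letter count divided by the smallest (1.0 means uniform).

    A letter with zero clips makes the ratio infinite.
    """
    per_letter = [counts[letter] for letter in HEBREW_LETTERS]
    smallest = min(per_letter)
    return float("inf") if smallest == 0 else max(per_letter) / smallest


def estimated_minutes(min_per_letter=MIN_PER_LETTER, seconds_per_clip=1.0):
    """Rough total recording time for a uniform dataset.

    `seconds_per_clip` is an assumed average length of one isolated letter.
    """
    return len(HEBREW_LETTERS) * min_per_letter * seconds_per_clip / 60


if __name__ == "__main__":
    labels = ["א"] * 60 + ["ב"] * 40  # hypothetical partial dataset
    counts, missing = audit_dataset(labels)
    print("letters still short:", len(missing))
    print("ב needs", missing["ב"], "more clips")
    print("imbalance ratio:", imbalance_ratio(counts))
    print(f"target audio: ~{estimated_minutes():.1f} min")
```

Under these assumptions, 50 clips per letter for 22 letters is 1,100 clips, or roughly 18 minutes of audio at about one second per clip. Checking the imbalance ratio before training makes it easy to see which letters still need recordings to keep the distribution close to uniform.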