I am currently using the an4 dataset, available from here.
I am trying to do phone recognition, but the utterances contain only the words and not the phones. Is there some way I can extract utterances that are phonetically separated?
The acoustic model is the same for phone and word recognition. You do not need a database segmented into phonemes for phonetic recognition; you can use a conventional database. If you need higher accuracy, you can train on a larger dataset such as TED-LIUM.
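If phone-level reference transcripts are still wanted (for example, to score a phone recognizer), one common approach is to expand the ordinary word transcripts through the pronunciation dictionary rather than looking for a phonetically segmented corpus. The following is a minimal sketch, not part of any toolkit, assuming a CMUdict-style dictionary (one `WORD PH1 PH2 ...` entry per line, alternates written `WORD(2)`) and Sphinx-style transcript lines of the form `<s> WORD WORD </s> (utterance_id)`. The function names are hypothetical, and alternate pronunciations are simply ignored.

```python
import re


def load_dictionary(lines):
    """Parse CMUdict-style lines 'WORD PH1 PH2 ...' into {WORD: [phones]}.

    Alternate pronunciations such as 'WORD(2)' are skipped, so the first
    listed pronunciation wins (a simplifying assumption).
    """
    pron = {}
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # ignore blank lines
        word, phones = parts[0], parts[1:]
        if re.search(r"\(\d+\)$", word):
            continue  # alternate pronunciation, e.g. TWO(2)
        pron.setdefault(word.upper(), phones)
    return pron


def words_to_phones(transcript_line, pron):
    """Turn '<s> W1 W2 </s> (utt_id)' into (utt_id, [phones])."""
    match = re.match(r"^(.*?)\s*\(([^)]*)\)\s*$", transcript_line)
    if match:
        text, utt_id = match.group(1), match.group(2)
    else:
        text, utt_id = transcript_line, None  # no utterance id present
    phones = []
    for word in text.split():
        if word in ("<s>", "</s>", "<sil>"):
            continue  # sentence-boundary and silence markers carry no phones
        key = word.upper()
        if key not in pron:
            raise KeyError(f"no pronunciation for {word!r}")
        phones.extend(pron[key])
    return utt_id, phones


if __name__ == "__main__":
    pron = load_dictionary(["ONE W AH N", "TWO T UW", "TWO(2) T AX"])
    print(words_to_phones("<s> ONE TWO </s> (an1-test)", pron))
```

Raising on out-of-vocabulary words is deliberate: a silently dropped word would produce a phone reference that no longer matches the audio.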
But it only seems possible to do it on the test set, rather than the training set.
You need to be clearer about what you mean by "it".
I tried converting my utterances to the phonetic level using the acoustic model.
What I ended up with was only the test set separated into phonemes.
And what is the problem? Until you explain it in detail, nobody will be able to help you.
There are two problems. First, I am not sure why, but I was only able to do it using the test set.
The second problem is the number of phone classes. Most phoneme recognition work describes the task as a 61-class problem, but I seem to have more than 61 classes, above 100.
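The 61-label set usually cited comes from TIMIT, so a first check is simply to count how many distinct labels the dictionary actually produces. One possible source of inflation, assumed here rather than diagnosed from this setup, is lexical stress written as trailing digits (e.g. `AH0`, `AH1`), which CMUdict-style dictionaries use. A minimal sketch, with a hypothetical `phone_inventory` helper operating on a `{word: [phones]}` mapping:

```python
import re
from collections import Counter


def phone_inventory(pron, strip_stress=False):
    """Count phone labels across all pronunciations in {word: [phones]}.

    With strip_stress=True, trailing stress digits are removed so that
    AH0 and AH1 are counted as the same class AH.
    """
    counts = Counter()
    for phones in pron.values():
        for phone in phones:
            if strip_stress:
                phone = re.sub(r"\d+$", "", phone)
            counts[phone] += 1
    return counts


if __name__ == "__main__":
    pron = {"ABOUT": ["AH0", "B", "AW1", "T"], "UP": ["AH1", "P"]}
    print(len(phone_inventory(pron)), len(phone_inventory(pron, strip_stress=True)))
```

If the count is still far above the expected inventory after this, the extra classes probably come from something else, such as context-dependent units being counted instead of base phones.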
In the Kaldi group you provided a much better description; it is sad that you are trying to fool us here. Not much to discuss, then.