Hi,
I have a general question about training data for Sphinx 3.
I will collect training data in Turkish for my ASR system soon.
I want to collect training data by transcribing some audio files in Turkish.
Do I have to use an audio tool like Transcriber or Exmaralda when transcribing
the audio files? Will the start and end frames of each word matter at any
stage of training?
I have trained English models with VoxForge data before, and that data doesn't
contain word-level frame boundaries.
So I want to be sure that the audio files and their plain transcriptions are
all I need.
Thanks a lot.
Berker
What do you mean by the frames of the words?
Train the system as you did in English.
I mean indicating the start and end time of each word in the transcription file.
My actual question is: is there any alternative training procedure in Sphinx
that uses the start and end times of each word?
If there is, I will collect my training data accordingly.
If not, I will collect only the transcriptions of the audio files, as I did for English.
Thanks,
Berker
There is an alternative. One can create a state alignment and pass it as an
argument to init_gau. One has to write a program to dump the boundaries and
modify the training scripts. But word boundaries are of no use. Phone
boundaries could be helpful; word boundaries aren't.
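
Since word boundaries are not needed, the training input reduces to a list of audio files plus one plain transcription line per file. Below is a minimal sketch of preparing those two lists, assuming the usual SphinxTrain layout (a file list with paths relative to the wav directory and no extension, and transcription lines of the form `<s> words </s> (utterance_id)`). The function names and the sample utterances are hypothetical, made up for illustration.

```python
def transcription_line(file_id, text):
    """Build one transcription line; no timestamps or frame numbers are involved."""
    words = " ".join(text.split())        # collapse stray whitespace
    utt_id = file_id.rsplit("/", 1)[-1]   # utterance id = last path component
    return f"<s> {words} </s> ({utt_id})"


def build_lists(transcripts):
    """transcripts: dict mapping file id (path without extension) to plain text.

    Returns (fileids_lines, transcription_lines) in matching order.
    """
    file_ids = sorted(transcripts)
    lines = [transcription_line(fid, transcripts[fid]) for fid in file_ids]
    return file_ids, lines


if __name__ == "__main__":
    # Hypothetical Turkish utterances, for illustration only.
    sample = {
        "speaker1/utt_002": "nasılsın",
        "speaker1/utt_001": "merhaba  dünya",
    }
    file_ids, lines = build_lists(sample)
    print("\n".join(file_ids))
    print("\n".join(lines))
```

The two lists would then be written to the file list and transcription file that the training setup points at. The words in each line have to match the entries in the pronunciation dictionary, so any case normalization should be done consistently in both places.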