Hi guys,
I'm new to Sphinx and have only just begun working with it.
I am a little hazy on all the tools available within Sphinx and SphinxTrain, so I have a question that I hope someone can point me in the right direction on.
I have a large set of sentence waveform files (sentences spoken in a particular accent) that I wish to use to train Sphinx (using SphinxTrain).
I am having some trouble creating the dictionary containing the phonemic decomposition of the words.
The dataset I have has already been annotated on a phonemic level in the following way:
For the sentence "The price range is smaller than any of us expected", the phoneme annotation is in this format:
1.0900 -1 H#
1.1350 -1 D
1.1650 -1 @
1.2500 -1 p
1.3050 -1 r
1.4250 -1 ai
1.5450 -1 s
1.5800 -1 r
....
3.3200 -1 k
3.4350 -1 t
3.5300 -1 @
3.6100 -1 d
4.4900 -1 #
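For reference, here is a minimal Python sketch of how I could read this format. It assumes each line is "<time> <code> <label>", that the middle column (always -1 here) can be ignored, and that "H#" and "#" mark silence or utterance boundaries rather than phonemes. The function name parse_phone_labels is just my own, not something from Sphinx.

```python
from typing import List, Tuple


def parse_phone_labels(text: str) -> List[Tuple[float, str]]:
    """Parse lines of the form '<time> <code> <label>' into (time, label) pairs."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank lines and elisions such as '....'
        time_str, _code, label = fields  # the middle column is ignored (assumption)
        try:
            entries.append((float(time_str), label))
        except ValueError:
            continue  # skip lines whose first field is not a number
    return entries


sample = """1.0900 -1 H#
1.1350 -1 D
1.1650 -1 @
1.2500 -1 p
"""

labels = parse_phone_labels(sample)

# Assumption: 'H#' and '#' are boundary/silence markers, not phonemes.
phones = [label for _, label in labels if label not in {"H#", "#"}]
print(phones)  # ['D', '@', 'p']
```

This only recovers the phoneme sequence and its time-stamps for each utterance; it does not recover word boundaries.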
Note that there are no word boundaries defined, so I cannot simply convert it to a dictionary in the format required for training, i.e.
THE D @
PRICE P R AI S
etc
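To make the target format concrete, here is a small Python sketch that writes entries in this "WORD PHONE PHONE ..." layout. The word-to-phoneme mapping is typed in by hand purely for illustration, since the annotations do not provide it, and format_dictionary is my own hypothetical helper. I have left the phoneme symbols in their original case, because the annotation uses both "D" and "d" and I assume they are different phonemes.

```python
from typing import Dict, List


def format_dictionary(pronunciations: Dict[str, List[str]]) -> str:
    """Return one 'WORD PHONE PHONE ...' line per word, sorted by word."""
    lines = []
    for word in sorted(pronunciations):
        # Words are upper-cased; phoneme symbols are left as annotated,
        # because case may be significant (assumption: 'D' and 'd' differ).
        lines.append(f"{word.upper()} {' '.join(pronunciations[word])}")
    return "\n".join(lines)


# Hand-written example mapping, not derived from the annotation files.
entries = {"the": ["D", "@"], "price": ["p", "r", "ai", "s"]}

dictionary_text = format_dictionary(entries)
print(dictionary_text)
# PRICE p r ai s
# THE D @
```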
Now my problem is this: since this data has been meticulously annotated, is there some way of using this format within Sphinx/SphinxTrain? That is, the phonemes are matched with time-stamps from the audio files, so is there a way to have SphinxTrain use this information to help training?
My second question is, if I were to ignore the phoneme annotations, and simply use the Sphinx Knowledge Base Tool (http://www.speech.cs.cmu.edu/tools/lmtool.html) to create my dictionary based on a list of all the sentences I have (there are hundreds of individual sentences, and thousands of recordings in various accents), would I lose a degree of accuracy in training/recognition by using this tool to generate an "automatic" list?
How would using a pre-made dictionary based on, say, American pronunciations of words affect training using waveforms of a different accent (say, a British accent)? Would it make a difference at all?
As you can see, I am not entirely sure how the dictionary, and the accompanying phonemic decomposition of each word, affects the training process!
Sorry for the length, and thanks in advance for any help!
Cheers
Maaroof