CMU Sphinx / Forums / Help: Word dictionaries and Phoneme decomposition

Hi guys,

I'm new to Sphinx and have only just begun working with it.
I am a little hazy on all the tools available within Sphinx and SphinxTrain, so I have a question that I hope someone can point me in the right direction on.

I have a large set of sentence waveform files (sentences spoken in a particular accent) that I wish to use to train Sphinx (using SphinxTrain).

I am having some trouble creating the dictionary that contains the phonemic decomposition of the words.

The dataset I have has already been annotated on a phonemic level in the following way:

For the sentence "The price range is smaller than any of us expected", the phoneme annotation is in this format:

1.0900 -1 H#
1.1350 -1 D
1.1650 -1 @
1.2500 -1 p
1.3050 -1 r
1.4250 -1 ai
1.5450 -1 s
1.5800 -1 r

....
3.3200 -1 k
3.4350 -1 t
3.5300 -1 @
3.6100 -1 d
4.4900 -1 #
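
A minimal Python sketch of reading this annotation format, in case it helps anyone answering. It assumes each line is three whitespace-separated fields: a timestamp in seconds, a flag field (always -1 in my files), and the phoneme label. Whether the timestamp marks the start or the end of the segment is not something the sketch decides. The names Segment and parse_annotation are made up for illustration and are not part of Sphinx or SphinxTrain.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    time: float  # timestamp in seconds, taken from the first column
    flag: int    # second column (always -1 in the sample shown above)
    label: str   # phoneme label, e.g. "D", "@", "ai", "H#"


def parse_annotation(text: str) -> List[Segment]:
    """Parse annotation lines into Segment tuples, skipping anything malformed."""
    segments = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # blank lines, "....", or other non-data lines
        try:
            time = float(parts[0])
            flag = int(parts[1])
        except ValueError:
            continue  # first two fields are not numeric, so not a data line
        segments.append(Segment(time, flag, parts[2]))
    return segments


sample = """1.0900 -1 H#
1.1350 -1 D
1.1650 -1 @
....
4.4900 -1 #"""

segments = parse_annotation(sample)
for seg in segments:
    print(seg.time, seg.label)
```

This only recovers the phoneme sequence and its timestamps; it does not recover word boundaries, which is the part I am missing.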

Note that there are no word boundaries defined, so I cannot simply convert it to a dictionary in the format required for training, i.e.
THE D @
PRICE P R AI S
etc

Now my problem is this: since this data has been meticulously annotated, is there some way of using this format within Sphinx/SphinxTrain? That is, the phonemes are matched with time-stamps from the audio files; is there a way to have SphinxTrain use this information to help training?

My second question is: if I were to ignore the phoneme annotations and simply use the Sphinx Knowledge Base Tool (http://www.speech.cs.cmu.edu/tools/lmtool.html) to create my dictionary from a list of all the sentences I have (there are hundreds of individual sentences, and thousands of recordings in various accents), would I lose a degree of accuracy in training/recognition by using this tool to generate an "automatic" list?

How would using a premade dictionary based on, say, American pronunciations of words affect training with waveforms in a different accent (say, a British accent)? Would it make a difference at all?

As you can see, I am not entirely sure how the dictionary, and the accompanying phonemic decomposition of each word, affects the training process!

Sorry for the length, and thanks in advance for any help!

Cheers

Maaroof
