First of all, please avoid posting the same question to multiple old threads. Doing that decreases your chance of getting a good answer, and it is simply not polite.
Second, sphinxtrain supports one specific Unicode encoding: UTF-8. Please make sure you are not using UTF-16 or something similar, and that all input files are encoded in UTF-8.
In case of trouble, please share your training folder. Providing your data greatly increases your chance of getting a solution; posting the same question to threads from 2005 does not.
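To check the encoding point before training, a small script can inspect each input file's raw bytes. This is a minimal Python sketch, not a sphinxtrain tool. The function name `diagnose_encoding` is illustrative, and its heuristics (byte-order-mark detection, and NUL bytes taken as a sign of UTF-16 saved without a BOM) are assumptions rather than exact rules.

```python
import codecs
import sys


def diagnose_encoding(data: bytes) -> str:
    """Heuristic check of a file's raw bytes (illustrative helper, not a sphinxtrain tool)."""
    # A UTF-16 byte-order mark means the file was saved as UTF-16, not UTF-8.
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        return "UTF-16 (BOM found): convert to UTF-8"
    # NUL bytes almost never occur in UTF-8 text but are common in BOM-less UTF-16.
    if b"\x00" in data:
        return "probably UTF-16 without a BOM: convert to UTF-8"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return f"not valid UTF-8 (first bad byte at offset {err.start})"
    # A UTF-8 BOM is valid UTF-8, but tools that split on whitespace may glue it to the first word.
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8 with BOM: consider removing the BOM"
    return "valid UTF-8"


if __name__ == "__main__":
    # Usage (paths assume the usual sphinxtrain etc/ layout):
    #   python check_encoding.py etc/*.dic etc/*.transcription
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, "->", diagnose_encoding(f.read()))
```

Reading the files in binary mode is deliberate: opening them as text with an assumed encoding would either fail or silently hide the problem being checked for.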
Sorry for posting the same question to multiple old threads; I won't do it again. I am using UTF-8 only, but I still get the error.
Here is my file, which is in UTF-8.
It's better to attach files in a single archive, not as ten separate links. You also attached backup files ending in ~, which I doubt you are using. It's probably better to share the whole folder.
It doesn't seem like your dictionary has a phonetic transcription for the words; it only contains a list of words in random order. Make sure your dictionary has one word per line, each followed by its phonetic transcription.
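The dictionary rule above (one word per line, followed by its phones) can be checked mechanically. Below is a minimal Python sketch that assumes the whitespace-separated `WORD PHONE1 PHONE2 ...` layout of Sphinx dictionaries and, optionally, the set of phones from your phone list; `check_dictionary` is a hypothetical helper, not part of sphinxtrain, and the sample entries are only for illustration.

```python
def check_dictionary(lines, phoneset=None):
    """Return (line_number, problem) pairs for a pronunciation dictionary.

    Assumes one entry per line: a word followed by its phones, separated by
    whitespace. If phoneset is given, phones outside it are reported too.
    """
    problems = []
    seen = set()
    for n, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            problems.append((n, "blank line"))
            continue
        word, phones = fields[0], fields[1:]
        if not phones:
            # A bare word list (no transcription) ends up here.
            problems.append((n, f"no phonetic transcription for {word!r}"))
        if word in seen:
            problems.append((n, f"duplicate entry {word!r}"))
        seen.add(word)
        if phoneset is not None:
            for ph in phones:
                if ph not in phoneset:
                    problems.append((n, f"unknown phone {ph!r} for {word!r}"))
    return problems


if __name__ == "__main__":
    sample = ["hello HH AH L OW", "world", "hello HH AH L OW"]
    for n, problem in check_dictionary(sample, phoneset={"HH", "AH", "L", "OW"}):
        print(f"line {n}: {problem}")
```

Passing the phone set matters for lines that hold a whole sentence instead of one word: without it, the extra words on the line would simply be taken for phones.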
You still didn't fix the main problem I told you about:
It doesn't seem like your dictionary has a phonetic transcription for the words; it only contains a list of words in random order. Make sure your dictionary has one word per line, each followed by its phonetic transcription. See the acoustic model training tutorial for details.
Hello
I have discussed this with a linguistics expert, and they said that for Tamil script (which is my project) the phonetic transcription is the same as the words, as in the dictionary file I posted. My dictionary uses UTF-8 only, but I still get the word-matching error:
Phase 6: Checking that all the words in the transcript are in the dictionary
Words in dictionary: 88
Words in filler dictionary: 3
WARNING: This word: was in the transcript file, but is not in the dictionary ( திருச்செந்தூர் அருள்மிகு சுப்பிரமணிய சுவாமி திருக்கோவிலில் புதன்கிழமையன்று ஆவணித்திருவிழா தேரோட்டம் நடைபெற்றது ). Do cases match?
I have rechecked every aspect, including whether:
1. the words in the transcript are in the dictionary
2. the case matches where they appear
3. any words in the transcript are misspelled
4. the dictionary file is properly sorted
For Roman script, Phase 6 passed. For Unicode (UTF-8), Phase 6 FAILED.
When I run sphinxtrain, all the output files (mixtures, means, variances, the result files such as align and match, feat.params, mdef, mixture_weights, noisedict, transition_matrices) are still created despite the Phase 6 failure. With those files I get an accuracy of roughly 70-80%. I am trying to get accuracy above 90%. Please help me out.
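To narrow down a Phase 6 failure like the one above, it can help to repeat the same check outside sphinxtrain. The Python sketch below assumes transcript lines of the form `<s> word word </s> (utterance_id)` and dictionaries whose first field is the word; the helper names are illustrative. It also optionally compares words after Unicode NFC normalization. That is a hypothesis worth testing, not a confirmed cause: some Tamil vowel signs, such as ொ (U+0BCA), can also be stored as two code points (U+0BC6 U+0BBE), so a dictionary and a transcript produced by different tools may look identical yet compare as different strings. If the check passes with `normalize=True` but fails with `normalize=False`, that mismatch is a likely explanation, and the files themselves would need to be normalized.

```python
import re
import sys
import unicodedata


def _key(word, normalize):
    # NFC makes e.g. U+0B95 U+0BC6 U+0BBE and U+0B95 U+0BCA compare equal.
    return unicodedata.normalize("NFC", word) if normalize else word


def dictionary_words(lines, normalize=True):
    """First field of each non-blank line, with a (2)-style alternate suffix removed."""
    words = set()
    for line in lines:
        fields = line.split()
        if fields:
            words.add(_key(re.sub(r"\(\d+\)$", "", fields[0]), normalize))
    return words


def missing_words(transcript_lines, dict_lines, filler_lines=(), normalize=True):
    """Transcript words found in neither dictionary, in first-seen order."""
    known = dictionary_words(dict_lines, normalize) | dictionary_words(filler_lines, normalize)
    missing = []
    for line in transcript_lines:
        # Assumed format: <s> w1 w2 ... </s> (utterance_id); drop the trailing id.
        text = re.sub(r"\([^()]*\)\s*$", "", line.strip())
        for word in text.split():
            if word in ("<s>", "</s>"):
                continue
            key = _key(word, normalize)
            if key not in known and key not in missing:
                missing.append(key)
    return missing


if __name__ == "__main__":
    # Usage (hypothetical file names): python check_words.py train.transcription my.dic my.filler
    def read(path):
        with open(path, encoding="utf-8") as f:
            return f.read().splitlines()

    if len(sys.argv) == 4:
        trans, dic, filler = (read(p) for p in sys.argv[1:])
        for word in missing_words(trans, dic, filler, normalize=False):
            # repr() shows invisible characters, e.g. a byte-order mark as \ufeff.
            print("missing:", repr(word))
```

The command-line part runs the strict comparison (`normalize=False`), which is the behaviour to expect from a plain string match; rerunning it with normalization turned on shows whether the remaining misses are only normalization differences.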
In the sphinxtrain run I get:
WARNING: This word: was in the transcript file, but is not in the dictionary ( திருச்செந்தூர் அருள்மிகு சுப்பிரமணிய சுவாமி திருக்கோவிலில் புதன்கிழமையன்று ஆவணித்திருவிழா தேரோட்டம் நடைபெற்றது ). Do cases match?
I am using Unicode characters in the dictionary. Is there any solution?
It works very well with Roman script. It gives me 200% accuracy.
https://drive.google.com/folderview?id=0B5fhKPTbTJ4Pck9TTENoVmxKZnM&usp=sharing
It's my whole project. Please help me out.
Last edit: Alexander... 2014-01-22
Hello. I have edited my dictionary to add phonetic transcriptions; it is attached. Is it correct?
No, the file is not correct:
1) The encoding is UTF-16 instead of UTF-8.
2) It doesn't contain a single word with its transcription per line; there are lines which do not describe words.
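For the first point, re-encoding the file is enough; there is no need to retype it. A minimal Python sketch, assuming the file is UTF-16 as reported, either with a byte-order mark or, if it has none, little-endian (that byte order is an assumption about how the editor saved it). The function and file names are illustrative.

```python
import codecs
import sys
from pathlib import Path


def utf16_bytes_to_utf8(data: bytes, no_bom_encoding: str = "utf-16-le") -> bytes:
    """Re-encode UTF-16 bytes as UTF-8 without a BOM, keeping line endings unchanged."""
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        # The 'utf-16' codec reads the BOM, picks the byte order and drops the BOM.
        text = data.decode("utf-16")
    else:
        # Assumption: byte order to use when the file has no BOM.
        text = data.decode(no_bom_encoding)
    return text.encode("utf-8")


if __name__ == "__main__":
    # Usage (hypothetical names): python to_utf8.py my_utf16.dic my_utf8.dic
    if len(sys.argv) == 3:
        src, dst = Path(sys.argv[1]), Path(sys.argv[2])
        dst.write_bytes(utf16_bytes_to_utf8(src.read_bytes()))
```

Working on bytes rather than text mode keeps the line endings exactly as they were. The second point still has to be fixed by hand: re-encoding does not add transcriptions.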
Hmm... Let me edit it again.
Could anyone please reply and help solve this issue?