
Unicode: Do cases match?

Help
2014-01-09
2017-07-18
  • Alexander...

    Alexander... - 2014-01-09

    During a SphinxTrain run I get:

    WARNING: This word:  was in the transcript file, but is not in the dictionary ( திருச்செந்தூர் அருள்மிகு சுப்பிரமணிய சுவாமி திருக்கோவிலில் புதன்கிழமையன்று ஆவணித்திருவிழா தேரோட்டம் நடைபெற்றது ). Do cases match?

    I am using Unicode characters in the dictionary. Is there any solution?
    It works very well with Roman script. It gives me 200% accuracy.

     
  • Nickolay V. Shmyrev

    First of all, please avoid posting the same question to multiple old threads. Doing that decreases your chance of getting a good answer, and it is simply not polite.

    Second, SphinxTrain supports one specific Unicode encoding: UTF-8. Please make sure you are not using UTF-16 or something similar, and that all input files are encoded in UTF-8.
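A quick way to check this before training is to look at the raw bytes of each input file. The sketch below is illustrative only (`utf8_problems` is a hypothetical helper, not part of SphinxTrain): it reports a UTF-16 or UTF-32 byte-order mark, a UTF-8 byte-order mark (flagged only as a precaution, on the assumption that the training tools might keep it as an invisible character), and any bytes that do not decode as UTF-8.

```python
import codecs
import sys
from pathlib import Path


def utf8_problems(data):
    """Return a list of reasons why `data` (raw file bytes) is not plain UTF-8."""
    problems = []
    # UTF-32 marks are tested first because the UTF-32-LE mark begins with
    # the same two bytes as the UTF-16-LE mark.
    if data.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        problems.append("starts with a UTF-32 byte-order mark")
    elif data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        problems.append("starts with a UTF-16 byte-order mark")
    elif data.startswith(codecs.BOM_UTF8):
        # Valid UTF-8, but flagged in case the tools keep it as an
        # invisible character at the start of the first word (assumption).
        problems.append("starts with a UTF-8 byte-order mark")
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        problems.append(f"not valid UTF-8 at byte {err.start}")
    return problems


if __name__ == "__main__":
    # Usage: python check_utf8.py etc/*.dic etc/*.transcription
    for name in sys.argv[1:]:
        problems = utf8_problems(Path(name).read_bytes())
        print(name, "OK" if not problems else "; ".join(problems))
```

Every file the training run reads (dictionary, filler dictionary, transcriptions, file lists) can be passed on the command line at once.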

    If you still have trouble, please share your training folder. Providing your data greatly increases the chance of getting a solution; posting the same question to threads from 2005 does not.

     
  • Nickolay V. Shmyrev

    It's better to attach the files as a single archive, not as ten separate links. You also attached backup files ending in ~; I doubt you are using them. It's probably better to share the whole folder.
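If it helps, a training folder can be packed into one archive while skipping editor backups whose names end in `~`. A minimal sketch using Python's standard `tarfile` module; `archive_training_folder` is a hypothetical helper, and the archive should be written outside the folder being packed so it does not include itself.

```python
import sys
import tarfile
from pathlib import Path


def archive_training_folder(folder, archive_path):
    """Pack a training folder into one .tar.gz, skipping backup files ending in '~'."""
    folder = Path(folder)

    def skip_backups(info):
        # Returning None from the filter leaves this member out of the archive.
        return None if info.name.endswith("~") else info

    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps paths relative to the folder's own name.
        tar.add(folder, arcname=folder.name, filter=skip_backups)


if __name__ == "__main__":
    # Usage: python pack.py path/to/training_folder training_folder.tar.gz
    archive_training_folder(sys.argv[1], sys.argv[2])
```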

    It doesn't look like your dictionary has phonetic transcriptions for the words; it only contains a list of words in random order. Make sure your dictionary has one word per line, each followed by its phonetic transcription.
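For illustration, here is a rough check of the dictionary layout described above. This sketch assumes the usual SphinxTrain layout (the word first, then its phones, separated by whitespace; alternate pronunciations written as `WORD(2)`, which therefore do not count as duplicates) and can optionally compare the phones against the set of phones read from the phone list, assumed here to be whitespace-separated. `check_dictionary` is a hypothetical helper, not part of SphinxTrain.

```python
import sys
from pathlib import Path


def check_dictionary(lines, phones=None):
    """Return (line_number, message) pairs for dictionary lines that look wrong.

    Assumed layout: one entry per line, the word first and then its phones,
    e.g. "HELLO HH AH L OW".
    """
    problems = []
    seen = set()
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            problems.append((number, "line does not describe a word"))
            continue
        word, pron = fields[0], fields[1:]
        if not pron:
            problems.append((number, f"{word!r} has no phonetic transcription"))
        if word in seen:
            problems.append((number, f"{word!r} appears more than once"))
        seen.add(word)
        if phones is not None:
            unknown = [p for p in pron if p not in phones]
            if unknown:
                problems.append((number, f"{word!r} uses unknown phones {unknown}"))
    return problems


if __name__ == "__main__":
    # Usage (placeholder names): python check_dict.py db.dic [db.phone]
    entries = Path(sys.argv[1]).read_text(encoding="utf-8").splitlines()
    phones = None
    if len(sys.argv) > 2:
        phones = set(Path(sys.argv[2]).read_text(encoding="utf-8").split())
    for number, message in check_dictionary(entries, phones):
        print(f"line {number}: {message}")
```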

     
    • Alexander...

      Alexander... - 2014-01-20
       

      Last edit: Alexander... 2014-01-22
      • Nickolay V. Shmyrev

        You still didn't fix the main problem I told you about:

        It doesn't look like your dictionary has phonetic transcriptions for the words; it only contains a list of words in random order. Make sure your dictionary has one word per line, each followed by its phonetic transcription. See the acoustic model training tutorial for details.

         
        • Alexander...

          Alexander... - 2014-02-03

          Hello. I have edited my dictionary to add phonetic transcriptions; it is attached. Is it correct?

           
  • Nickolay V. Shmyrev

    Is it correct?

    No, the file is not correct.

    1) The encoding is UTF-16 instead of UTF-8.

    2) It doesn't contain one word with its transcription per line. Some lines do not describe words at all.
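Problem 1 can be fixed by re-encoding the file. A minimal sketch, assuming the source file starts with a UTF-16 byte-order mark; `utf16_to_utf8` is a hypothetical helper name. It only changes the encoding, so problem 2 still has to be fixed by hand.

```python
import sys
from pathlib import Path


def utf16_to_utf8(src, dst):
    """Re-encode a UTF-16 text file as UTF-8 (without a byte-order mark)."""
    data = Path(src).read_bytes()
    # The "utf-16" codec uses the leading byte-order mark to choose the byte
    # order and does not include the mark in the decoded text.
    text = data.decode("utf-16")
    # Writing bytes directly leaves the line endings exactly as they were.
    Path(dst).write_bytes(text.encode("utf-8"))


if __name__ == "__main__":
    # Usage: python utf16_to_utf8.py input.dic output.dic
    utf16_to_utf8(sys.argv[1], sys.argv[2])
```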

     
    • Alexander...

      Alexander... - 2014-02-03

      Hmm... let me edit it again.

       
  • Alexander...

    Alexander... - 2014-02-07

    Hello
    I have discussed this with a linguistics expert, who said that for Tamil script (which is what my project uses) the phonetic transcription is the same as the words, as in the dictionary file I posted. My dictionary is UTF-8 only, but I still get the word-matching error:
    Phase 6: Checking that all the words in the transcript are in the dictionary
    Words in dictionary: 88
    Words in filler dictionary: 3
    WARNING: This word:  was in the transcript file, but is not in the dictionary ( திருச்செந்தூர் அருள்மிகு சுப்பிரமணிய சுவாமி திருக்கோவிலில் புதன்கிழமையன்று ஆவணித்திருவிழா தேரோட்டம் நடைபெற்றது ). Do cases match?
    I have rechecked every aspect, namely whether:
    1. the words in the transcript are in the dictionary
    2. the case matches wherever the words appear
    3. words in the transcript may be misspelled
    4. the dictionary file may not be perfectly sorted

    With Roman script, Phase 6 passed. With Unicode (UTF-8), Phase 6 FAILED.
    When I run SphinxTrain, all the files (mixtures, means, variances, result files such as align and match, feat.params, mdef, mixture_weights, noisedict, transition_matrices) are created despite the Phase 6 failure. With those files I get accuracy of roughly 70-80%. I am struggling to get accuracy above 90%. Please help me out.
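To see exactly which token fails, a check similar in spirit to Phase 6 can be run outside SphinxTrain. This is a sketch, not SphinxTrain's own script: it assumes transcript lines of the form `<s> word1 word2 </s> (utterance_id)`, treats `WORD(2)` dictionary entries as alternate pronunciations of `WORD`, and uses the filler dictionary (typically holding `<s>`, `</s>` and `<sil>`) as extra known words. The function names are hypothetical.

```python
import re
import sys
from pathlib import Path

# Assumed transcript layout: "<s> word1 word2 ... </s> (utterance_id)"
FILE_ID = re.compile(r"\s*\([^()]*\)\s*$")
# Alternate pronunciations are assumed to be written as "WORD(2)", "WORD(3)", ...
ALT_PRON = re.compile(r"\(\d+\)$")


def dictionary_words(dict_lines):
    """Collect the headwords of a dictionary or filler dictionary."""
    words = set()
    for line in dict_lines:
        fields = line.split()
        if fields:
            words.add(ALT_PRON.sub("", fields[0]))
    return words


def missing_words(transcript_lines, known_words):
    """Return (line_number, word) for every transcript token not in known_words."""
    missing = []
    for number, line in enumerate(transcript_lines, start=1):
        text = FILE_ID.sub("", line)  # drop the trailing "(utterance_id)"
        for word in text.split():
            if word not in known_words:
                missing.append((number, word))
    return missing


if __name__ == "__main__":
    # Usage (placeholder names): check_words.py db.dic db.filler db_train.transcription
    dic, filler, trans = (
        Path(p).read_text(encoding="utf-8").splitlines() for p in sys.argv[1:4]
    )
    known = dictionary_words(dic) | dictionary_words(filler)
    for number, word in missing_words(trans, known):
        # repr() shows invisible characters such as a stray byte-order mark.
        print(f"line {number}: {word!r} is not in the dictionary")
```

Printing each missing word with `repr()` makes non-printing characters visible. A word that looks empty in the warning, as in the log above, could for example be a lone byte-order mark (`'\ufeff'`) stuck to the first token of a line.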

     
  • Alexander...

    Alexander... - 2014-02-10

    Could anyone please reply and help solve this issue?

     
