
tdt_sc_8k

creative64
2011-12-24
2012-09-22
  • creative64

    creative64 - 2011-12-24

    Hi,

    Pocketsphinx ships a Chinese acoustic model, "tdt_sc_8k", in its
    distribution. I would like some basic information about the model.
    a) Is this a Mandarin model or a Cantonese one?
    b) There are 2 Chinese dictionaries that come with the distribution. Which
    one is compatible with this model?
    c) What encoding is used in the dictionary? (When I try opening it with MS
    Word I see options like Unicode, Unicode (Big endian), Unicode (UTF-7) and
    Unicode (UTF-8).)
    d) What kind of speech is the model trained on, and what is its accuracy in
    general (vis-à-vis the hub4wsj_sc_8k English acoustic model)?
    e) Where can I get the phoneme list and mdef.txt file for this model?

    Thanks and regards,

     
  • Nickolay V. Shmyrev

    a) Is this a Mandarin model or a Cantonese one?

    Mandarin

    b) There are 2 Chinese dictionaries that come with the distribution. Which
    one is compatible with this model?

    Both are compatible; they differ in the orthography of the words, not in
    the sounds.

    c) What encoding is used in the dictionary? (When I try opening it with MS
    Word I see options like Unicode, Unicode (Big endian), Unicode (UTF-7) and
    Unicode (UTF-8).)

    UTF-8
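    In other words, open the file as UTF-8, not as "Unicode" (which in MS Word
    means UTF-16). A quick way to confirm the encoding is to decode the file
    strictly. A minimal Python sketch, assuming only that the dictionary is a
    line-oriented text file; the function name check_utf8 is hypothetical:

```python
def check_utf8(path):
    """Return the number of lines in `path`, or raise ValueError naming the
    first line that is not valid UTF-8."""
    count = 0
    # Read raw bytes: the newline byte 0x0A never occurs inside a multi-byte
    # UTF-8 sequence, so splitting on it before decoding is safe.
    with open(path, "rb") as f:
        for lineno, raw in enumerate(f, start=1):
            try:
                raw.decode("utf-8")  # strict decoding rejects invalid bytes
            except UnicodeDecodeError as err:
                raise ValueError(f"line {lineno} is not valid UTF-8: {err}") from err
            count += 1
    return count
```

    For example, check_utf8("path/to/dictionary.dic") returns the line count
    for a valid file (the path is a placeholder).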

    d) What kind of speech is the model trained on, and what is its accuracy in
    general (vis-à-vis the hub4wsj_sc_8k English acoustic model)?

    It's trained on broadcast news. Accuracy is unknown and is not expected to be
    high. It's more a model for small vocabularies.

    e) Where can I get the phoneme list and mdef.txt file for this model?

    You can convert the binary mdef to a text mdef with pocketsphinx_mdef_convert.
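    For reference, the converter takes an optional direction flag followed by
    the input and output paths, e.g. pocketsphinx_mdef_convert -text mdef mdef.txt
    (check the usage message of your build). The phone list can then be read
    off the text mdef: base phones are the rows whose left and right context
    columns are both "-". A minimal Python sketch, assuming the standard text
    mdef layout (a version line, count lines such as "n_base", "#" comments,
    then one row per phone beginning "base lft rt p attrib"); the function name
    is hypothetical:

```python
def base_phones(mdef_text, include_fillers=False):
    """Return the context-independent (base) phones of a text-format mdef,
    in file order. Filler phones such as SIL are skipped unless requested."""
    phones = []
    for line in mdef_text.splitlines():
        fields = line.split()
        # Skip blank lines, comments, the version line and "<count> n_..." lines;
        # phone rows have at least: base lft rt p attrib tmat <state ids> N.
        if len(fields) < 6 or fields[0].startswith("#"):
            continue
        base, left, right, attrib = fields[0], fields[1], fields[2], fields[4]
        # Base phones have "-" for both the left and the right context.
        if left == "-" and right == "-":
            if attrib == "filler" and not include_fillers:
                continue
            phones.append(base)
    return phones
```

    Usage: read mdef.txt into a string and pass it to base_phones(); triphone
    rows are ignored because they carry real left/right contexts.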

     
  • creative64

    creative64 - 2011-12-26

    Thanks, Nickolay.

    And now a few more basic questions:

    a) For English phonemes the phone set is specified like this:

    Phoneme   Example   Translation
    AA        odd       AA D
    AE        at        AE T
    AH        hut       HH AH T
    ...etc.

    Is there a similar description available for the phonemes used in the
    Chinese acoustic model, or does it need to be inferred by looking at the
    dictionary?

    b) Is there an automated way to generate pronunciations for English words
    using the phone set present in this Chinese acoustic model (letter-to-sound
    conversion)?

    Thanks and regards,

     
  • Nickolay V. Shmyrev

    used in the Chinese acoustic model, or does it need to be inferred by
    looking at the dictionary?

    Is that a problem worth asking about?
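    Inferring the list from the dictionary is indeed straightforward: on each
    line, every token after the word is a phone. A minimal Python sketch,
    assuming the usual Sphinx dictionary layout (word, whitespace,
    space-separated phones) in a UTF-8 file; the function name and the
    command-line usage are hypothetical:

```python
import sys
from collections import Counter


def phone_inventory(lines):
    """Count how often each phone occurs in Sphinx dictionary lines."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        counts.update(fields[1:])  # field 0 is the word, the rest are phones
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python phones.py path/to/dictionary.dic
    with open(sys.argv[1], encoding="utf-8") as f:
        for phone, n in sorted(phone_inventory(f).items()):
            print(phone, n)
```

    Counting occurrences rather than just collecting a set also shows which
    phones are rare in the dictionary.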

    Is there an automated way to generate pronunciations for English words
    using the phone set present in this Chinese acoustic model (letter-to-sound
    conversion)?

    Mandarin has no regular alphabet, so pronunciations need to be looked up in
    a database. The Unihan database can be used for that purpose; the
    corresponding script (unihan_to_sphinx.pl) is part of cmuclmtk.
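    The core idea is a per-character lookup. The sketch below is not
    unihan_to_sphinx.pl; it is a hypothetical Python illustration that parses
    Unihan-style kMandarin lines ("U+4E2D<TAB>kMandarin<TAB>zhōng", as in
    Unihan_Readings.txt), keeps the first listed reading of each character, and
    concatenates per-syllable phones. The syllable_phones table, which maps
    pinyin syllables to the model's phones, is an assumption you would have to
    build yourself:

```python
def parse_kmandarin(lines):
    """Map each character to its first kMandarin reading from Unihan-style lines."""
    readings = {}
    for line in lines:
        if line.startswith("#") or "\tkMandarin\t" not in line:
            continue  # skip comments and other Unihan fields
        code, _field, value = line.rstrip("\n").split("\t", 2)
        char = chr(int(code[2:], 16))  # "U+4E2D" -> the character itself
        readings[char] = value.split()[0]  # keep only the first listed reading
    return readings


def word_pronunciation(word, readings, syllable_phones):
    """Concatenate per-character phones; raises KeyError if a character or
    syllable is missing. `syllable_phones` is a hypothetical syllable -> phones table."""
    phones = []
    for char in word:
        phones.extend(syllable_phones[readings[char]])
    return phones
```

    Characters with several readings (polyphones) are where a simple lookup
    like this goes wrong, so the resulting entries still need checking.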

     
  • creative64

    creative64 - 2011-12-27

    Thanks, Nickolay.

    It would be great if pocketsphinx_mdef_convert were included in the Windows
    build projects so that it compiles along with the other projects.

    Thanks and regards,

     
  • Nickolay V. Shmyrev

    It would be great if pocketsphinx_mdef_convert were included in the Windows
    build projects so that

    Done in trunk

     
  • creative64

    creative64 - 2012-01-05

    Thanks a lot, Nickolay.

    Regards,

    PS: With the help of Google Translate I was able to create a small
    JSGF-based Mandarin application using tdt_sc_8k, and it is working OK.
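    For anyone trying the same: a JSGF grammar is a small text file, and every
    word it uses must have an entry in the dictionary. A minimal Python sketch
    that builds a one-rule grammar from a phrase list and reports words missing
    from the dictionary; the grammar name, the rule name <command> and the
    phrases are hypothetical, and words within a phrase are assumed to be
    separated by spaces exactly as they appear in the dictionary:

```python
def make_jsgf(grammar_name, phrases):
    """Build a JSGF grammar whose public rule <command> accepts any one phrase."""
    alternatives = " | ".join(phrases)
    return (
        "#JSGF V1.0;\n"
        f"grammar {grammar_name};\n"
        f"public <command> = {alternatives};\n"
    )


def missing_words(phrases, dict_words):
    """Return grammar words that have no entry in the dictionary, sorted."""
    return sorted({w for p in phrases for w in p.split() if w not in dict_words})


if __name__ == "__main__":
    # Hypothetical commands: "turn on light", "turn off light".
    phrases = ["打开 灯", "关闭 灯"]
    print(make_jsgf("commands", phrases), end="")
```

    The resulting file can then be passed to the decoder with the -jsgf option,
    together with -hmm pointing at tdt_sc_8k and -dict pointing at the Mandarin
    dictionary.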

     
  • Anonymous

    Anonymous - 2012-07-04

    I was just looking for pocketsphinx_mdef_convert for Windows as well; could
    you point me to where I can download it? I downloaded the latest snapshot
    from http://cmusphinx.sourceforge.net/wiki/download/, but it does not
    contain the executable.

    Thank you.

     
  • Nickolay V. Shmyrev

    I was just looking for pocketsphinx_mdef_convert for Windows as well; could
    you point me to where I can download it? I downloaded the latest snapshot
    from http://cmusphinx.sourceforge.net/wiki/download/, but it does not
    contain the executable.

    You need to build the tool yourself. You can use MS Visual Studio 2010 Express
    for that.

     
