TIMIT phone recognition, Sphinx-3 results

2012-01-01
2012-09-22
  • Pranav Jawale

    Pranav Jawale - 2012-01-01

    Hi,

    I am trying to reproduce the phone recognition results of
    http://ttic.uchicago.edu/~jkeshet/papers/KeshetGrBe08.pdf on the TIMIT test
    set (training done on the TIMIT train set). Please see pp. 15-16. They
    built 39 CI HMMs using the "Torch" software package, which is
    compatible with HTK. They use time alignment info too. They report 64%
    accuracy and do not mention any phone language model.

    With a null grammar in Sphinx-3 I get 39% phone accuracy, and 58%
    accuracy with a trigram LM. I used 5-state HMMs (skipstate = yes) and
    32 Gaussians, as in the above paper, and trained with SphinxTrain.
    I also varied the WIP over a broad range. The beam width was kept at its
    default.
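
    For readers comparing numbers across toolkits: phone accuracy is usually
    computed from an edit-distance alignment, with insertions counted against
    the score, which is why the WIP setting matters. The sketch below is an
    illustration only, not the scoring tool used in this thread or in the
    paper; the function names are hypothetical.

```python
def align_counts(ref, hyp):
    """Return (substitutions, deletions, insertions) for a minimum-error alignment."""
    rows, cols = len(ref) + 1, len(hyp) + 1
    # cost[i][j] = (total_errors, S, D, I) for ref[:i] against hyp[:j]
    cost = [[None] * cols for _ in range(rows)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, rows):
        cost[i][0] = (i, 0, i, 0)  # only deletions
    for j in range(1, cols):
        cost[0][j] = (j, 0, 0, j)  # only insertions
    for i in range(1, rows):
        for j in range(1, cols):
            e, s, d, n = cost[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                best = (e, s, d, n)          # match
            else:
                best = (e + 1, s + 1, d, n)  # substitution
            e, s, d, n = cost[i - 1][j]
            best = min(best, (e + 1, s, d + 1, n))  # deletion
            e, s, d, n = cost[i][j - 1]
            best = min(best, (e + 1, s, d, n + 1))  # insertion
            cost[i][j] = best
    _, s, d, n = cost[-1][-1]
    return s, d, n


def phone_scores(ref, hyp):
    """Return (percent correct, percent accuracy) for one reference/hypothesis pair."""
    s, d, i = align_counts(ref, hyp)
    n = len(ref)
    correct = 100.0 * (n - s - d) / n
    accuracy = 100.0 * (n - s - d - i) / n  # insertions are penalised here
    return correct, accuracy
```

    A hypothesis with many inserted phones can therefore have a high percent
    correct and still a low accuracy, so it is worth checking which of the
    two figures a paper reports.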

    If anybody has Timit CI phone recognition results using Sphinx3/4, please
    share.

    Thanks.

     
  • Nickolay V. Shmyrev

    CMUSphinx shouldn't be different from other toolkits. Your results are more or
    less fine; they may need some small tweaks, but they are in range. Many
    researchers report 64% CI accuracy with a bigram phone model. For example:

    J. Glass et al. “A probabilistic framework for feature-based speech
    recognition”.

    For a reference script that trains a model with HTK, you can check HTKTimit from
    Tony Robinson:

    http://www.cantabResearch.com/HTKtimit.sh

    There are some surveys on the subject which basically cite the same results:

    http://laps.ufpa.br/aldebaro/papers/Timitresults.pdf

    http://www.intechopen.com/download/pdf/pdfs_id/15948

    As for the paper you cited, I think it also uses a bigram language model and
    just doesn't mention it. Either that, or the reported result is not quite correct.
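
    To make concrete what a bigram phone model is, here is a minimal sketch
    that estimates one from phone transcripts. It is an illustration under
    stated assumptions, not the LM tooling of Sphinx or HTK: add-one
    smoothing and the "<s>" / "</s>" boundary symbols are choices made for
    the example, and train_bigram is a hypothetical name.

```python
import math
from collections import Counter


def train_bigram(transcripts, start="<s>", end="</s>"):
    """Estimate add-one smoothed bigram log-probabilities over phone sequences."""
    history_counts = Counter()  # how often each phone appears as a history
    bigram_counts = Counter()   # how often each (previous, current) pair appears
    successors = set()          # symbols that can follow a history
    for phones in transcripts:
        seq = [start] + list(phones) + [end]
        successors.update(seq[1:])
        for prev, cur in zip(seq, seq[1:]):
            history_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    v = len(successors)

    def logprob(prev, cur):
        # Add-one smoothing (an assumption of this sketch) keeps unseen pairs non-zero.
        return math.log((bigram_counts[(prev, cur)] + 1) / (history_counts[prev] + v))

    return logprob


lp = train_bigram([["sil", "ae", "t", "sil"], ["sil", "t", "ae", "sil"]])
print(lp("sil", "ae"), lp("sil", "sil"))
```

    Even a model this simple constrains which phone may follow which, and
    that constraint is what a null grammar lacks.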

    You might know this already, but you can download all of Joseph Keshet's
    sources from his home page:

    http://ttic.uchicago.edu/~jkeshet/Source_Code.html

     
  • Pranav Jawale

    Pranav Jawale - 2012-01-02

    Thanks a lot for various pointers.

    I will try to further tweak my config.

     
  • Nickolay V. Shmyrev

    There is not much to tweak. The most critical thing is to initialize the models
    from the segmentation rather than from a flat start, which is the default in
    SphinxTrain.
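
    To illustrate the idea (this is not SphinxTrain code): TIMIT ships a .phn
    file per utterance with lines of the form "start_sample end_sample phone",
    so each feature frame can be assigned to a phone and each phone can get its
    own initial mean and variance, instead of every model starting from the
    same global statistics. The sketch assumes 16 kHz audio and a 10 ms frame
    shift (160 samples per frame); parse_phn and per_phone_stats are
    hypothetical helper names.

```python
def parse_phn(text, samples_per_frame=160):
    """Convert TIMIT .phn lines into (start_frame, end_frame, phone) tuples."""
    segments = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        start, end, phone = int(parts[0]), int(parts[1]), parts[2]
        segments.append((start // samples_per_frame, end // samples_per_frame, phone))
    return segments


def per_phone_stats(frames, segments):
    """Return {phone: (mean, variance)} per dimension over the frames labelled with that phone."""
    sums, sqsums, counts = {}, {}, {}
    for start, end, phone in segments:
        for vec in frames[start:end]:
            if phone not in counts:
                sums[phone] = [0.0] * len(vec)
                sqsums[phone] = [0.0] * len(vec)
                counts[phone] = 0
            for k, x in enumerate(vec):
                sums[phone][k] += x
                sqsums[phone][k] += x * x
            counts[phone] += 1
    stats = {}
    for phone, n in counts.items():
        mean = [s / n for s in sums[phone]]
        var = [q / n - m * m for q, m in zip(sqsums[phone], mean)]
        stats[phone] = (mean, var)
    return stats


# Toy example: four one-dimensional frames, two labelled segments.
segments = parse_phn("0 320 h#\n320 640 aa\n")
stats = per_phone_stats([[0.0], [2.0], [10.0], [14.0]], segments)
```

    In a real setup these statistics would seed the Gaussians of each phone's
    HMM states before re-estimation; how to feed them into a particular
    trainer is toolkit-specific and not shown here.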

     
  • Matt Robinson

    Matt Robinson - 2012-05-11

    When setting up a phoneme recogniser, can phoneme segmentation files be used in
    CMUSphinx to bootstrap the models? FlatInit in the SphinxTrain scripts is
    hardcoded in Module 20. Is there a way to use the slave Perl scripts instead,
    maybe by setting iter to 2? Otherwise I am considering training with HTK and
    converting the models to Sphinx format, which seems to work with most of the
    CMU decoders.

     
