Adapting zh_broadcastnews_ptm256_8000 model

Help
damage85
2015-09-21
2015-09-23
  • damage85

    damage85 - 2015-09-21

    I'm trying to adapt the zh_broadcastnews_ptm256_8000 model with some speech data, using the command below:

    D:\sphinxtrain\sphinxtrain-5prealpha-win32\bin\Release\bw \
    -hmmdir model \
    -moddeffn model/mdef \
    -ts2cbfn .ptm. \
    -feat s2_4x \
    -cmn current \
    -agc none \
    -dictfn D:\sphinxtrain\adapting\zh_broadcastnews_utf8.dic \
    -ctlfn adapt.fileids \
    -lsnfn adapt.transcription \
    -accumdir D:\sphinxtrain\adapting\accumdir

    I get the following problem:

    INFO: main.c(671): If you are not interested in profiling, use -timing no
    column defns
    <seq>
    <id>
    <n_frame_in>
    <n_frame_del>
    <n_state_shmm>
    <avg_states_alpha>
    <avg_states_beta>
    <avg_states_reest>
    <avg_posterior_prune>
    <frame_log_lik>
    <utt_log_lik>
    ... timing info ...
    utt> 0 001 673INFO: cmn.c(183): CMN: 10.70 -0.01 -0.20 0.21 -0.25 -0.30 -0.08 -0.34 0.11 -0.22 -0.15 -0.22 -0.12
    0WARN: "mk_phone_list.c", line 178: Unable to lookup word '公安部' in the dictionary
    WARN: "next_utt_states.c", line 83: Unable to produce phonetic transcription for the utterance ' 公安部 交 管 局 通报 四百 家 存在 重大 安全 隐患 运输 企业 并 限期 整改 '
    0WARN: "main.c", line 830: Skipped utterance ' 公安部 交 管 局 通报 四百 家 存在 重大 安全 隐患 运输 企业 并 限期 整改 '
    utt 0.000x 0.000e upd 0.000x 0.000e fwd 0.000x 0.000e bwd 0.000x 0.000e gau 0.000x 0.000e rsts 0.000x 0.000e rstf 0.000x 0.000e rstu 0.000x 0.000e

    My transcription file:
    公安部 交 管 局 通报 四百 家 存在 重大 安全 隐患 运输 企业 并 限期 整改 (001)

    I can find the word 公安部 in zh_broadcastnews_utf8.dic, so I'm wondering why the engine cannot find it. Please help, thanks.

     
    • Nickolay V. Shmyrev

      You are welcome to provide the data files to reproduce your problems.

      Words might look the same but can be different in UTF-8 representation. Make sure that the byte values in the transcription and the dictionary are the same.
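
One way to run the byte-level comparison suggested above is sketched below in Python. This is a rough sketch, not part of sphinxtrain: the helper names (`load_dict_words`, `transcript_words`, `hex_bytes`, `diagnose`) are made up for illustration, and the phone strings in the sample dictionary are placeholders, not the real entries. It assumes the dictionary headword is the first whitespace-separated field of each line and that each transcription line ends with a `(utt_id)` token. A UTF-8 byte order mark (bytes `EF BB BF`) at the start of a file is one common way for a word to look identical but differ in bytes; whether that is what happened here cannot be told from the post alone.

```python
import unicodedata

BOM = "\ufeff"  # decoded form of the UTF-8 byte order mark EF BB BF


def load_dict_words(lines):
    """Collect headwords: the first whitespace-separated field of each line."""
    words = set()
    for line in lines:
        fields = line.split()
        if fields:
            words.add(fields[0])
    return words


def transcript_words(line):
    """Split a transcription line into words, dropping a trailing '(utt_id)'."""
    tokens = line.split()
    if tokens and tokens[-1].startswith("(") and tokens[-1].endswith(")"):
        tokens = tokens[:-1]
    return tokens


def hex_bytes(word):
    """Show the UTF-8 bytes of a word so two spellings can be compared by eye."""
    return " ".join(f"{b:02x}" for b in word.encode("utf-8"))


def diagnose(word, dict_words):
    """Classify a lookup: 'ok', 'bom', 'normalization', or 'missing'."""
    if word in dict_words:
        return "ok"
    if word.replace(BOM, "") in dict_words:
        return "bom"
    folded = unicodedata.normalize("NFKC", word)
    if any(unicodedata.normalize("NFKC", d) == folded for d in dict_words):
        return "normalization"
    return "missing"


if __name__ == "__main__":
    # Placeholder dictionary lines; "P1 P2" stands in for real phone strings.
    dict_words = load_dict_words(["公安部 P1 P2", "通报 P3 P4"])
    # Simulated first line of a transcription file saved with a BOM.
    line = BOM + "公安部 通报 (001)"
    for w in transcript_words(line):
        print(diagnose(w, dict_words), hex_bytes(w))
```

To check real files, read them with `open(path, encoding="utf-8")` and pass the lines to the helpers; that codec keeps a leading BOM as `\ufeff` so it shows up in the comparison, whereas `encoding="utf-8-sig"` strips it silently.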

       
