HTK to Sphinx3 conversion with triphones

Anonymous
2012-01-12
2012-09-22
  • Anonymous

    Anonymous - 2012-01-12

    Hello, I've been trying to use the HTK to Sphinx3 converter.

    As discussed in the forum, the features used for HTK training and for the
    Sphinx3 decoder have to be compatible, so we hacked the PocketSphinx code to
    print the features it uses and then wrote them out in HTK format for HTK
    training.
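
    For readers who want to reproduce the feature export step, here is a minimal
    sketch of writing frames to an HTK parameter file. It is not the code used in
    this thread: the function names, the USER parameter kind (code 9) and the
    10 ms frame shift are assumptions. Only the 12-byte big-endian header layout
    (nSamples, sampPeriod in 100 ns units, sampSize in bytes, parmKind) comes from
    the HTK file format.

```python
import struct

HTK_USER = 9  # HTK parameter kind code for user-defined features (assumed choice)


def write_htk_features(path, frames, frame_shift_s=0.01, parm_kind=HTK_USER):
    """Write equal-length float frames to an HTK parameter file (big-endian)."""
    if not frames:
        raise ValueError("no frames to write")
    dim = len(frames[0])
    if any(len(frame) != dim for frame in frames):
        raise ValueError("all frames must have the same dimensionality")
    samp_period = int(round(frame_shift_s * 1e7))  # HTK counts in 100 ns units
    samp_size = 4 * dim  # bytes per frame, 4-byte floats
    with open(path, "wb") as out:
        # Header: nSamples (int32), sampPeriod (int32), sampSize (int16), parmKind (int16)
        out.write(struct.pack(">iihh", len(frames), samp_period, samp_size, parm_kind))
        for frame in frames:
            out.write(struct.pack(">%df" % dim, *frame))


def read_htk_header(path):
    """Return (nSamples, sampPeriod, sampSize, parmKind) from an HTK file."""
    with open(path, "rb") as inp:
        return struct.unpack(">iihh", inp.read(12))
```

    The frames themselves would come from whatever the patched decoder prints;
    parsing that output is not shown because its format is not given here.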

    To test the original model (without conversion), we got bad results with
    HVite (it is not meant for LVCSR) and could not use HDecode (the model to be
    converted can't have sp+sil), so we used the Julius decoder.

    The monophone model gives good results with all the decoders, i.e., Sphinx3,
    PocketSphinx and Julius. The triphone model, however, gives weird results with
    the Sphinx3 and PocketSphinx decoders, although it works fine with Julius.

    Below are the results for each decoder with the monophone and triphone
    models.

    The resources used for decoding and for the conversion are available at:
    http://www.laps.ufpa.br/pedrobatista/htk2sphinx_conversion.zip
    In this zip, a file named run.txt shows the command line used for each
    decoder.

    If anyone could take a look to see what is wrong or give any tips, I'd be
    glad.

    Sphinx3
    monophone
    parágrafo terceiro as emendas ao projeto de lei orçamento anual ou aos
    projetos que o modifiquem somente podem ser aprovadas caso dois pontos inciso
    um sejam compatíveis com o plano plurianual e com a lei de diretrizes
    orçamentárias inciso dois indiquem os recursos necessários admitidos apenas os
    provenientes de anulação de despesa excluídas as que incidam sobre dois pontos
    alínea a dotações para pessoal e seus encargos
    triphone
    parágrafos depende são os aos sete há um mês mês insuficientes provas seu cada
    centésimos passam ou pesca sempre

    PocketSphinx
    monophone
    parágrafo terceiro as emendas ao projeto de lei orçamento anual ou aos
    projetos que o modifiquem somente podem ser aprovadas caso dois pontos inciso
    um sejam compatíveis com o plano plurianual e com a lei de diretrizes
    orçamentárias inciso dois indiquem os recursos necessários admitidos apenas os
    provenientes de anulação de despesa excluídas as que incidam sobre dois pontos
    alínea a dotações para pessoal e seus encargos (art166c.mfc.sphinx -63660)
    triphone
    para gm der seis ele la (art166c.mfc.sphinx -105526)

    Julius
    monophone
    sentence1: parágrafo terceiro <sil> as emendas ao projeto de lei orçamento
    anual ou aos projetos que o modifiquem somente podem ser aprovadas caso <sil>
    dois pontos inciso um sejam compatíveis com plano plurianual e com a lei de
    diretrizes orçamentárias inciso dois indiquem os recursos necessários
    admitidos apenas os provenientes de anulação de despesa <sil> excluídas as que
    incidam sobre <sil> dois pontos <sil> alínea a dotações para pessoal e seus
    encargos

    triphone
    parágrafo terceiro as emendas ao projeto de lei do orçamento anual ou aos
    projetos que o modifiquem somente podem ser aprovadas caso <sil> dois pontos
    inciso um o um sejam compatíveis com o plano plurianual e com a lei de
    diretrizes orçamentárias inciso dois <sil> indiquem os recursos necessários
    admitidos apenas os provenientes de anulação de despesa excluídas as que
    incidam sobre dois pontos alínea a dotações para pessoal e seus encargos

     
  • Nickolay V. Shmyrev

    There is a bug in htk2s3conv. If you check the monophone model's Gaussians for
    phone SIL (number 114 in means), you'll see they are the same as in hmmdefs.

    If you check the triphone means for the same state 114, you'll see they are
    significantly different from those of state SIL_ST21, which must correspond to
    SIL state 114.

    This problem should be easy to investigate; it just requires some analysis of
    the htk2s3 code.
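
    The check described above can be automated once both sets of means are loaded.
    The sketch below is only an illustration: find_mismatched_states is a
    hypothetical helper, it assumes both inputs are already parsed into lists of
    mean vectors indexed by Sphinx state id, and the tolerance is an arbitrary
    choice. Parsing hmmdefs and the Sphinx means file is not shown.

```python
def find_mismatched_states(htk_means, s3_means, tol=1e-4):
    """Return the state ids whose mean vectors differ between the two models.

    Both arguments are assumed to be lists of float vectors, where position i
    holds the mean vector of the state with id i.
    """
    mismatched = []
    for state_id, (htk_vec, s3_vec) in enumerate(zip(htk_means, s3_means)):
        same_length = len(htk_vec) == len(s3_vec)
        if not same_length or any(abs(a - b) > tol for a, b in zip(htk_vec, s3_vec)):
            mismatched.append(state_id)
    return mismatched
```

    If the conversion only permutes states, most ids will be reported as
    mismatched even though every vector exists somewhere in the other model,
    which points to an ordering problem rather than corrupted values.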

     
  • Anonymous

    Anonymous - 2012-01-13

    Thanks nshmyrev.

    I can't run all the tests I need to ensure the models are equivalent (my
    laboratory cluster is down). But for the test file I reported, the results
    are equivalent, and the models look OK when compared. So I'm pretty sure the
    conversion is OK.

    I think the diff below resolves the problem. Do you need any explanation for
    the version control commit, or is the diff enough?

    Thanks again.

    diff --git a/htk_converter.py b/htk_converter.py
    index c997709..dd748d8 100644
    --- a/htk_converter.py
    +++ b/htk_converter.py
    @@ -10,6 +10,7 @@ from struct import unpack, pack
     import sys
     from sys import exit
     import time
    +import operator

     from ply import *

    @@ -347,7 +348,8 @@ class HtkConverter(object):

             dimensionality.

             n = 0
             o = 0
    -        for state in states:
    +        for (state, sId) in sorted(self.statesToIds.iteritems(),
    +                                   key=operator.itemgetter(1)):
                 for (iMixture, mixtureWeight, mixture) in state.mixtures:
                     for float in mixture.mean.vector:
                         mfile.write(pack('=f', float))
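
    The idea behind the patch is to walk the states in order of their assigned
    id rather than in whatever order the container yields them. A minimal
    Python 3 sketch of that ordering step follows (the patch itself targets
    Python 2 and uses iteritems()). The helper name and the example state names
    other than SIL_ST21 are hypothetical, and this shows only the sorting, not
    the full fix to the converter.

```python
import operator


def states_in_id_order(states_to_ids):
    """Return the states sorted by their numeric id (the dict values)."""
    ordered_pairs = sorted(states_to_ids.items(), key=operator.itemgetter(1))
    return [state for state, _state_id in ordered_pairs]


# Example mapping: the insertion order differs from the id order.
example = {"SIL_ST21": 114, "a_ST3": 1, "a_ST2": 0}
ordered = states_in_id_order(example)
```

    Any file that is indexed by state id has to be written with the same
    ordering, otherwise the id stored elsewhere points at the wrong Gaussians.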

     
  • Nickolay V. Shmyrev

    Thanks for your investigation. The fix above is not enough, because states
    need to be sorted in other places too (where we dump the mixtures
    themselves). I've already committed a slightly different fix; it should work
    now.

    However, it works only for sphinx3 and pocketsphinx fwdtree. Fwdflat in
    pocketsphinx is somewhat broken. I'm looking into this issue now.

     
  • Nickolay V. Shmyrev

    OK, it's more or less a configuration issue: your model seems too
    discriminative and requires wider beams for both fwdtree and fwdflat. For
    example, the fwdflat beam should be something like 1e-200 instead of 1e-65.
    Otherwise decoding works fine.
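
    As a rough sketch of how such beams could be passed to PocketSphinx from a
    script, the helper below builds the command-line arguments. The helper name
    is hypothetical; -beam and -fwdflatbeam are the PocketSphinx options for the
    fwdtree and fwdflat beams. Only the 1e-200 fwdflat value comes from this
    thread, so the fwdtree beam is left unset unless the caller supplies one.

```python
def decoder_beam_args(fwdflat_beam=1e-200, beam=None):
    """Build PocketSphinx beam arguments as a list of strings.

    fwdflat_beam: beam for the fwdflat pass (value suggested in this thread).
    beam: beam for the fwdtree pass; no value is given in the thread, so it
          is omitted unless the caller provides one.
    """
    args = []
    if beam is not None:
        args += ["-beam", "%g" % beam]
    args += ["-fwdflatbeam", "%g" % fwdflat_beam]
    return args
```

    The returned list can be appended to the rest of the decoder command line
    (model, dictionary and language model options), which is not shown here.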

     