Hello, I've been trying to use the HTK to Sphinx3 converter.
As discussed in the forum, the features used for HTK training and for the
Sphinx3 decoder have to be compatible, so we hacked the PocketSphinx code to
print the features it uses and then wrote them out in HTK format for HTK
training.
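For anyone trying the same thing, here is a minimal sketch of the "write it in the HTK format" step. It assumes the frames have already been dumped from PocketSphinx as plain lists of floats; the function name, the 10 ms frame shift and the USER parameter kind (code 9) are my assumptions for illustration, not something taken from the original scripts.

```python
import struct


def write_htk_features(path, frames, frame_shift_s=0.01, parm_kind=9):
    """Write equal-length float frames as an HTK parameter file.

    frame_shift_s and parm_kind (9 = USER) are illustrative defaults.
    """
    if not frames:
        raise ValueError("no frames to write")
    dim = len(frames[0])
    if any(len(frame) != dim for frame in frames):
        raise ValueError("all frames must have the same dimension")

    samp_period = int(round(frame_shift_s * 1e7))  # HTK counts in 100 ns units
    samp_size = 4 * dim                            # bytes per frame (float32)

    with open(path, "wb") as out:
        # 12-byte big-endian header: nSamples, sampPeriod, sampSize, parmKind
        out.write(struct.pack(">iihh", len(frames), samp_period,
                              samp_size, parm_kind))
        for frame in frames:
            out.write(struct.pack(">%df" % dim, *frame))
```

HTK reads big-endian data by default, which is why the sketch packs everything with ">".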
To test the original model (without conversion), we got bad results with
HVite (it is not meant for LVCSR) and couldn't use HDecode (the model to be
converted can't have sp+sil), so we used the Julius decoder.
The monophone model gives good results with all the decoders, i.e., Sphinx3,
PocketSphinx and Julius. The triphone model, however, gives weird results with
the Sphinx3 and PocketSphinx decoders, although it works fine with Julius.
The results for each decoder with the monophone and triphone models are shown
below.
If anyone could take a look to see what is wrong or give any tips, I'd be
glad.
Sphinx3
monophone
parágrafo terceiro as emendas ao projeto de lei orçamento anual ou aos
projetos que o modifiquem somente podem ser aprovadas caso dois pontos inciso
um sejam compatíveis com o plano plurianual e com a lei de diretrizes
orçamentárias inciso dois indiquem os recursos necessários admitidos apenas os
provenientes de anulação de despesa excluídas as que incidam sobre dois pontos
alínea a dotações para pessoal e seus encargos
triphone
parágrafos depende são os aos sete há um mês mês insuficientes provas seu cada
centésimos passam ou pesca sempre
PocketSphinx
monophone
parágrafo terceiro as emendas ao projeto de lei orçamento anual ou aos
projetos que o modifiquem somente podem ser aprovadas caso dois pontos inciso
um sejam compatíveis com o plano plurianual e com a lei de diretrizes
orçamentárias inciso dois indiquem os recursos necessários admitidos apenas os
provenientes de anulação de despesa excluídas as que incidam sobre dois pontos
alínea a dotações para pessoal e seus encargos (art166c.mfc.sphinx -63660)
triphone
para gm der seis ele la (art166c.mfc.sphinx -105526)
Julius
monophone
sentence1: parágrafo terceiro <sil> as emendas ao projeto de lei orçamento
anual ou aos projetos que o modifiquem somente podem ser aprovadas caso <sil>
dois pontos inciso um sejam compatíveis com plano plurianual e com a lei de
diretrizes orçamentárias inciso dois indiquem os recursos necessários
admitidos apenas os provenientes de anulação de despesa <sil> excluídas as que
incidam sobre <sil> dois pontos <sil> alínea a dotações para pessoal e seus
encargos
triphone
parágrafo terceiro as emendas ao projeto de lei do orçamento anual ou aos
projetos que o modifiquem somente podem ser aprovadas caso <sil> dois pontos
inciso um o um sejam compatíveis com o plano plurianual e com a lei de
diretrizes orçamentárias inciso dois <sil> indiquem os recursos necessários
admitidos apenas os provenientes de anulação de despesa excluídas as que
incidam sobre dois pontos alínea a dotações para pessoal e seus encargos
There is a bug in htk2s3conv. If you check the Gaussians of the monophone
model for phone SIL (number 114 in means), you'll see they are the same as in
hmmdefs. If you check the triphone means for the same state 114, you'll see
they differ significantly from state SIL_ST21, which must correspond to SIL
state 114.
This problem should be easy to track down; it just requires some analysis of
the htk2s3 code.
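The check described above can be sketched roughly as follows, assuming the mean vectors from hmmdefs and from the converted means file have already been parsed into lists of floats (the parsing itself is not shown, and the function names and tolerance are hypothetical). It reports which rows of the converted model actually hold a given HTK mean, so a state that landed at the wrong index shows up immediately.

```python
def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return max(abs(x - y) for x, y in zip(a, b))


def find_matching_rows(reference_mean, converted_means, tol=1e-4):
    """Indices of converted rows equal to reference_mean within tol.

    tol is an illustrative tolerance for float32 round-off.
    """
    return [index for index, row in enumerate(converted_means)
            if max_abs_diff(reference_mean, row) <= tol]
```

If the mean of SIL_ST21 from hmmdefs matches some row other than 114, or no row at all, the state ordering in the converter is the first thing to look at.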
Anonymous
-
2012-01-13
Thanks nshmyrev.
I can't run all the tests I need to ensure the equivalence between the models
(my laboratory cluster is down). But for the test file I reported, the results
are equivalent, and comparing the models they seem OK. So I'm pretty sure the
conversion is OK.
I think the diff below resolves the problem. Do you need any explanation for
the version control commit, or is the diff enough?
Thanks again.
diff --git a/htk_converter.py b/htk_converter.py
index c997709..dd748d8 100644
--- a/htk_converter.py
+++ b/htk_converter.py
@@ -10,6 +10,7 @@ from struct import unpack, pack
 import sys
 from sys import exit
 import time
+import operator
 from ply import *
@@ -347,7 +348,8 @@ class HtkConverter(object):
         dimensionality.
         n = 0
         o = 0
-        for state in states:
+        for (state, sId) in sorted(self.statesToIds.iteritems(),
+                                   key=operator.itemgetter(1)):
             for (iMixture, mixtureWeight, mixture) in state.mixtures:
                 for float in mixture.mean.vector:
                     mfile.write(pack('=f', float))
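To illustrate the idea behind the diff on its own, here is a small self-contained sketch in Python 3 syntax (so `items()` rather than `iteritems()`). The function and the `means_by_state` argument are hypothetical stand-ins, not the converter's real data structures. The point is only that the vectors are emitted in order of their numeric state id rather than in whatever order the container happens to yield them.

```python
import operator
import struct


def pack_means_in_id_order(states_to_ids, means_by_state):
    """Serialize mean vectors ordered by numeric state id.

    states_to_ids maps a state name to its id; means_by_state maps the
    same names to lists of floats. Both are illustrative structures.
    """
    chunks = []
    ordered = sorted(states_to_ids.items(), key=operator.itemgetter(1))
    for state, _state_id in ordered:
        for value in means_by_state[state]:
            chunks.append(struct.pack("=f", value))
    return b"".join(chunks)
```

Any other place that writes per-state data would presumably need to walk the states in the same order, otherwise the files stop lining up with each other.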
Thanks for your investigation. The fix above is not enough, because the states
need to be sorted in other places too (where we dump the mixtures themselves).
I've already committed a slightly different change; it should work now.
However, it works only for sphinx3 and the pocketsphinx fwdtree pass. Fwdflat
in pocketsphinx is somewhat broken. I'm looking into this issue now.
OK, it's more or less a configuration issue: your model seems too
discriminative and requires wider beams for both fwdtree and fwdflat. For
example, the fwdflat beam should be something like 1e-200 instead of 1e-65.
Otherwise decoding works fine.
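To make the beam numbers a bit more concrete, here is a deliberately simplified model of beam pruning. It assumes scores are plain log10 likelihoods, which is not how the decoder stores them internally, and the function name is made up for illustration. A hypothesis is kept only while its score stays within the beam of the best one, so a numerically smaller beam value means a wider beam.

```python
import math


def survives_beam(best_log10, hyp_log10, beam):
    """True if a hypothesis lies within `beam` of the best score.

    Scores are log10 likelihoods; beam is a probability ratio such as
    1e-65 or 1e-200. This is a simplified model, not decoder code.
    """
    return hyp_log10 >= best_log10 + math.log10(beam)
```

With a best score of -1000, a hypothesis at -1100 is pruned under a beam of 1e-65 but kept under 1e-200. When the acoustic scores of competing hypotheses are spread very far apart, that difference can decide whether the correct path survives.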
The resources used for decoding and for the conversion are available at:
http://www.laps.ufpa.br/pedrobatista/htk2sphinx_conversion.zip
A file named run.txt in this zip shows the command line used for each of the
decoders.