Hi,
I used the HTK-to-Sphinx 2 feature file converter in sphinxtrain/python/sphinx/htkmfc.py to convert my HTK feature files. I would like to train models with Sphinx 3.6, but the trainer gives the following error when it tries to load the feature files:
Header size field: -1591410688(a1250000); filesize: 38544(00009690)
ERROR: "corpus.c", line 1643: MFCC read failed. Retrying after sleep...
I guess this indicates that Sphinx 3.6 cannot read Sphinx 2 feature files, am I correct? Is it somehow possible to convert Sphinx 2 feature files to Sphinx 3 format (or to directly convert HTK feature files to Sphinx 3 format)?
Best regards,
Wout
I'm sorry, I meant that I want to use S2 models in SphinxTrain.
Wout
The terminology is somewhat confusing here. Sphinx 2 feature files should really just be called "Sphinx feature files". There isn't a separate file format for Sphinx 3.
I'm not sure what is going on here. Are you sure that you're not mistakenly trying to read the HTK feature file?
Before trying your converter, I had already created my own based on the description you gave me[1]. After comparing the converted files, it seems that SphinxTrain can only handle one byte order (which would make sense, as there is no byte order marker in the feature file).
My script isn't very robust and just uses the native byte order, which in my case is little-endian (i.e. 9633 is stored as "a1 25 00 00"). The resulting feature file is accepted by SphinxTrain. So I guess SphinxTrain assumes the header size field to be little-endian. On my PC (64-bit dual core P4), your converter writes it as big-endian.
We[2] also believe that your script shouldn't write self.sampPeriod, self.sampSize, and self.paramKind to the Sphinx feature file. If you do, SphinxTrain won't accept the file.
Summarizing, I think you should replace this:
self.fh.write(pack(">IIHH", self.filesize,
self.sampPeriod,
self.sampSize,
self.paramKind))
with this:
self.fh.write(pack("<I", self.filesize))
After these modifications, the files are almost identical, although I used an entirely different technique. All differences occur in the least significant bytes of the values and are probably due to the fact that I used a textual export of the HTK feature files instead of reading the binary form.
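For reference, this is roughly what the writing stage of my own script does (a simplified sketch rather than the real code; the 13-coefficient frames and the little-endian assumption come from my setup as described above):

from struct import pack

def write_s2mfc(filename, frames):
    # frames: a list of 13-element cepstral vectors (c0..c12) as Python floats
    nfloats = sum(len(frame) for frame in frames)
    fh = open(filename, "wb")
    # the header is just the number of float values, little-endian, nothing else
    fh.write(pack("<I", nfloats))
    for frame in frames:
        fh.write(pack("<%df" % len(frame), *frame))
    fh.close()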
Best regards,
Wout
[1] https://sourceforge.net/forum/message.php?msg_id=4362604
[2] Like you said yourself in [1] ;-).
Oh, no, htkmfc.py is not a converter; it is for reading and writing HTK-format files. You need to use it in conjunction with s2mfc.py to make a converter. Here is a simple one, for instance:
#!/usr/bin/env python
# Convert each HTK feature file given on the command line to a Sphinx
# feature file, keeping only the first 13 (static) cepstral coefficients.
import htkmfc, s2mfc
import os
import sys

for fname in sys.argv[1:]:
    root, ext = os.path.splitext(fname)
    hfeat = htkmfc.open(fname).getall()
    s2mfc.open(root + ".s2mfc", "wb").writeall(hfeat[:,0:13])
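Assuming you save that as htk2s2mfc.py (the name is just an example) and your HTK files end in .mfc, you would run it as, say:

python htk2s2mfc.py *.mfc

It writes a .s2mfc file next to each input; the hfeat[:,0:13] slice keeps only the first 13 columns of the HTK feature vector, since the trainer computes the dynamic features itself.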
Hi,
I noticed another error in the code, which may be a problem in all the Python files. You try to determine whether byte swapping is necessary by comparing the result of unpack() with an integer:
> self.swap = (unpack('=i', pack('<i', 42)) != 42)
The problem, however, is that unpack() always returns a tuple, so the comparison with a plain integer always evaluates to true and self.swap ends up set regardless of the machine's byte order. Instead, you might use this:
> self.swap = (unpack('=i', pack('<i', 42)) != (42,))
This will give the intended result.
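To illustrate what I mean (a quick sketch from the interactive interpreter; the results in the comments assume a little-endian machine like mine):

from struct import pack, unpack

unpack('=i', pack('<i', 42))            # (42,) -- always a tuple, never a bare int
unpack('=i', pack('<i', 42)) != 42      # True on any machine, so swapping is always enabled
unpack('=i', pack('<i', 42)) != (42,)   # False on little-endian, True on big-endian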
Best regards,
Wout
Whoops! You're right. I've fixed the other files affected by this...
I'm sorry for all the separate messages and for breaking the thread a number of times. I haven't been paying enough attention... :-S
Am I correct in assuming that the HTK feature kind MFCC_E_D_A_Z is equivalent to 1s_12c_12d_3p_12dd in SphinxTrain (as the -feat parameter)?
Thanks,
Wout
It is not exactly equivalent to any Sphinx feature specification.
Sphinx calculates the dynamic features in the decoder or the trainer, rather than storing them in the feature files. Cepstral mean subtraction is also done in the decoder/trainer, so the feature files are stored un-normalized. Of course, in the Gaussian parameter files, dynamic feature computation and CMN have been applied.
So, as far as the model files are concerned, MFCC_E_D_A_Z is roughly equivalent to -feat 1s_c_d_dd -cmn current, which is a single 39-dimensional vector consisting of c0..c12, dc0..dc12, ddc0..ddc12. However, I think that HTK puts the energy (c0) coefficient at the end of the feature vector rather than the beginning, and I think the dynamic feature computation is different in HTK as well.
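If it helps, this is roughly what the trainer does with the 13-dimensional cepstra to build that 39-dimensional vector (a sketch from memory, not the actual feat.c code; the +/-2 frame delta window and the edge padding here are my recollection and may not match exactly):

import numpy

def compute_1s_c_d_dd(mfc):
    # mfc: (nframes, 13) array of c0..c12, with c0 (energy) first as in Sphinx
    mfc = mfc - mfc.mean(0)        # -cmn current: subtract the per-utterance mean
    n = len(mfc)
    # pad the edges by repeating the first and last frames
    padded = numpy.concatenate((mfc[:1].repeat(3, 0), mfc, mfc[-1:].repeat(3, 0)))
    c = padded[3:3+n]
    d = padded[5:5+n] - padded[1:1+n]                                     # delta: c[t+2] - c[t-2]
    dd = (padded[6:6+n] - padded[2:2+n]) - (padded[4:4+n] - padded[0:n])  # delta of the deltas
    return numpy.concatenate((c, d, dd), 1)    # one 39-dimensional stream per frame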
Ah, I see. It may be wise to change the comment in HTKFeat_write() then.
Wout