Hi everyone,
I am trying to find ways to reduce the size of the model files ("sendump", "map", and "phone") generated by SphinxTrain, to create an adaptation of Sphinx2 for embedded systems. I only have a small vocabulary set (200 words).
Has anybody tried to reduce these files?
My ideas:
- use 3 states instead of 5, but this is impossible with sphinx2 :(
- group the phonemes by similarity (for instance, k, t, and p), to have only 20 phonemes instead of 40 before training. But I do not have enough training data to measure the impact.
- throw away some data from those files. It might seem like a joke, but with a small vocabulary set, not all of the senones are useful...
Any experience to share? Or ideas?
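To make the phoneme-grouping idea concrete, here is a rough sketch of rewriting a pronunciation dictionary with merged phone classes before training. The merge table is only an illustrative assumption (not an acoustically validated grouping), and `merge_phones` is a hypothetical helper, not part of SphinxTrain:

```python
# Hypothetical sketch: merge acoustically similar phonemes in a
# pronunciation dictionary before training, to shrink the phone set.
# The MERGE table below is an illustrative assumption, not a tested grouping.

MERGE = {
    "P": "K", "T": "K",   # collapse unvoiced stops into one class
    "B": "G", "D": "G",   # collapse voiced stops into one class
}

def merge_phones(dict_lines):
    """Rewrite each 'WORD PH1 PH2 ...' dictionary line with merged labels."""
    out = []
    for line in dict_lines:
        word, *phones = line.split()
        out.append(" ".join([word] + [MERGE.get(p, p) for p in phones]))
    return out

print(merge_phones(["PAT P AE T", "BAD B AE D"]))
# -> ['PAT K AE K', 'BAD G AE G']
```

With fewer distinct phones, the trainer has fewer context-dependent models to build, at the obvious cost of confusing the merged words.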
1. Using 3 states can definitely reduce the size, but for that you can use the Sphinx-3 decoder.
2. If you use a small vocabulary table, I think the training process already discards unnecessary phonemes/senones.
Thank you Lei.
Kevin, any ideas? I was wondering if there is a way to re-cluster the sendump file, by grouping vectors of the same kind.
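The re-clustering idea could be sketched like this: treat each senone's score vector as a row, group similar rows with plain k-means, and store one representative vector per cluster plus a small per-senone index table. The sizes and the in-memory layout here are illustrative assumptions; the real sendump format is quantized and laid out differently:

```python
# Sketch of re-clustering senone vectors to deduplicate near-identical rows.
# Toy sizes and float layout are assumptions; sendump itself differs.
import numpy as np

def recluster(vectors, n_clusters, iters=10, seed=0):
    """Basic k-means; returns (centroids, labels) for the given rows."""
    rng = np.random.default_rng(seed)
    centroids = vectors[rng.choice(len(vectors), n_clusters, replace=False)].copy()
    for _ in range(iters):
        # assign each vector to its nearest centroid (Euclidean distance)
        dists = np.linalg.norm(vectors[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # move each centroid to the mean of its assigned vectors
        for k in range(n_clusters):
            members = vectors[labels == k]
            if len(members):
                centroids[k] = members.mean(axis=0)
    return centroids, labels

# Toy numbers: 600 "senones" of dimension 64, reduced to 50 shared vectors.
rng = np.random.default_rng(1)
senones = rng.random((600, 64)).astype(np.float32)
centroids, labels = recluster(senones, 50)
original = senones.nbytes                      # 600 * 64 * 4 bytes
compressed = centroids.nbytes + labels.nbytes  # 50 * 64 * 4 + index table
print(original, compressed)
```

The trade-off is the usual lossy-compression one: the more aggressively you cluster, the more senones share a vector and the more acoustic resolution you give up.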
One thing to do is to use, say, 4000-state models instead of 6000 states. That would get you to 2/3 the size. Sphinx2 does have the limitation that it's hard-wired to 3 states per phone HMM, and sphinx3 does not have that topology constraint, but sphinx2 is still the winner for speed. The default models have 6K states. The 4K version will fit on e.g. an iPaq, though some folks prefer to just do feature extraction on the device (get the cepstral coefficients) and pass off the much smaller feature stream to other networked machines. That's an aside, though :)
I think your best bet is to reduce the number of states. I'll see if we can produce a 4k state model; we had one in the past before conversion to the new front-end/phoneset/training, and this would be useful for folks trying to get down to a small footprint.
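For a rough sense of why fewer states shrink the files, here is a back-of-envelope estimate. It assumes a Sphinx2-style semi-continuous layout of 4 feature streams, 256 codewords per stream, and 1 byte per quantized senone score; headers and the other model files are not modeled, so treat the numbers as an approximation:

```python
# Back-of-envelope size estimate for the quantized senone score table.
# Assumed layout: 4 feature streams x 256 codewords x 1 byte per senone.
STREAMS, CODEWORDS, BYTES_PER_SCORE = 4, 256, 1

def sendump_bytes(n_senones):
    return n_senones * STREAMS * CODEWORDS * BYTES_PER_SCORE

print(sendump_bytes(6000))  # 6144000 bytes, ~6.1 MB for the 6k-state model
print(sendump_bytes(4000))  # 4096000 bytes, ~4.1 MB, i.e. 2/3 of the 6k size
```

Since the table grows linearly in the senone count, a 4000-state model lands at exactly 2/3 of the 6000-state size, matching the estimate above.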