Anonymous
-
2005-01-06
Hi,
Does the Sphinx4 recognizer come with acoustic models, or do we have to develop them? I have worked with HTK and know that building acoustic models is a painful procedure if you have to start from collecting voice samples. Does anybody know if there are any freely available corpora or acoustic models on the net? I will be grateful for any help.
-Jaideep
Hi Jaideep:
Sphinx-4 does indeed come with acoustic models. In the CVS repository, there are models for TIDIGITS and WSJ (16 kHz and 8 kHz). You can also grab RM1 and HUB-4 models from http://sourceforge.net/project/showfiles.php?group_id=1904&package_id=117949.
In addition, Sphinx-4 can read models created with SphinxTrain, so if you know how to use SphinxTrain, you can create your own models.
Hope this helps,
Will
Thanks Will,
Thanks for the links. I have downloaded those files and will use those models. I have one more question, though: my application needs acoustic models tuned for Indian English. Is it possible to use these models and then customize them for the end user, or will I have to develop new models specifically for Indian English? Please help me out in this matter. Thanks for the links once again.
-Jaideep
Hi Jaideep:
(I'm coming back from a longish break)
This past summer, Arthur Chan did some work on SphinxTrain with respect to regression matrices for acoustic models. The idea is that one can use training data from perhaps one person to create data that can be used to skew/adapt an existing acoustic model to the new training data.
While it has latent support to handle these matrices, Sphinx-4 currently doesn't have the ability to read this data in. Adding this support should be pretty straightforward, and I'm hoping someone might step up and help us with this.
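To make the regression-matrix idea a little more concrete, here is a minimal sketch of what applying such a matrix to an acoustic model's Gaussian means could look like: each mean is replaced by `A * mean + b`. The function names, the toy 2-dimensional means, and the values of `A` and `b` are all made up for illustration; this is not SphinxTrain or Sphinx-4 code, and real models use far larger feature vectors.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def adapt_mean(A: Matrix, b: Vector, mean: Vector) -> Vector:
    """Return A * mean + b for one Gaussian mean (plain matrix-vector product)."""
    return [
        sum(a_ij * m_j for a_ij, m_j in zip(row, mean)) + b_i
        for row, b_i in zip(A, b)
    ]


def adapt_model(A: Matrix, b: Vector, means: List[Vector]) -> List[Vector]:
    """Apply the same affine transform to every mean in the model."""
    return [adapt_mean(A, b, mean) for mean in means]


# Hypothetical speaker-independent model: three Gaussian means, 2 dimensions.
speaker_independent_means = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.5]]

# Hypothetical regression matrix and bias, as if learned from adaptation data.
A = [[1.1, 0.0],
     [0.2, 0.9]]
b = [0.5, -0.25]

adapted_means = adapt_model(A, b, speaker_independent_means)
```

The point of sharing one `A` and `b` across many Gaussians is that a small amount of data from the new speaker can still shift the whole model.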
I'm not sure, however, if creating adaptation models for Indian English and applying them to acoustic models for native English speakers would work well. For example, "Indian English" tends to switch the pronunciation of v's and w's, which seems to be quite a significant change. Our speech experts would be better able to advise in this space, though.
Happy New Year,
Will
Hi Jaideep,
Just a word on speaker adaptation: you can actually use SphinxTrain's mllr_solve and mllr_transform to learn a transform and apply it to a model off-line. Unfortunately, I haven't been able to write detailed documentation about that part yet. I will keep this list informed if I do.
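The off-line "solve, then transform" workflow can be illustrated with a deliberately simplified sketch. It assumes a diagonal transform (one scale and one offset per feature dimension) estimated by ordinary least squares from pairs of model means and means observed in the adaptation data. This is a stand-in chosen for brevity: it is not what mllr_solve computes, the function names are hypothetical, and the data are invented.

```python
from typing import List, Tuple

Vector = List[float]


def solve_diagonal_transform(
    model_means: List[Vector], observed_means: List[Vector]
) -> Tuple[Vector, Vector]:
    """Least-squares fit of observed ~= scale * model + offset, per dimension."""
    n = len(model_means)
    dims = len(model_means[0])
    scales, offsets = [], []
    for d in range(dims):
        xs = [m[d] for m in model_means]
        ys = [o[d] for o in observed_means]
        x_bar = sum(xs) / n
        y_bar = sum(ys) / n
        var_x = sum((x - x_bar) ** 2 for x in xs)
        cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        # Fall back to a pure offset if the model means do not vary here.
        scale = cov_xy / var_x if var_x > 0 else 1.0
        scales.append(scale)
        offsets.append(y_bar - scale * x_bar)
    return scales, offsets


def transform_means(
    scales: Vector, offsets: Vector, model_means: List[Vector]
) -> List[Vector]:
    """Apply the learned per-dimension transform to every model mean."""
    return [
        [s * x + o for s, x, o in zip(scales, mean, offsets)]
        for mean in model_means
    ]


# Step 1 ("solve"): learn the transform from invented adaptation statistics.
model_means = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]
observed_means = [[1.0, -1.0], [3.0, 0.0], [5.0, 1.0]]
scales, offsets = solve_diagonal_transform(model_means, observed_means)

# Step 2 ("transform"): write out adapted means for later use by a decoder.
adapted = transform_means(scales, offsets, model_means)
```

Keeping the two steps separate means the transform can be estimated once and then applied to the model without touching the recognizer itself.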
Arthur
Arthur -- please make that "when", not "if".
cheers,
jerry
Sure. ;-) So I will keep this list informed when I do so.
This is a task I have already listed as one of the goals for the first half of 2005. I would not say it is on the critical path of our plan, though, since funded projects are always burning us.
-Arthur
P.S. Or did I make a grammar mistake? I always think my English needs more improvement. (I come from Hong Kong.) Send me a mail to correct me in that case. :-)
I read in one paper about Sphinx-4 ("Sphinx-4: A flexible Open Source Framework for Speech Recognition") that the Sphinx team is planning to create FrontEnd and AcousticModel implementations that support the models generated by HTK.
Since you probably already have HTK-format models, perhaps you could wait for this functionality to be implemented?
Question to the Sphinx developers: Are you serious about these HTK plugin plans? I (and many others who have spent a whole lot of work getting good models with HTK) would greatly appreciate such a plugin!
Is it a big thing to add this, and when do you expect to start this sub-project?
I would be grateful to hear your commitment to this experiment :-)
Reto
Hi Reto:
We actually haven't started this sub-project, and may not be able to get to it for quite some time. It should, however, hopefully be straightforward. If someone who is intimately familiar with HTK would like to help out, I'd definitely be happy to provide direction and help on where this support would fit in with Sphinx-4.
The basic idea is that you'd need to create a loader for the HTK acoustic models (similar to the one in edu/cmu/sphinx/linguist/acoustic/tiedstate/Sphinx3Loader.java) and also modify the front end to produce features in line with the ones that were used to train HTK.
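As a rough illustration of the front-end half of this, the sketch below reads the 12-byte header of an HTK parameter file to find out which feature kind and frame shift the HTK models were trained on, since that is what the Sphinx-4 front end would have to reproduce. The header layout and the parameter-kind codes follow my reading of the HTK Book; the big-endian byte order is HTK's default and is an assumption here, and the function name, the dictionary keys, and the order in which qualifiers are printed are my own choices.

```python
import struct

# Base parameter kinds, indexed by the low 6 bits of parmKind.
HTK_BASE_KINDS = [
    "WAVEFORM", "LPC", "LPREFC", "LPCEPSTRA", "LPDELCEP", "IREFC",
    "MFCC", "FBANK", "MELSPEC", "USER", "DISCRETE", "PLP",
]

# Qualifier bits (octal), e.g. _E = energy, _D = deltas, _A = accelerations.
HTK_QUALIFIERS = [
    (0o000100, "E"), (0o000400, "D"), (0o000200, "N"), (0o001000, "A"),
    (0o100000, "T"), (0o002000, "C"), (0o010000, "K"), (0o004000, "Z"),
    (0o020000, "0"), (0o040000, "V"),
]


def parse_htk_header(header: bytes) -> dict:
    """Decode nSamples, sampPeriod, sampSize and parmKind from 12 bytes."""
    n_samples, samp_period, samp_size, parm_kind = struct.unpack(
        ">iihH", header[:12]
    )
    base_code = parm_kind & 0o77
    base = (
        HTK_BASE_KINDS[base_code]
        if base_code < len(HTK_BASE_KINDS)
        else "UNKNOWN"
    )
    flags = [name for bit, name in HTK_QUALIFIERS if parm_kind & bit]
    compressed = "C" in flags
    # Uncompressed parameter vectors are stored as 4-byte floats; waveform,
    # discrete and compressed data use other layouts, so leave those as None.
    if compressed or base in ("WAVEFORM", "DISCRETE", "UNKNOWN"):
        values_per_frame = None
    else:
        values_per_frame = samp_size // 4
    return {
        "n_frames": n_samples,
        "frame_shift_ms": samp_period / 10000.0,  # sampPeriod is in 100 ns units
        "kind": "_".join([base] + flags),
        "values_per_frame": values_per_frame,
    }


# Synthetic header: 250 frames, 10 ms shift, 39 floats per frame, MFCC_E_D_A.
example_header = struct.pack(
    ">iihH", 250, 100000, 156, 6 | 0o100 | 0o400 | 0o1000
)
info = parse_htk_header(example_header)
```

With this information in hand, one could check whether the Sphinx-4 front end is configured to emit the same number of coefficients at the same frame rate before attempting to decode with converted models.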
Will
Hello all,
We -- that is, a small project at the Institute of Phonetics in Munich -- are currently testing whether we can use our HTK models with Sphinx-4.
We have models for German that are based on the Verbmobil corpora of the BAS (Bavarian Archive for Speech Signals).
At the moment, we have extracted a very small set of models, which we are trying to run through a small application (analogous to the demo app "Transcriber" that comes with S4). After some exceptions along the way, we now have a complete run, but we still have to refine the model set to be able to compare the S4 result to HTK's.
If we are successful, we can post our approach (here or in some fitting place, I suppose). If someone from the Sphinx-4 team would like to act as an advisor, we would greatly appreciate that help! (Please e-mail me directly.)
Best Regards,
Chantal
Chantal Ackermann M.A.
Institut für Phonetik und Sprachliche Kommunikation
Schellingstraße 3
80799 München
(0 89) 21 80-57 53
www.phonetik.uni-muenchen.de/~chantal
chantal (at) phonetik.uni-muenchen.de