Hi David H. and Nickolay.
Is training via SphinxTrain done only to adapt a model to one speaker's voice and one communication channel? This would be analogous to Nuance's Dragon being initially calibrated to the microphone being used and then to the voice of one person when they read back some calibration text.
Now, I've come back to looking at using the PocketSphinx that comes bundled with FreeSwitch in a telephone-call-answering ASR application, but I'm not sure whether SphinxTrain is being used, or should be used, when there are many different voices and channels, as with many telephone calls coming into this switch. If it can or should be used for training in this situation, is this training constantly occurring with use, or does it need to be "turned on" in this FreeSwitch case? If one needs to turn it on for automatic training, how does one go about doing this?
Thanks. Mark.
> Is training via SphinxTrain done only to adapt a model to one speaker's voice and one communication channel?
Yes, it's possible to adapt with SphinxTrain, but it's not the only function of this package :) If you have accounts on your PBX, then you can probably keep a per-user database of adapted models. Here you can find a description of the process:
http://www.speech.cs.cmu.edu/cmusphinx/moinmoin/AcousticModelAdaptation
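To make the per-user idea concrete, here's a rough, untested sketch of scripting the adaptation steps from that page (sphinx_fe, bw and mllr_solve are the real tool names, but the exact flags vary between SphinxTrain versions and model types, so treat the values below as placeholders to check against the tutorial):

    # Sketch only: run the sphinx_fe -> bw -> mllr_solve sequence from the
    # adaptation page for one user's recordings. Flags follow the tutorial but
    # may differ for your model (e.g. -ts2cbfn is ".cont." for continuous models).
    import os
    import subprocess

    def adapt_user(user_dir, model_dir):
        """user_dir holds wavs plus 'fileids' and 'transcription'; model_dir is the base model."""
        run = lambda *cmd: subprocess.check_call(list(cmd), cwd=user_dir)

        # 1. Extract features matching the acoustic model's feat.params.
        run("sphinx_fe", "-argfile", os.path.join(model_dir, "feat.params"),
            "-samprate", "8000",                     # telephone audio
            "-c", "fileids", "-di", ".", "-do", ".",
            "-ei", "wav", "-eo", "mfc", "-mswav", "yes")

        # 2. Collect observation statistics against the base model.
        run("bw", "-hmmdir", model_dir,
            "-moddeffn", os.path.join(model_dir, "mdef.txt"),
            "-ts2cbfn", ".ptm.", "-feat", "1s_c_d_dd",
            "-cmn", "current", "-agc", "none",
            "-dictfn", "dict.dic", "-ctlfn", "fileids",
            "-lsnfn", "transcription", "-accumdir", ".")

        # 3. Estimate a per-user MLLR transform from the accumulated statistics.
        run("mllr_solve", "-meanfn", os.path.join(model_dir, "means"),
            "-varfn", os.path.join(model_dir, "variances"),
            "-outmllrfn", "mllr_matrix", "-accumdir", ".")
        return os.path.join(user_dir, "mllr_matrix")

On a PBX you'd key user_dir off the caller's account and pass the resulting mllr_matrix to PocketSphinx with its -mllr option; that per-caller wiring is the kind of thing the FreeSwitch module won't do for you, as noted below.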
> If it can or should be used for training in this situation, is this training constantly occurring with use, or does it need to be "turned on" in this FreeSwitch case? If one needs to turn it on for automatic training, how does one go about doing this?
It can be useful, but currently FreeSwitch works only with basic functions. Advanced things will require significant amounts of coding. Btw, it's possible to use online adaptation methods that work without training as well. VTLN warp factor estimation is one such method.
OK, so are you saying that less advanced things like VTLN warp factor estimation are already built in (except maybe for the recent pitch-based work like Arlo Faria's), and that this improves accuracy as the system is used? If this is already in PocketSphinx, is there a way to check whether it's "on" and doing this "learning", or is that just how it works by default? If it's not "on", how do I turn it "on"? Also, FreeSwitch has patched things so one can't get logging from PocketSphinx, and I'd have to look at the code to see if a flag is set.
You'll need special VTLN models and a little decoder modification, see https://sourceforge.net/mailarchive/forum.php?thread_name=48910B11.5090801%40cs.cmu.edu&forum_name=cmusphinx-sdmeet
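To illustrate what "decode chunks with a different factor" boils down to in practice: decode the same audio several times with different warp values and keep the best-scoring hypothesis. Here's a rough, untested sketch using the pocketsphinx Python bindings; the VTLN-trained model and the -warp_type/-warp_params front-end options are assumptions you'd need to verify for your sphinxbase version:

    # Brute-force warp-factor search: not efficient (one decoder per factor),
    # which is exactly the objection raised further down in this thread.
    from pocketsphinx.pocketsphinx import Decoder

    def decode_with_warp(raw_audio, hmm, lm, dic,
                         factors=(0.88, 0.94, 1.00, 1.06, 1.12)):
        best = None
        for alpha in factors:
            config = Decoder.default_config()
            config.set_string('-hmm', hmm)       # VTLN-trained acoustic model assumed
            config.set_string('-lm', lm)
            config.set_string('-dict', dic)
            config.set_string('-warp_type', 'inverse_linear')  # one-parameter warp
            config.set_string('-warp_params', str(alpha))
            decoder = Decoder(config)

            decoder.start_utt()
            decoder.process_raw(raw_audio, False, True)        # whole utterance at once
            decoder.end_utt()

            hyp = decoder.hyp()
            if hyp is not None and (best is None or hyp.best_score > best[1]):
                best = (hyp.hypstr, hyp.best_score, alpha)
        return best   # (text, score, chosen warp factor), or None if nothing decoded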
Thanks Nickolay for the reference.
Did David H. implement a pitch-based version of VTLN warp factor estimation similar to what Arlo Faria describes, or is it the more classic, slower, and resource-intensive approach?
BTW are you at CMU?
> Did David H. implement a pitch-based version of VTLN warp factor estimation similar to what Arlo Faria describes, or is it the more classic, slower, and resource-intensive approach?
No, as you can see in the mail quoted above, it's suggested to decode chunks with different factors, which is quite bad programming-wise. As far as I can see, only pitch estimation with YIN is implemented in sphinxbase; see sphinx_pitch.
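For reference, the YIN method behind sphinx_pitch is essentially a squared-difference function over candidate lags plus its cumulative mean normalized form. A minimal numpy illustration of the algorithm (my own sketch, not the sphinxbase code, and it skips interpolation and voicing decisions):

    import numpy as np

    def yin_f0(frame, sample_rate, fmin=60.0, fmax=400.0, threshold=0.1):
        tau_min = int(sample_rate / fmax)
        tau_max = int(sample_rate / fmin)
        frame = frame.astype(float)
        w = len(frame) - tau_max                 # comparison window length
        if w <= 0:
            return None

        # Step 1: squared-difference function d(tau) for each candidate lag
        d = np.array([np.sum((frame[:w] - frame[tau:tau + w]) ** 2)
                      for tau in range(tau_max + 1)])

        # Step 2: cumulative mean normalized difference d'(tau), with d'(0) = 1
        cumsum = np.cumsum(d[1:])
        dprime = np.ones_like(d)
        dprime[1:] = d[1:] * np.arange(1, tau_max + 1) / np.where(cumsum == 0, 1, cumsum)

        # Step 3: first dip below the threshold in [tau_min, tau_max),
        # falling back to the global minimum of d'
        dips = np.where(dprime[tau_min:tau_max] < threshold)[0]
        tau = tau_min + (dips[0] if len(dips) else np.argmin(dprime[tau_min:tau_max]))
        return sample_rate / tau                 # F0 estimate in Hz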
> BTW are you at CMU?
No, I live in Moscow, Russia.