Hi everyone,
we are a university research group using SphinxTrain to build a custom acoustic model. When we launch the RunAll.pl script everything goes fine, but of course that only covers creating a model from scratch.
Now we would like to make some variations to the "black-box" data flow performed by that script, specifically for the following purposes:
- Growing an existing model with new training data.
- Speaker adaptation of an existing model.
- (more trivial) Multi-threading of "bw" invocations when split in multiple parts (-part and -npart options) to take advantage of multi-core processors.
We have read all the official documentation, and we know these problems are all answered from a theoretical point of view. Moreover, one can freely modify the scripts and programs.
The point is, the set of Perl scripts and C programs is quite big and complex, so it is not easy to work on even after deep study: one can easily introduce bugs, misunderstand the dependencies among the various scripts, or miss some of the steps the theory requires.
So, given that such tasks should sooner or later be essential to anyone using a recognizer: do any scripts or patches already exist that address the above problems?
Bye and thanks again for such a great program.
--
Ing. Michele Alessandrini
Funzionario tecnico
DIBET, Università Politecnica delle Marche
60131 Ancona (AN)
tel. 071 2204787
To be honest, I don't see the problem here.
> - Growing an existing model with new training data.
Not practical from my point of view; it's easier to retrain, though you can of course skip some initial steps.
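For instance, each numbered module under scripts_pl has its own slave script, so the later stages can be launched directly. A sketch, assuming a typical SphinxTrain tree (module names vary between releases, so check your installation):

```shell
# Skip verification and the earlier stages and run only the later
# training modules directly (paths relative to your setup; the module
# names below are typical, not guaranteed for every release).
perl scripts_pl/20.ci_hmm/slave_convg.pl
perl scripts_pl/30.cd_hmm_untied/slave_convg.pl
```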
> - Speaker adaptation of an existing model.
Just three commands:
http://www.speech.cs.cmu.edu/cmusphinx/moinmoin/AcousticModelAdaptation
> - (more trivial) Multi-threading of "bw" invocations when split in multiple parts (-part and -npart options) to take advantage of multi-core processors.
In the configuration file:
$CFG_QUEUE_TYPE = "Queue::POSIX";
$CFG_NPART = <number of jobs>;
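Conceptually, that setting makes the trainer do something like the following sketch. The usual "bw" arguments are elided with "...", and a 4-part split is assumed purely for illustration:

```shell
# Launch the Baum-Welch parts in parallel, one process per part, then
# wait for all of them before "norm" combines the accumulated counts.
for part in 1 2 3 4; do
    bw ... -npart 4 -part "$part" &
done
wait
norm ...
```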
Hi Nickolay, thanks for your quick reply, as usual.
> > - Growing an existing model with new training data.
> Not practical from my point of view. It's easier to retrain. Though you can skip some
> initial steps of course
Yes, I agree this is the correct procedure, as the official documentation states, but the problem I was highlighting is: how do you break into the toolchain of Perl scripts to make even a simple variation like "skip some initial steps"? The way the scripts call each other is quite intricate, with several dependencies from script to script. The executables, too, take many options that are not always documented. That's why I was wondering whether someone had already written a set of scripts for this. (The same applies to the second question.)
> $CFG_QUEUE_TYPE = "Queue::POSIX";
Great! I didn't realize that setting could do just that. I'll try it immediately.
Bye and thanks again,
Michele
I've got your point. Well, we have what we have. Any additions you think are useful are certainly welcome.
By the way, about adaptation: you would probably be interested in my recent blog post; it has some important bits not mentioned on the wiki:
http://nshmyrev.blogspot.com/2009/09/adaptation-methods.html
As for the philosophy of the scripts, and what we should have and what we should avoid: I'd really prefer a set of task-oriented programs over a Swiss Army knife. To me, most corner cases matter less than a clean implementation and a process that is, if not documented, at least documentable. Something like Mac or GNOME versus Windows and KDE. We definitely don't need to implement every possible algorithm in the world, only the few that are important. I hope we will not become another HTK.
But that's my personal preference.
As a better clarification of my concerns, I'd like to point out the speaker adaptation problem. I have read the documentation you are citing; basically:
1) you run "bw" with, as inputs, both an existing speaker-independent model and a speaker-dependent training corpus [one iteration only?];
2) you run "mllr_solve" to obtain an MLLR transformation matrix.
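For concreteness, here are the two steps above as I understand them from the wiki page. This is only a sketch: all the paths, file names, and the model directory are placeholders, and "bw" needs further feature-related arguments matching how the model was trained.

```shell
# 1) Collect adaptation statistics against the existing
#    speaker-independent model (placeholder paths throughout).
bw -hmmdir si_model -moddeffn si_model/mdef \
   -ctlfn adapt.fileids -lsnfn adapt.transcription \
   -dictfn adapt.dic -accumdir ./accum

# 2) Solve for the MLLR transformation matrix from those statistics.
mllr_solve -meanfn si_model/means -varfn si_model/variances \
           -accumdir ./accum -outmllrfn mllr_matrix
```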
But, as you said in other posts, sphinx4 does not support MLLR. You suggest doing offline model adaptation, that is, as far as I can understand, creating a new model from an existing one, with the new one being speaker-adapted.
But here comes another perplexity: how? I can see from the "bw" parameters that you could use it for that, by passing it the MLLR matrix through one of its parameters. But the behaviour of those parameters is not documented very well, and the source code is quite large (to the developers' credit, of course!).
That's why some ready-made scripts, if they exist, would be very valuable.
Bye and thanks again
Michele