Hello,
In order to determine a good value for the word insertion penalty (WIP), I would like to know how Sphinx 3 defines the WIP.
I have trained and tested models in HTK, ISIP and Julius, and the grammar scale factor (GSF) seems to correspond to the usual values mentioned in the documentation (http://cmusphinx.sourceforge.net/sphinx3/doc/s3_description.html#lm_lw_wip), but the WIP seems to take very different values. I've seen WIP values in HTK from -20 to 20, whereas the Sphinx documentation says that the value usually lies between 0.2 and 0.7.
Now, from what I can gather, the formula to determine the most likely sentence W^ is defined like this in HTK, ISIP and Julius*:
W^ = argmax[W] log(p(X|W)) + gsf * log(p(W)) + wip
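To make the roles of gsf and wip concrete, here is a minimal Python sketch of this formula with made-up numbers. Note that a wip added once per sentence is a constant and cannot change the argmax, so presumably it is meant to be applied once per word (which ties in with the m * wip variant in my note below):

import math

# Made-up placeholder values: hypothesis -> (p(X|W), p(W), #words)
hypotheses = {
    "hello world": (1e-40, 1e-4, 2),
    "hello new world": (1e-41, 1e-6, 3),
}
gsf = 15.0   # grammar scale factor
wip = -10.0  # word insertion penalty, HTK-style (log domain)

def score(p_acoustic, p_lm, n_words):
    # The formula above, with wip applied once per word; applied only
    # once per sentence it would cancel out of the argmax.
    return math.log(p_acoustic) + gsf * math.log(p_lm) + n_words * wip

best = max(hypotheses, key=lambda w: score(*hypotheses[w]))
print(best)  # the most likely sentence under these toy numbers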
Can anyone tell me how the WIP is defined in Sphinx 3 or where to locate the formula in the source code?
(Note: my random ideas on why the value might be different:
- I read a paper where not wip but m * wip was added, where m is the length of W.
- Perhaps HTK_WIP = 1/S3_WIP?
)
I'd appreciate any help you can give me!
Best regards,
Wout
I'm no expert, but here are my thoughts:
You provided this formula:
W^ = argmax[W] log(p(X|W)) + gsf * log(p(W)) + wip
For this to be a proper (log) probability distribution (ignoring gsf), wip would have to lie between -infinity and 0 (that is, between log(0) and log(1)). That makes HTK's range of [-20, 20] kind of odd, or ad hoc.
I think that Sphinx's WIP is indeed a probability, i.e. between 0 and 1. If you exponentiate the numbers from the other packages (taking care to use the correct base), you should get comparable numbers, if my assumptions are correct.
Regards,
Robbie
Thanks for your reply, Robbie!
> That makes HTK's range of [-20, 20] kind of odd, or ad hoc.
Just to make myself clear: -20 and 20 are just examples; I don't believe there are bounds on the WIP in HTK. From what I've seen, values between -20 and 20 are common.
> I think that Sphinx's WIP is indeed a probability, i.e. between 0 and 1. If you exponentiate the numbers from the other packages (taking care to use the correct base), you should get comparable numbers, if my assumptions are correct.
I've been looking at the S3 code, and I believe the WIP you specify is just a factor by which the probability is multiplied. Internally, the WIP is immediately converted to log base 1.0001, so in practice log_1.0001(wip) is added to the score. Additionally, only the lower bound (0.0) of the WIP is checked; if the WIP were a probability, it would make sense to also check the upper bound.
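To illustrate that conversion, a Python sketch (not the actual S3 code; the real logs3() also rounds to an int32):

import math

LOGBASE = 1.0001  # Sphinx 3's default log base

def logs3(p):
    # Sketch of logs3(): convert a linear value to the internal
    # log base 1.0001 domain. Only the lower bound is checked,
    # as noted above.
    assert p > 0.0
    return math.log(p) / math.log(LOGBASE)

print(logs3(0.7))  # a typical -wip of 0.7 gives roughly -3567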
Now, I'm just wondering if the WIP is multiplied by the number of words or not. The comments of the functions log_hypseg(), s3dag_log_hypseg() and main() mention that this is the output:
> scr = ascr + (lscr * lw + N * wip), where N = #words excluding <s>
but I can't find such a multiplication anywhere in the code. Based on the function lm_rawscore(), which recovers the original LM probability from a score to which the WIP and LM weight have been applied, I would guess that the WIP is not multiplied by the number of words...
Can anyone shed some light on this?
Thanks in advance!
Best regards,
Wout
> Now, I'm just wondering if the WIP is multiplied by the number of words or not. The comments of the functions log_hypseg(), s3dag_log_hypseg() and main() mention that this is the output:
>> scr = ascr + (lscr * lw + N * wip), where N = #words excluding <s>
> but I can't find such a multiplication anywhere in the code. Based on the function lm_rawscore(), which recovers the original LM probability from a score to which the WIP and LM weight have been applied, I would guess that the WIP is not multiplied by the number of words...
It's implicitly multiplied by the number of words, because it is added in for each word. The LM score for any word sequence is (feed this to LaTeX to get a nice equation...):
lscr = \sum_{i=1}^N \left( \log P(w_i) + \log wip \right) = N \log wip + \sum_{i=1}^N \log P(w_i)
There is an error in the comments above since the language weight is also applied to the WIP. (the log-WIP is simply added to all of the log probabilities in the language model when it is loaded - see lm_set_param() in src/libs3decoder/liblm/lm.c)
> It's implicitly multiplied by the number of words, because it is added in for each word.
Yes, I realized that after I posted my message :-S.
> There is an error in the comments above since the language weight is also applied to the WIP.
I don't think that the LM weight is applied to the WIP anymore. The comments state in a number of places that this was removed (on 30-Dec-2000 by Rita Singh). In lm_set_param():
iwip = logs3(wip) * lw;
was replaced by:
iwip = logs3(wip);
for example.
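If I read the code correctly, the current net effect at LM load time is something like this Python sketch (simplified; the real lm_set_param() works on the quantized int32 log values):

import math

LOGBASE = 1.0001

def lm_set_param(lm_logprobs, lw, wip):
    # Scale every LM log probability (log base 1.0001) by the language
    # weight and add logs3(wip) to each entry; since one LM probability
    # is consumed per word, this applies the insertion penalty per word.
    iwip = math.log(wip) / math.log(LOGBASE)
    return {ngram: p * lw + iwip for ngram, p in lm_logprobs.items()}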
I must say that I don't know much about the workings of Sphinx 3, so please correct me if I'm wrong.
Best regards,
Wout Maaskant
After looking at the source code of HTK, ISIP and Sphinx 3, writing to the HTK mailing list and posting here, I believe that the WIP values you specify as parameters for HTK/ISIP (these are the same) and Sphinx 3 can be converted as follows:
wip_s3 = 1.0001^(e^wip_htk)
(where e is the base of the natural logarithm)
and the inverse:
wip_htk = log_e(log_1.0001(wip_s3))
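As a Python sketch, transcribing these formulas as given (I haven't verified them against the decoders; note that the forward formula always yields wip_s3 > 1, so the inverse is only defined for such values):

import math

LOGBASE = 1.0001  # Sphinx 3's internal log base

def wip_htk_to_s3(wip_htk):
    # wip_s3 = 1.0001^(e^wip_htk), as stated above
    return LOGBASE ** math.exp(wip_htk)

def wip_s3_to_htk(wip_s3):
    # the inverse: wip_htk = log_e(log_1.0001(wip_s3)); needs wip_s3 > 1
    return math.log(math.log(wip_s3) / math.log(LOGBASE))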
Best regards,
Wout
> I don't think that the LM weight is applied to the WIP anymore. The comments state in a number of places that this was removed (on 30-Dec-2000 by Rita Singh). In lm_set_param():
> iwip = logs3(wip) * lw;
> was replaced by:
> iwip = logs3(wip);
> for example.
Ah, yes, you're right, thanks for clearing that up. I was a bit worried for a second!
Also, just to straighten this all out, the conversions for command-line arguments between HVite/HDecode and Sphinx3 are simple: the Sphinx values for beams and the insertion penalty are just exp(HTK value), while the value for the language weight is the same. So the following HVite command line:
HVite -t 350.0 -p -4.0 -s 15.0
is equivalent to this one in Sphinx:
sphinx3_decode -beam 1e-152 -wip 0.0183 -lw 15.0
Except of course that sphinx3_decode has a lot more beam settings to play with.
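A trivial Python sketch of the conversion for these three options (option names as in the commands above):

import math

def hvite_to_sphinx3(t, p, s):
    # Map the HVite -t/-p/-s values to sphinx3_decode -beam/-wip/-lw.
    return {
        "-beam": math.exp(-t),  # -t 350.0 -> -beam ~1e-152
        "-wip": math.exp(p),    # -p -4.0  -> -wip ~0.0183
        "-lw": s,               # -s 15.0  -> -lw 15.0
    }

print(hvite_to_sphinx3(350.0, -4.0, 15.0))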
I've been doing some Sphinx/HTK comparisons of my own, you may wish to look at http://lima.lti.cs.cmu.edu/mediawiki/index.php/SphinxHTK
The Python modules for reading HTK feature files and writing Sphinx feature files are in SphinxTrain's subversion repository now (python/sphinx/htkmfc.py, python/sphinx/s2mfc.py). You can implement a converter in 2 lines of code so I haven't bothered to add one.
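Something along these lines should work (I'm sketching the interface from memory, so check the module code for the exact names):

from sphinx import htkmfc, s2mfc  # from SphinxTrain's python/ directory

# Hypothetical interface: read all frames from an HTK feature file and
# write them back out in Sphinx-2 MFC format.
s2mfc.open('out.mfc', 'wb').writeall(htkmfc.open('in.htk', 'rb').getall())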
Supporting HTK feature files directly is a near-term goal for Sphinx, because they are generally a better file format. Also HCopy's feature extraction is better than Sphinx's at the moment.
Hi David,
Great that you've found out how to convert the HTK beam pruning parameter -t to S3!
On your wiki you write:
> So, the following parameters for HTK:
> HVite -t 350.0 -p -4.0 -s 15.0
> Are equivalent to these in Sphinx 3.0 (although I'm not sure about -nwbeam):
> s3decode-anytopo -beam 1e-152 -nwbeam 1e-152 -inspen 0.0183 -langwt 15.0
A few points:
The Sphinx 3 version that I downloaded (3.6) doesn't have a -nwbeam parameter. Isn't this an S2 parameter (see: http://cmusphinx.sourceforge.net/sphinx2/doc/sphinx2.html#sec_cmdline_beam)?
The S2 documentation describes "-nwbeam" as "Word-exit beam width for tree search" in the alphabetical list of arguments. I think this parameter may be related (possibly via a log/linear conversion again) to the "-v" parameter of HTK.
Hope this helps.
Because of time pressure and because I'm not certain I can get the HTK->S3 model conversion working (quickly), I've decided to use S3 to train new models. At least that is well-documented, so the chance of success is a bit higher :-). If my employer allows, I'll post the converter online so other people can work with/on it.
Best regards,
Wout
Yes, these are parameters for the original Sphinx 3.0, which is a flat-lexicon (exact) decoder similar to HVite. You can get it from https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/trunk/archive_s3/s3
I'm going to be doing another round of testing with sphinx3 from Subversion shortly.
Why did you use Sphinx 3.0 instead of 3.6 for your comparison? I thought that the two Sphinx 3 branches (fast & slow) were merged in version 3.5. I am about to start to train Sphinx 3 in order to compare it to HTK/Julius/ISIP, and I was planning to use 3.6. Do you recommend using 3.0 instead (if so, why)?
Oh, something about the conversion of the WIP you mentioned:
> The Sphinx values for beams and the insertion penalty are just exp(HTK value).
If Sphinx 3.0, like 3.6, uses logbase 1.0001 (I haven't checked 3.0 out), I think you should take that log base into account:
wip_s3 = 1.0001^(e^wip_htk)
The value won't be very different, but it's nice to be precise ;-).
Best regards,
Wout