Hi,
I'm using the LibriSpeech acoustic models with my own bigram language model.
Looking at the lattice file, I noticed some negative numbers in the LM score, e.g.:
82 88 LESS -3.68641,1481.89,17_1568_1567_1567_1567_1570_1572_1571_1502_1501_1501_1501_1501_1501_1501_1501_1501_1504_1503_1503_1503_1506_1505_1505_1505_2162_2164_2166_2165_2165_188_187_187_187_187_187_187_187_190_189
Is that possible? Isn't it supposed to be a negated log-probability?
If not, what could be the problem?
Thanks!
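For reference, the quoted arc appears to be in Kaldi's CompactLattice text format: start state, end state, word, and then a weight of the form graph-cost,acoustic-cost,transition-ids (this field layout is an assumption based on the line above, and the transition-id string is truncated here). A minimal Python sketch of pulling the fields apart:

# Parse one arc of a text-form lattice, assuming the layout
# "start-state end-state word graph-cost,acoustic-cost,transition-ids".
line = "82 88 LESS -3.68641,1481.89,17_1568_1567"  # transition ids truncated

start, end, word, weight = line.split()
graph_cost, acoustic_cost, _tids = weight.split(",")
print(word, float(graph_cost), float(acoustic_cost))  # LESS -3.68641 1481.89
# The -3.68641 graph cost sits on a single arc; as the replies below explain,
# it should not be read on its own as a negated LM log-probability.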
It only makes sense if you look at the weight of the whole path -- a WFST can distribute weights along a path arbitrarily, so the weight of an individual arc may not make sense on its own.
Guoguo
Thanks for replying, but can you be more specific?
In which cases are arcs given negative weights?
It seems very strange to me, because a negative weight would mean a probability larger than 1.
I guess what I was trying to say was: don't interpret individual arc weights as negated log-probabilities. If you need the posterior of a certain arc, you have to do forward-backward on the lattice.
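As a rough illustration of that forward-backward step (a minimal sketch on a made-up toy lattice, not Kaldi's actual lattice API; costs here are treated as negated log-probabilities):

import math
from collections import defaultdict

# Toy acyclic lattice: arcs are (source, destination, label, cost), with
# states numbered in topological order, 0 = start, `final` = final state.
arcs = [
    (0, 1, "a", 0.5),
    (0, 1, "b", 1.5),
    (1, 2, "c", 0.2),
]
final = 2

def logsumexp(xs):
    m = max(xs)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(x - m) for x in xs))

# Forward: alpha[s] = log of the total probability of all partial paths
# from the start state to s.
alpha = defaultdict(lambda: float("-inf"))
alpha[0] = 0.0
for s, t, _, c in arcs:                  # arcs already sorted by source state
    alpha[t] = logsumexp([alpha[t], alpha[s] - c])

# Backward: beta[s] = log of the total probability of all paths from s
# to the final state.
beta = defaultdict(lambda: float("-inf"))
beta[final] = 0.0
for s, t, _, c in reversed(arcs):
    beta[s] = logsumexp([beta[s], beta[t] - c])

# Arc posterior = P(paths through this arc) / P(all paths).
total = alpha[final]
for s, t, label, c in arcs:
    posterior = math.exp(alpha[s] - c + beta[t] - total)
    print(f"{s}->{t} '{label}': posterior {posterior:.3f}")
# Prints 0.731 and 0.269 for the two competing arcs, and 1.000 for 'c'.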
Consider the following two WFST examples in the log semiring (each arc line is: source state, destination state, input label, output label, weight; the final line gives the final state):
WFST1:
0 1 1 1 -1
1 2 2 2 1
2
WFST2:
0 1 1 1 0
1 2 2 2 0
2
They are equivalent in the sense that the total weights of the path (in this case a single path) are the same: in the log semiring path weights combine by addition, so WFST1's path weight is (-1) + 1 = 0 and WFST2's is 0 + 0 = 0. So you can actually create equivalent WFSTs with different weights on each arc, and that's why I said a WFST can distribute weights along a path arbitrarily.
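The same point as a tiny Python sketch (the arc tuples and the delta value are just for illustration): in the log semiring a path's weight is the sum of its arc weights, so weight can be moved between consecutive arcs without changing any complete path.

# The two single-path WFSTs above, as (src, dst, ilabel, olabel, weight).
wfst1 = [(0, 1, 1, 1, -1.0), (1, 2, 2, 2, 1.0)]
wfst2 = [(0, 1, 1, 1, 0.0), (1, 2, 2, 2, 0.0)]

def path_weight(path):
    # In the log semiring, weights along a path combine by addition.
    return sum(arc[4] for arc in path)

print(path_weight(wfst1), path_weight(wfst2))  # 0.0 0.0 -- equivalent

# "Pushing" weight: move delta from the first arc to the second; the total
# over the complete path is unchanged, so the machine stays equivalent.
delta = 1.0
pushed = [(0, 1, 1, 1, wfst1[0][4] + delta), (1, 2, 2, 2, wfst1[1][4] - delta)]
print(path_weight(pushed))  # still 0.0

This is essentially what OpenFst's fstpush utility does over whole machines.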
Guoguo
Can you please specify where this "breaking an arc into positive and negative weights" is necessary in the speech recognition pipeline?
It is not necessary; it is more of a byproduct -- the WFST algorithms have to manipulate arc weights, and they don't guarantee that the weight on a single arc is a negated log-probability. Weight pushing or determinization in the log semiring, for example, can leave negative weights on individual arcs even though every complete path keeps the same total weight.
Guoguo