Hi Nicole,
I am observing odd behaviour in the function
fsg_search_pnode_exit(fsg_search_t *fsgs, fsg_pnode_t *pnode);
This function checks whether a word is a filler or a single-phone word:
    if (fsg_model_is_filler(fsgs->fsg, wid)
        /* FIXME: This might be slow due to repeated calls to dict_to_id(). */
        || (dict_is_single_phone(ps_search_dict(fsgs),
                                 dict_wordid(ps_search_dict(fsgs),
                                             fsg_model_word_str(fsgs->fsg, wid)))))
    {
        /* .... */
    }
Many alternate-pronunciation words are also being flagged as filler words.
On further probing, the problem appears to lie in the fsg->silwords vector.
I am using the hub4wsj_sc_8k model. My dictionary has 175 words: 146 unique
words, 9 silence/filler words, and 20 alternate pronunciations. With 175
words, fsg->silwords needs ceil(175 / 32) = 6 32-bit words. Printing
fsg->silwords immediately after FSG initialization gave the following values:
0x00000000
0x00000000
0x00000000
0x00000000
0x07fc0000
0xcdcdcdcd
As I understand it, the vector should have been:
0x00000000
0x00000000
0x00000000
0x00000000
0x07fc0000
0x00000000
My hunch is that the last word is not being initialized properly (its value
is 0xcdcdcdcd).
If this observation is correct, could this anomaly also be affecting
recognition accuracy?
Regards
Pankaj
Indeed, this is a bug in sphinxbase. Thanks a lot, this is a very good catch.
It's now fixed in sphinxbase trunk.
Hi,
I see that this bugfix involves changes to the files bitvec.h, bitvec.c and
fsg_model.c in sphinxbase. Does it involve changes to any other file? I need
to incorporate this change into an application based on the previous
release, 0.6.1, hence the question.
Regards
Pankaj
No, that's it.
Hi,
I tested the changed code. The fsg->altwords vector seems to be OK, but there
is still something unexplained in fsg->silwords. Here is a dump of silwords
for a dictionary of 175 words:
0x00000000
0x00000000
0x00000000
0x00000000
0x07fc0000
0x00008000
I am unable to understand why the last word is 0x00008000. That is bit 15 of
word 5, i.e. bit index 5 * 32 + 15 = 175, which corresponds to a 176th word,
yet there are only 175 words (ids 0 to 174) in the dictionary.
Regards
Pankaj
Hello
Sorry, I have no idea how this bit got set. It might be set somewhere along
the way. It should be easy to debug: just add a watchpoint after the realloc
and catch the moment the memory is modified.