Hi -
So I've got two versions of pocketsphinx (and sphinxbase) compiled - one using fixed point, and one floating. The floating point version works great, and gives me basically 100% accuracy. The fixed point, however, doesn't. It doesn't recognize anything, at all, so I'm wondering what I'm doing wrong here. I tried playing with some of the input parameters (http://www.speech.cs.cmu.edu/cmusphinx/moinmoin/PocketsphinxHandhelds), with little luck.
I'm using wsj1 (the one that comes w/ the pocketsphinx SVN checkout), a custom (very small, fewer than 20 words) LM and dictionary. I'm recording audio in Audacity, and I've tried 16-bit audio at 8 kHz and 16 kHz. The floating version can do both 8 and 16 kHz and fixed can do neither.
Thoughts/Help?
M
I'm assuming you mean print the values of the variable fea. They don't appear to change until after the third call (which is obvious, I guess). Here is what they look like:
Windows (good)
0x00BCFBA8 ff f9 03 00
0x00BCFBAC 72 7a ff ff
0x00BCFBB0 6a e1 ff ff
0x00BCFBB4 19 0b 00 00
0x00BCFBB8 67 ff ff ff
0x00BCFBBC e0 02 00 00
0x00BCFBC0 74 10 00 00
0x00BCFBC4 20 0d 00 00
0x00BCFBC8 1a fd ff ff
0x00BCFBCC 4d ec ff ff
0x00BCFBD0 e9 f1 ff ff
0x00BCFBD4 1f 03 00 00
0x00BCFBD8 14 fd ff ff
QNX (bad)
fea : 0x83012A0 <Hex>
Address 0 1 2 3
083012A0 01 00 00 00
083012A4 9F EC C2 EB
083012A8 9F EC C2 EB
083012AC 9F EC C2 EB
083012B0 A1 EC C2 EB
083012B4 9F EC C2 EB
083012B8 9F EC C2 EB
083012BC 9F EC C2 EB
083012C0 A1 EC C2 EB
083012C4 9F EC C2 EB
083012C8 A0 EC C2 EB
083012CC 9F EC C2 EB
083012D0 9F EC C2 EB
So, I guess obviously the second one is not good, just not sure how this helps me get where I need to be...
M
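For anyone reading dumps like the two above, here is a minimal sketch of turning four dumped bytes into a fixed-point value. It assumes the bytes are stored least significant first (as on x86) and that the fixed-point format has 12 fractional bits; both are assumptions, and the helper names are made up for illustration, not taken from sphinxbase.

```c
#include <assert.h>
#include <stdint.h>

/* Assemble four bytes, least significant first, into a signed 32-bit value. */
static int32_t le_bytes_to_int32(const unsigned char b[4])
{
    uint32_t u = (uint32_t)b[0]
               | ((uint32_t)b[1] << 8)
               | ((uint32_t)b[2] << 16)
               | ((uint32_t)b[3] << 24);
    /* Two's-complement conversion that avoids an implementation-defined cast. */
    if (u & 0x80000000u)
        return -(int32_t)(~u) - 1;
    return (int32_t)u;
}

/* Convert a fixed-point value with `radix` fractional bits to a double.
 * radix = 12 is an assumption about the build being inspected. */
static double fix_to_double(int32_t v, int radix)
{
    assert(radix >= 0 && radix < 31);
    return (double)v / (double)((int32_t)1 << radix);
}
```

Under those assumptions, the first Windows entry (ff f9 03 00) decodes to 260607, or roughly 63.6, which is a plausible first cepstral coefficient. The QNX entries are all nearly identical to each other, which is what you would expect from a broken front end rather than real features.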
So, we need to improve the level of granularity and compare values inside the function, do you get the idea? :) There is nothing suspicious as far as I can see now, but since the values are different, the bug must be somewhere inside.
Ok, so it turns out that I traced the error down to the hamming window calculations. It turns out the FIXMUL_ANY function was different between the Windows and QNX versions. This is because _MSC_VER is defined on Windows, and not on QNX (obviously)... so I defined HAVE_LONG_LONG and SIZEOF_LONG_LONG 8, and things started working. Now my cmn.c lines are the same:
INFO: cmn.c(175): CMN: 46.33 -3.26 1.95 -1.12 -0.52 -0.21 0.57 0.55 0.50 -0.17 -0.14 -0.04 -0.07
INFO: cmn.c(175): CMN: 46.33 -3.26 1.95 -1.12 -0.52 -0.21 0.57 0.55 0.50 -0.17 -0.14 -0.04 -0.07
This means that there is a bug in the default FIXMUL_ANY implementation (line 111) below (which would get used by default by any PPC solution):
#define FIXMUL_ANY(a,b,radix) \
(fixed32)(((((uint32)(a))&((1<<(radix))-1)) \
* (((uint32)(b))&((1<<(radix))-1)) >> (radix)) \
+ (((((int32)(a))>>(radix)) * (((int32)(b))>>(radix))) << (radix)) \
+ ((((uint32)(a))&((1<<(radix))-1)) * (((int32)(b))>>(radix))) \
+ ((((uint32)(b))&((1<<(radix))-1)) * (((int32)(a))>>(radix))))
#endif
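For comparison, a minimal sketch of the kind of multiply that a 64-bit intermediate makes possible, which is presumably what defining HAVE_LONG_LONG and SIZEOF_LONG_LONG 8 selects. The names fixmul_ll and float_to_fix are hypothetical, and this is not a copy of the sphinxbase macro.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t fixed32; /* assumption: 32-bit fixed-point type */

/* Convert a real number to fixed point with `radix` fractional bits. */
static fixed32 float_to_fix(double x, int radix)
{
    assert(radix >= 0 && radix < 31);
    return (fixed32)(x * (double)((int32_t)1 << radix));
}

/* Multiply two fixed-point numbers through a 64-bit intermediate, so the
 * whole product exists before the radix shift drops the extra bits.
 * Assumes >> on a negative value is an arithmetic shift, which holds for
 * gcc and MSVC but is implementation-defined in C. */
static fixed32 fixmul_ll(fixed32 a, fixed32 b, int radix)
{
    assert(radix >= 0 && radix < 32);
    int64_t wide = (int64_t)a * (int64_t)b;
    return (fixed32)(wide >> radix);
}
```

With 12 fractional bits, 100.0 * 100.0 has an intermediate product of 167772160000, which does not fit in 32 bits; the 64-bit path still returns the fixed-point encoding of 10000.0. Running the same inputs through both this and the fallback macro is a quick way to see where they disagree.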
However, I still did not get the same results after that. This is what my output looks like:
Windows (good)
INFO: cmn.c(175): CMN: 46.33 -3.26 1.95 -1.12 -0.52 -0.21 0.57 0.55 0.50 -0.17 -0.14 -0.04 -0.07
INFO: ngram_search.c(375): Resized backpointer table to 10000 entries
INFO: ngram_search_fwdtree.c(1473): 9455 words recognized (5/fr)
INFO: ngram_search_fwdtree.c(1475): 442686 senones evaluated (230/fr)
INFO: ngram_search_fwdtree.c(1477): 132097 channels searched (68/fr), 32630 1st, 69440 last
INFO: ngram_search_fwdtree.c(1481): 13873 words for which last channels evaluated (7/fr)
INFO: ngram_search_fwdtree.c(1484): 9026 candidate words for entering last phone (4/fr)
INFO: ngram_search_fwdflat.c(834): 7470 words recognized (4/fr)
INFO: ngram_search_fwdflat.c(836): 253249 senones evaluated (132/fr)
INFO: ngram_search_fwdflat.c(838): 100121 channels searched (52/fr)
INFO: ngram_search_fwdflat.c(840): 13193 words searched (6/fr)
INFO: ngram_search_fwdflat.c(842): 4581 word transitions (2/fr)
WARNING: "ngram_search.c", line 1022: </s> not found in last frame, using SLEEP instead
INFO: ngram_search.c(1067): lattice start node <s>.0 end node SLEEP.1880
INFO: ps_lattice.c(1226): Normalizer P(O) = alpha(SLEEP:1880:1920) = -14170805
INFO: ps_lattice.c(1264): Joint P(O,S) = -14180291 P(S|O) = -9486
QNX (bad):
INFO: cmn.c(175): CMN: 46.33 -3.26 1.95 -1.12 -0.52 -0.21 0.57 0.55 0.50 -0.17 -0.14 -0.04 -0.07
INFO: ngram_search_fwdtree.c(1473): 2374 words recognized (1/fr)
INFO: ngram_search_fwdtree.c(1475): 106602 senones evaluated (55/fr)
INFO: ngram_search_fwdtree.c(1477): 23534 channels searched (12/fr), 5173 1st, 11555 last
INFO: ngram_search_fwdtree.c(1481): 2785 words for which last channels evaluated (1/fr)
INFO: ngram_search_fwdtree.c(1484): 1877 candidate words for entering last phone (0/fr)
INFO: ngram_search_fwdflat.c(834): 2692 words recognized (1/fr)
INFO: ngram_search_fwdflat.c(836): 41272 senones evaluated (21/fr)
INFO: ngram_search_fwdflat.c(838): 16315 channels searched (8/fr)
INFO: ngram_search_fwdflat.c(840): 3591 words searched (1/fr)
INFO: ngram_search_fwdflat.c(842): 405 word transitions (0/fr)
WARNING: "ngram_search.c", line 1022: </s> not found in last frame, using TO instead
INFO: ngram_search.c(1067): lattice start node <s>.0 end node TO.3
INFO: ps_lattice.c(1226): Normalizer P(O) = alpha(TO:3:1920) = -19375181
INFO: ps_lattice.c(1264): Joint P(O,S) = -19375181 P(S|O) = 0
So, ideas for the next place to look?
M
> Ok, so it turns out that I traced the error down to the hamming window calculations. It turns out the FIXMUL_ANY function was different between the Windows and QNX versions.
Great. Any ideas how to fix this properly? I suppose you don't use the configure checks, so we probably need to create another set of project files for QNX or add some other compile-time check.
> So, ideas for the next place to look?
First of all, please make sure you are using -dither no to avoid random noise in testing. Next, let's check whether the Gaussian (gau) calculation gives the same scores or not. Can you please add the -backtrace yes option to output the acoustic scores?
Also, let's now compare the values in pocketsphinx/src/libpocketsphinx/s2_semi_mgau.c. Can you please print the sequence of mfcc values and the sequence of scores assigned to them? They should match as well. Also please check eval_topn. The output will be huge, but you only need the first several rows.
So here is the deal. I had success today in running pocketsphinx on my PPC 440, Virtex5 board.
Way earlier you mentioned "compiler flags"... to be specific, there are two flags which must be defined to get this beast working. The first is FIXED_POINT. The second is WORDS_BIGENDIAN.
In addition to this, there is a bug in tmat.c, around line 236, which looks like this:
if ((bio_fread(&(t->n_tmat), sizeof(int32), 1, fp, byteswap, &chksum) != 1)
t->n_tmat is an int16. This works fine on little endian, but not big endian. The hack to fix it (NOT A GOOD WAY) is this, inserted after the chained if/reads:
because the number in t->n_tmat is actually stored in t->n_state from the read above and 0 is always in t->n_state. I tried just reading two int16s instead, but got some checksum failure... you guys should really check this out.
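To illustrate why a 4-byte read into a 2-byte field behaves differently by byte order, here is a minimal sketch. The struct counts_t and both helper functions are hypothetical stand-ins, not the real tmat.c code, and memcpy stands in for the file read.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for two adjacent 16-bit fields. */
typedef struct {
    int16_t n_tmat;
    int16_t n_state;
} counts_t;

/* Returns 1 on a big-endian host, 0 on a little-endian one. */
static int host_is_big_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 0;
}

/* Problem pattern: 4 bytes written starting at the address of a 2-byte
 * field spill into the field that follows it. */
static void store_count_overlapping(counts_t *c, int32_t value)
{
    assert(sizeof(counts_t) == sizeof(int32_t)); /* no padding expected */
    memcpy(c, &value, sizeof value);
}

/* Safer pattern: land the value in a real 32-bit temporary, then narrow. */
static void store_count_narrowed(counts_t *c, int32_t value)
{
    int32_t tmp;
    memcpy(&tmp, &value, sizeof tmp); /* stands in for the 4-byte read */
    assert(tmp >= INT16_MIN && tmp <= INT16_MAX);
    c->n_tmat = (int16_t)tmp;
}
```

On a little-endian host the overlapping write happens to leave the count in n_tmat and zero in n_state; on a big-endian host the two are reversed, which matches the symptom described above. Reading into a 32-bit temporary keeps the number of bytes read (and so presumably the checksum) unchanged while removing the dependence on byte order.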
Finally, another caveat is that you MUST put the following line in your argfile, assuming you are running batch mode:
-input_endian little
The program will change the default for input_endian based on what kind of machine you are running on (seriously): it is little on my x86, and big on the PPC.
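As a rough illustration of what an input byte-order setting implies for 16-bit audio, the sketch below swaps the two bytes of each sample in place. This is a generic example with made-up helper names, not the decoder's actual code; it only shows why little-endian recordings read as noise when the host assumes big-endian data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Swap the two bytes of one 16-bit value. */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}

/* Byte-swap a buffer of 16-bit samples in place. */
static void swap_samples(int16_t *samples, size_t n)
{
    assert(samples != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        uint16_t u;
        memcpy(&u, &samples[i], sizeof u); /* reinterpret without aliasing issues */
        u = swap16(u);
        memcpy(&samples[i], &u, sizeof u);
    }
}
```

A quiet sample such as 0x0001 becomes 0x0100 when read with the wrong byte order, so even silence turns into a loud signal.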
Thanks for your help, I can't believe it was mostly because I was missing one #define. Ugh.
M
Great! So, I assume you are not using configure and make to build it, because they will detect big-endian machines automatically.
In fact you may wish to make your own copy of config.h and sphinx_config.h, much like we do for Win32 and WinCE.
No, not using config and make, I'm cross-compiling using QNX's IDE, which is Eclipse-based.
I foresee training questions forthcoming... :)
M
Oh yea, and
#define HAVE_LONG_LONG
#define SIZEOF_LONG_LONG 8
need to go in config.h for sphinxbase (defined in your build files).
M
There were similar problems in the past, mostly they were caused by some compilation flag issues.
Start by splitting the decoding into parts and cross-checking everything against the working version. Print the feature values, the values after cmn, the values read from the model, and the scores (-backtrace yes), and compare them.
Try to build the fixed-point version on the host with exactly the same flags to compare how it works.
Can you give me a little bit of help on how to do some of the things you mention above? I tried the -mfclogdir, which totally blew up my decoder that is working, so I assume this doesn't work... how about some of the other things?
The weird thing is that I compiled the pocketsphinx (& sphinxbase) code on win32 w/ FIXED_POINT defined, and it does work, perfectly. If I run the floating point version, it seems to give the same answers on win32 as it does on QNX, but obviously the fixed point does not. I looked through the compilation flags and didn't see anything that really jumped out at me, I included them for a file below, for grins. I realize I'm not doing the EXPORTS and DLL defines, because I just tossed the sphinxbase code in with pocketsphinx and compiled it all together on my QNX build... this would be undone later if I get it to work.
QNX Flags
C:/QNX632/host/win32/x86/usr/bin/qcc -Vgcc_ntox86 -c -Wc,-Wall -Wc,-Wno-parentheses -DHAVE_CONFIG_H -DFIXED_POINT -O0 -I. -IC:/QNX632/ide4-workspace/PocketSphinx/x86/o -IC:/QNX632/ide4-workspace/PocketSphinx/x86/o-g -IC:/QNX632/ide4-workspace/PocketSphinx/x86 -IC:/QNX632/ide4-workspace/PocketSphinx -IC:/QNX632/ide4-workspace/PocketSphinx/src -IC:/QNX632/ide4-workspace/PocketSphinx/src/Sphinxbase/fe -IC:/QNX632/ide4-workspace/PocketSphinx/src/Sphinxbase/feat -IC:/QNX632/ide4-workspace/PocketSphinx/src/Sphinxbase/lm -IC:/QNX632/ide4-workspace/PocketSphinx/src/Sphinxbase/util -IC:/QNX632/ide4-workspace/PocketSphinx/src -IC:/QNX632/ide4-workspace/PocketSphinx/src/Sphinxbase/include -IC:/QNX632/target/qnx6/usr/include -g -DVARIANT_g C:/QNX632/ide4-workspace/PocketSphinx/src/Sphinxbase/fe/yin.c
MSVS flags
/Od /I "../../../include" /I "../../../sphinxbase/include" /I "../../include" /I "../../../../sphinxbase/include" /I "../../../../sphinxbase/include/win32" /I "../../../src/libpocketsphinx" /D "_DEBUG" /D "WIN32" /D "_WINDOWS" /D "_USRDLL" /D "POCKETSPHINX_EXPORTS" /D "SPHINXDLL" /D "HAVE_CONFIG_H" /D "_CRT_SECURE_NO_DEPRECATE" /D "FIXED_POINT" /D "_VC80_UPGRADE=0x0600" /D "_WINDLL" /D "_MBCS" /Gm /EHsc /RTC1 /MDd /Fp".\Debug/pocketsphinx.pch" /Fo".\Debug/" /Fd".\Debug/" /W3 /nologo /c /ZI /TP /errorReport:prompt
Also, do you have to link a different math library or something? I would imagine you need to, or not link it at all...
If you do not link libm, you get undefined values for pow, sqrt, floor, _Log, etc. So I'm a bit confused here... how is this a fixed point implementation if it is using the math library which assumes floating point, right?
I think you need to link to a fixed-point math library. Such libraries do exist; for example, gcc has -msoft-float. But I'm not sure.
Looks strange indeed.
So, for example, I found that these two functions execute (put in debugger breakpoints) -
static float32
fe_mel(melfb_t *mel, float32 x)
{
float32 warped = fe_warp_unwarped_to_warped(mel, x);
return (float32) (2595.0 * log10(1.0 + warped / 700.0));
}
static float32
fe_melinv(melfb_t *mel, float32 x)
{
float32 warped = (float32) (700.0 * (pow(10.0, x / 2595.0) - 1.0));
return fe_warp_warped_to_unwarped(mel, warped);
}
You can see that they use log10 and pow, from the math library... did I miss something to not use these?
M
This may be useful - the output from cmn.c is different. The win32 (working) fixed point one reads:
INFO: cmn.c(175): CMN: 46.33 -3.26 1.95 -1.12 -0.52 -0.21 0.57 0.55 0.50 -0.17 -0.14 -0.04 -0.07
The QNX (bad) one reads:
INFO: cmn.c(175): CMN: 0.00 16.96 16.96 16.96 16.96 16.96 16.96 16.96 16.96 16.96 16.96 16.96 16.96
Maybe that will tell you something...
Thanks!
M
That's definitely bad and shows that the frontend is not functional, but it doesn't suggest the reason.
I did a -mfclogdir dump and the mfc files were definitely not the same. I tried a -rawlogdir dump and those outputs were the same. Does that tell you anything? If not, is there some array, or variable I can capture that might tell us more?
M
So, check
int32
fe_write_frame(fe_t * fe, mfcc_t * fea)
{
fe_spec_magnitude(fe);
fe_mel_spec(fe);
fe_mel_cep(fe, fea);
fe_lifter(fe, fea);
}
in sphinxbase/src/libsphinxbase/fe/fe_sigproc.c.
Print the array after each call and compare the values.
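One way to do that comparison is to format each array as a single labelled text line on both platforms and diff the two logs. The helper below is a hypothetical sketch; it assumes the values are 32-bit integers, as they would be in a fixed-point build, and the function name and labels are made up.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Format one labelled frame as "label: v0 v1 ...\n" into buf.
 * Returns the number of characters written (excluding the terminating
 * NUL), or -1 if buf is too small. */
static int format_frame(char *buf, size_t len, const char *label,
                        const int32_t *v, int n)
{
    assert(buf != NULL && label != NULL && v != NULL);
    int used = snprintf(buf, len, "%s:", label);
    if (used < 0 || (size_t)used >= len)
        return -1;
    for (int i = 0; i < n; i++) {
        size_t room = len - (size_t)used;
        int w = snprintf(buf + used, room, " %" PRId32, v[i]);
        if (w < 0 || (size_t)w >= room)
            return -1;
        used += w;
    }
    if ((size_t)used + 1 >= len)
        return -1;
    buf[used++] = '\n';
    buf[used] = '\0';
    return used;
}
```

Calling this after each of the four steps with a distinct label, and writing the result to a file, gives two logs that can be compared line by line; the first label whose line differs points at the stage where the two builds diverge.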