pocketsphinx fixed point vs. floating point

2009-06-09
2012-09-22
  • Mike Medved

    Mike Medved - 2009-06-09

    Hi -

    So I've got two versions of pocketsphinx (and sphinxbase) compiled - one using fixed point, and one floating. The floating point version works great, and gives me basically 100% accuracy. The fixed point, however, doesn't. It doesn't recognize anything, at all, so I'm wondering what I'm doing wrong here. I tried playing with some of the input parameters (http://www.speech.cs.cmu.edu/cmusphinx/moinmoin/PocketsphinxHandhelds), with little luck.

    I'm using wsj1 (the one that comes w/ the pocketsphinx SVN checkout), a custom (very small, fewer than 20 words) LM and dictionary. I'm recording audio in Audacity, and I've tried 16-bit audio at 8 kHz and 16 kHz. The floating point version handles both 8 and 16 kHz; the fixed point version handles neither.

    Thoughts/Help?

    M

     
    • Mike Medved

      Mike Medved - 2009-06-12

      I'm assuming you mean print the values of the variable fea. They don't appear to change until after the third call (which is obvious, I guess). Here is what they look like:

      Windows(good)
      0x00BCFBA8 ff f9 03 00
      0x00BCFBAC 72 7a ff ff
      0x00BCFBB0 6a e1 ff ff
      0x00BCFBB4 19 0b 00 00
      0x00BCFBB8 67 ff ff ff
      0x00BCFBBC e0 02 00 00
      0x00BCFBC0 74 10 00 00
      0x00BCFBC4 20 0d 00 00
      0x00BCFBC8 1a fd ff ff
      0x00BCFBCC 4d ec ff ff
      0x00BCFBD0 e9 f1 ff ff
      0x00BCFBD4 1f 03 00 00
      0x00BCFBD8 14 fd ff ff

      QNX (bad)
      fea : 0x83012A0 <Hex>
      Address 0 1 2 3
      083012A0 01 00 00 00
      083012A4 9F EC C2 EB
      083012A8 9F EC C2 EB
      083012AC 9F EC C2 EB
      083012B0 A1 EC C2 EB
      083012B4 9F EC C2 EB
      083012B8 9F EC C2 EB
      083012BC 9F EC C2 EB
      083012C0 A1 EC C2 EB
      083012C4 9F EC C2 EB
      083012C8 A0 EC C2 EB
      083012CC 9F EC C2 EB
      083012D0 9F EC C2 EB

      So, I guess obviously the second one is not good, just not sure how this helps me get where I need to be...

      M

       
      • Nickolay V. Shmyrev

        So, we need to increase the granularity and compare values inside the function, do you get the idea :) ? There is nothing suspicious as far as I can see now, but since the values are different, the bug must be somewhere inside.

         
    • Mike Medved

      Mike Medved - 2009-06-17

      Ok, so I traced the error down to the hamming window calculations. It turns out the FIXMUL_ANY function was different between the Windows and QNX versions. This is because _MSC_VER is defined on Windows, and not on QNX (obviously)... so I defined HAVE_LONG_LONG and SIZEOF_LONG_LONG 8, and things started working. Now my cmn.c lines are the same:

      INFO: cmn.c(175): CMN: 46.33 -3.26 1.95 -1.12 -0.52 -0.21 0.57 0.55 0.50 -0.17 -0.14 -0.04 -0.07
      INFO: cmn.c(175): CMN: 46.33 -3.26 1.95 -1.12 -0.52 -0.21 0.57 0.55 0.50 -0.17 -0.14 -0.04 -0.07

      This means that there is a bug in the FIXMUL_ANY default implementation (line 111) below (which would get used by default by any PPC solution):

      #define FIXMUL_ANY(a,b,radix) \
      (fixed32)(((((uint32)(a))&((1<<(radix))-1)) \
      * (((uint32)(b))&((1<<(radix))-1)) >> (radix)) \
      + (((((int32)(a))>>(radix)) * (((int32)(b))>>(radix))) << (radix)) \
      + ((((uint32)(a))&((1<<(radix))-1)) * (((int32)(b))>>(radix))) \
      + ((((uint32)(b))&((1<<(radix))-1)) * (((int32)(a))>>(radix))))

      #endif
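
A minimal sketch of why the two code paths can disagree, assuming the `long long` path simply widens to 64 bits before shifting. The names `fixmul_wide` and `fixmul_split` are hypothetical; `fixmul_split` is a transcription of the macro quoted above into unsigned arithmetic. When the radix is larger than 16, the product of the two low parts no longer fits in 32 bits and its high bits are lost before the shift, which may be what goes wrong at a call site that uses a large radix.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: widen to 64 bits, multiply, then shift back down.
 * Right-shifting a negative value is implementation-defined in C,
 * but is an arithmetic shift on the usual compilers. */
static int32_t fixmul_wide(int32_t a, int32_t b, int radix)
{
    assert(radix > 0 && radix < 32);
    return (int32_t)(((int64_t)a * (int64_t)b) >> radix);
}

/* 32-bit-only decomposition, transcribed from the quoted macro.
 * Unsigned arithmetic is used so that wraparound is well defined. */
static int32_t fixmul_split(int32_t a, int32_t b, int radix)
{
    uint32_t mask, a_lo, b_lo, a_hi, b_hi;

    assert(radix > 0 && radix < 32);
    mask = ((uint32_t)1 << radix) - 1;
    a_lo = (uint32_t)a & mask;
    b_lo = (uint32_t)b & mask;
    a_hi = (uint32_t)(a >> radix);
    b_hi = (uint32_t)(b >> radix);

    /* a_lo * b_lo wraps modulo 2^32 when radix > 16 */
    return (int32_t)(((a_lo * b_lo) >> radix)
                     + ((a_hi * b_hi) << radix)
                     + a_lo * b_hi
                     + b_lo * a_hi);
}
```

With a radix of 12, both functions agree on 1.5 * 2.0 (6144 and 8192 in Q12). With a radix of 30, multiplying 0.5 by 0.5 (`1 << 29` for both operands) gives `1 << 28` in the wide version and 0 in the split version.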

      However, I still did not get the same results after that. This is what my output looks like:

      Windows (good)
      INFO: cmn.c(175): CMN: 46.33 -3.26 1.95 -1.12 -0.52 -0.21 0.57 0.55 0.50 -0.17 -0.14 -0.04 -0.07
      INFO: ngram_search.c(375): Resized backpointer table to 10000 entries
      INFO: ngram_search_fwdtree.c(1473): 9455 words recognized (5/fr)
      INFO: ngram_search_fwdtree.c(1475): 442686 senones evaluated (230/fr)
      INFO: ngram_search_fwdtree.c(1477): 132097 channels searched (68/fr), 32630 1st, 69440 last
      INFO: ngram_search_fwdtree.c(1481): 13873 words for which last channels evaluated (7/fr)
      INFO: ngram_search_fwdtree.c(1484): 9026 candidate words for entering last phone (4/fr)
      INFO: ngram_search_fwdflat.c(834): 7470 words recognized (4/fr)
      INFO: ngram_search_fwdflat.c(836): 253249 senones evaluated (132/fr)
      INFO: ngram_search_fwdflat.c(838): 100121 channels searched (52/fr)
      INFO: ngram_search_fwdflat.c(840): 13193 words searched (6/fr)
      INFO: ngram_search_fwdflat.c(842): 4581 word transitions (2/fr)
      WARNING: "ngram_search.c", line 1022: </s> not found in last frame, using SLEEP instead
      INFO: ngram_search.c(1067): lattice start node <s>.0 end node SLEEP.1880
      INFO: ps_lattice.c(1226): Normalizer P(O) = alpha(SLEEP:1880:1920) = -14170805
      INFO: ps_lattice.c(1264): Joint P(O,S) = -14180291 P(S|O) = -9486

      QNX (bad):
      INFO: cmn.c(175): CMN: 46.33 -3.26 1.95 -1.12 -0.52 -0.21 0.57 0.55 0.50 -0.17 -0.14 -0.04 -0.07
      INFO: ngram_search_fwdtree.c(1473): 2374 words recognized (1/fr)
      INFO: ngram_search_fwdtree.c(1475): 106602 senones evaluated (55/fr)
      INFO: ngram_search_fwdtree.c(1477): 23534 channels searched (12/fr), 5173 1st, 11555 last
      INFO: ngram_search_fwdtree.c(1481): 2785 words for which last channels evaluated (1/fr)
      INFO: ngram_search_fwdtree.c(1484): 1877 candidate words for entering last phone (0/fr)
      INFO: ngram_search_fwdflat.c(834): 2692 words recognized (1/fr)
      INFO: ngram_search_fwdflat.c(836): 41272 senones evaluated (21/fr)
      INFO: ngram_search_fwdflat.c(838): 16315 channels searched (8/fr)
      INFO: ngram_search_fwdflat.c(840): 3591 words searched (1/fr)
      INFO: ngram_search_fwdflat.c(842): 405 word transitions (0/fr)
      WARNING: "ngram_search.c", line 1022: </s> not found in last frame, using TO instead
      INFO: ngram_search.c(1067): lattice start node <s>.0 end node TO.3
      INFO: ps_lattice.c(1226): Normalizer P(O) = alpha(TO:3:1920) = -19375181
      INFO: ps_lattice.c(1264): Joint P(O,S) = -19375181 P(S|O) = 0

      So, ideas for the next place to look?

      M

       
      • Nickolay V. Shmyrev

        > Ok, so it turns out that I traced the error down to the hamming window calculations. It turns out the FIXMUL_ANY function was different between the Windows and QNX versions.

        Great. Any ideas how to fix this properly? I suppose you don't use configure checks, then we probably need to create another set of project files for QNX or add some other compile-time check.

        > So, ideas for the next place to look?

        First of all please make sure you are using -dither no to avoid random noise in testing. Next, let's check the gau calculation if it has the same scores or not. Can you please add -backtrace yes option to output the acoustic scores.

        Also, let's now compare the values in pocketsphinx/src/libpocketsphinx/s2_semi_mgau.c. Can you please print the sequence of mfcc values and the sequence of scores assigned to them? They should match as well. Also please check eval_topn. The output will be huge, but you need only the first several rows.

         
    • Mike Medved

      Mike Medved - 2009-06-19

      So here is the deal. I had success today in running pocketsphinx on my PPC 440, Virtex5 board.

      Way earlier you mentioned "compiler flags"... to be specific, there are two flags which must be defined to get this beast working. The first is FIXED_POINT. The second is WORDS_BIGENDIAN.

      In addition to this, there is a bug in tmat.c, around line 236, which looks like this:

      if ((bio_fread(&(t->n_tmat), sizeof(int32), 1, fp, byteswap, &chksum) != 1)

      t->n_tmat is an int16. This works fine on little endian, but not big endian. The hack to fix it (NOT A GOOD WAY) is this, inserted after the chained if/reads:

      //Hack!
      if(t->n_tmat == 0)
      {
         t->n_tmat = t->n_state;
      }
      

      because the number meant for t->n_tmat actually ends up in t->n_state after the read above, and t->n_tmat is always left at 0. I tried just reading two int16s instead, but got a checksum failure... you guys should really check this out.
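
A less fragile alternative to the hack can be sketched as follows: decode the 32-bit value from the file into a 32-bit temporary, then narrow it into the 16-bit field. This is only a sketch under assumptions. `tmat_header_t`, `load_le32`, and `set_n_tmat` are hypothetical names, the little-endian byte order is chosen just for illustration, and the real code would go through `bio_fread` rather than a raw byte buffer.

```c
#include <stdint.h>

/* Hypothetical stand-in for two adjacent 16-bit fields in the struct. */
typedef struct {
    int16_t n_tmat;
    int16_t n_state;
} tmat_header_t;

/* Decode a 32-bit little-endian value byte by byte, so the result
 * does not depend on the byte order of the host. */
static int32_t load_le32(const unsigned char *p)
{
    return (int32_t)((uint32_t)p[0]
                     | ((uint32_t)p[1] << 8)
                     | ((uint32_t)p[2] << 16)
                     | ((uint32_t)p[3] << 24));
}

/* Store a 32-bit on-disk count into the 16-bit field.
 * Returns 0 on success, -1 if the value does not fit.
 * The neighbouring n_state field is never touched. */
static int set_n_tmat(tmat_header_t *h, const unsigned char *raw)
{
    int32_t tmp = load_le32(raw);

    if (tmp < 0 || tmp > INT16_MAX)
        return -1;
    h->n_tmat = (int16_t)tmp;
    return 0;
}
```

Because the 4-byte value lands in a 4-byte temporary first, the result is the same on little-endian and big-endian hosts, and nothing spills into the field that follows `n_tmat`.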

      Finally, another caveat is that you MUST put the following line in your argfile, assuming you are running batch mode:

      -input_endian little

      The program will change the default for input_endian based on what kind of machine you are on (seriously): it is little on my x86, and big on the PPC.
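
To illustrate what a host-dependent default implies for the audio data, here is a small sketch (not the sphinxbase implementation; all function names are hypothetical) that detects the host byte order at run time and converts little-endian 16-bit samples to host order.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 on a big-endian host, 0 on a little-endian one. */
static int host_is_big_endian(void)
{
    const uint16_t probe = 0x0102;
    return *(const unsigned char *)&probe == 0x01;
}

/* Swap the two bytes of a 16-bit value. */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Convert a buffer of little-endian 16-bit samples to host order
 * in place. On a little-endian host this is a no-op. */
static void le16_to_host(int16_t *samples, size_t n)
{
    size_t i;

    if (!host_is_big_endian())
        return;
    for (i = 0; i < n; i++)
        samples[i] = (int16_t)swap16((uint16_t)samples[i]);
}
```

A recording made on an x86 machine is little-endian, so on the PPC the samples have to be swapped before use, which is presumably why the explicit `-input_endian little` setting is needed there.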

      Thanks for your help, I can't believe it was mostly because I was missing one #define. Ugh.

      M

       
      • David Huggins-Daines

        Great! So, I assume you are not using configure and make to build it, because they will detect big-endian machines automatically.

        In fact you may wish to make your own copy of config.h and sphinx_config.h, much like we do for Win32 and WinCE.

         
        • Mike Medved

          Mike Medved - 2009-06-22

          No, not using configure and make, I'm cross-compiling using QNX's IDE, which is Eclipse-based.

          I foresee training questions forthcoming... :)

          M

           
    • Mike Medved

      Mike Medved - 2009-06-19

      Oh yea, and

      #define HAVE_LONG_LONG

      #define SIZEOF_LONG_LONG 8

      need to go in config.h for sphinxbase (defined in your build files).
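
When these values are written by hand instead of being detected by configure, a compile-time check can catch a mismatch early. A minimal sketch, assuming a compiler that supports `long long`; the typedef names are hypothetical.

```c
#include <limits.h>

/* Hand-written config.h entries, as given in the post above. */
#define HAVE_LONG_LONG
#define SIZEOF_LONG_LONG 8

/* A negative array size is a compile error, so the build fails
 * if the hand-written size does not match the compiler. */
typedef char long_long_size_matches[
    (sizeof(long long) == SIZEOF_LONG_LONG) ? 1 : -1];

/* SIZEOF_LONG_LONG counts bytes, so also check the byte width. */
typedef char char_is_8_bits[(CHAR_BIT == 8) ? 1 : -1];
```

If the target's `long long` is not 8 bytes, compilation stops at the first typedef instead of producing a decoder that silently computes wrong values.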

      M

       
    • Nickolay V. Shmyrev

      There were similar problems in the past, mostly they were caused by some compilation flag issues.

      Start by splitting the decoding into parts and cross-checking everything against the working version. Print the feature values, the values after CMN, the values read from the model, and the scores (-backtrace yes), and compare them.

      Try to build the fixed point version on the host with exactly the same flags to compare how it works.

       
      • Mike Medved

        Mike Medved - 2009-06-10

        Can you give me a little bit of help on how to do some of the things you mention above? I tried the -mfclogdir, which totally blew up my decoder that is working, so I assume this doesn't work... how about some of the other things?

        The weird thing is that I compiled the pocketsphinx (& sphinxbase) code on win32 w/ FIXED_POINT defined, and it does work, perfectly. If I run the floating point version, it seems to give the same answers on win32 as it does on QNX, but obviously the fixed point does not. I looked through the compilation flags and didn't see anything that really jumped out at me, I included them for a file below, for grins. I realize I'm not doing the EXPORTS and DLL defines, because I just tossed the sphinxbase code in with pocketsphinx and compiled it all together on my QNX build... this would be undone later if I get it to work.

        QNX Flags

        C:/QNX632/host/win32/x86/usr/bin/qcc -Vgcc_ntox86 -c -Wc,-Wall -Wc,-Wno-parentheses -DHAVE_CONFIG_H -DFIXED_POINT -O0 -I. -IC:/QNX632/ide4-workspace/PocketSphinx/x86/o -IC:/QNX632/ide4-workspace/PocketSphinx/x86/o-g -IC:/QNX632/ide4-workspace/PocketSphinx/x86 -IC:/QNX632/ide4-workspace/PocketSphinx -IC:/QNX632/ide4-workspace/PocketSphinx/src -IC:/QNX632/ide4-workspace/PocketSphinx/src/Sphinxbase/fe -IC:/QNX632/ide4-workspace/PocketSphinx/src/Sphinxbase/feat -IC:/QNX632/ide4-workspace/PocketSphinx/src/Sphinxbase/lm -IC:/QNX632/ide4-workspace/PocketSphinx/src/Sphinxbase/util -IC:/QNX632/ide4-workspace/PocketSphinx/src -IC:/QNX632/ide4-workspace/PocketSphinx/src/Sphinxbase/include -IC:/QNX632/target/qnx6/usr/include -g -DVARIANT_g C:/QNX632/ide4-workspace/PocketSphinx/src/Sphinxbase/fe/yin.c

        MSVS flags
        /Od /I "../../../include" /I "../../../sphinxbase/include" /I "../../include" /I "../../../../sphinxbase/include" /I "../../../../sphinxbase/include/win32" /I "../../../src/libpocketsphinx" /D "_DEBUG" /D "WIN32" /D "_WINDOWS" /D "_USRDLL" /D "POCKETSPHINX_EXPORTS" /D "SPHINXDLL" /D "HAVE_CONFIG_H" /D "_CRT_SECURE_NO_DEPRECATE" /D "FIXED_POINT" /D "_VC80_UPGRADE=0x0600" /D "_WINDLL" /D "_MBCS" /Gm /EHsc /RTC1 /MDd /Fp".\Debug/pocketsphinx.pch" /Fo".\Debug/" /Fd".\Debug/" /W3 /nologo /c /ZI /TP /errorReport:prompt

         
      • Mike Medved

        Mike Medved - 2009-06-10

        Also, do you have to link a different math library or something? I would imagine you need to, or not link it at all...

        If you do not link libm, you get undefined values for pow, sqrt, floor, _Log, etc. So I'm a bit confused here... how is this a fixed point implementation if it is using the math library which assumes floating point, right?

         
        • Nickolay V. Shmyrev

          I think you need to link to a fixed-point math library. Such libraries do exist; for example, gcc has -msoft-float. But I'm not sure.

          Looks strange indeed.

           
      • Mike Medved

        Mike Medved - 2009-06-10

        So, for example, I found that these two functions execute (put in debugger breakpoints) -

        static float32
        fe_mel(melfb_t *mel, float32 x)
        {
            float32 warped = fe_warp_unwarped_to_warped(mel, x);

            return (float32) (2595.0 * log10(1.0 + warped / 700.0));
        }

        static float32
        fe_melinv(melfb_t *mel, float32 x)
        {
            float32 warped = (float32) (700.0 * (pow(10.0, x / 2595.0) - 1.0));
            return fe_warp_warped_to_unwarped(mel, warped);
        }

        You can see that they use log10 and pow, from the math library... did I miss something to not use these?
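
One plausible explanation, not confirmed anywhere in this thread, is a pattern that is common in fixed-point code: floating point is used only for one-time setup (building tables such as filter edges or window coefficients), and the results are converted to integers so that the per-frame loop never touches floating point. A sketch of that conversion, with hypothetical helper names and an assumed Q-format radix:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert a real value to fixed point with `radix` fractional bits.
 * Truncates toward zero; the caller must keep x within range. */
static int32_t float_to_fixed(double x, int radix)
{
    assert(radix >= 0 && radix < 31);
    return (int32_t)(x * (double)((int32_t)1 << radix));
}

/* Convert back, e.g. for printing debug values. */
static double fixed_to_float(int32_t v, int radix)
{
    assert(radix >= 0 && radix < 31);
    return (double)v / (double)((int32_t)1 << radix);
}

/* One-time setup: turn a table computed in floating point into
 * the integer table used by the per-frame code. */
static void table_to_fixed(const double *src, int32_t *dst,
                           size_t n, int radix)
{
    size_t i;

    for (i = 0; i < n; i++)
        dst[i] = float_to_fixed(src[i], radix);
}
```

Under this pattern the library still needs libm at link time for the setup code, even though the inner loops are integer-only.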

        M

         
    • Mike Medved

      Mike Medved - 2009-06-10

      This may be useful - the output from cmn.c is different. The win32 (working) fixed point one reads:

      INFO: cmn.c(175): CMN: 46.33 -3.26 1.95 -1.12 -0.52 -0.21 0.57 0.55 0.50 -0.17 -0.14 -0.04 -0.07

      The QNX (bad) one reads:

      INFO: cmn.c(175): CMN: 0.00 16.96 16.96 16.96 16.96 16.96 16.96 16.96 16.96 16.96 16.96 16.96 16.96

      Maybe that will tell you something...

      Thanks!
      M

       
      • Nickolay V. Shmyrev

        That's definitely bad and shows that the frontend is not functional, but it doesn't suggest the reason.

         
        • Mike Medved

          Mike Medved - 2009-06-11

          I did a -mfclogdir dump and the mfc files were definitely not the same. I tried a -rawlogdir dump and those outputs were the same. Does that tell you anything? If not, is there some array, or variable I can capture that might tell us more?

          M

           
          • Nickolay V. Shmyrev

            So, check

            int32
            fe_write_frame(fe_t * fe, mfcc_t * fea)
            {
                fe_spec_magnitude(fe);
                fe_mel_spec(fe);
                fe_mel_cep(fe, fea);
                fe_lifter(fe, fea);

                return 1;
            }

            in sphinxbase/src/libsphinxbase/fe/fe_sigproc.c

            print the array after each call and compare the values.
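
A small helper for this kind of comparison might look like the sketch below. It formats the values as decimal integers rather than raw memory bytes, so the output from two platforms can be diffed directly even if they differ in byte order. The name `format_values` and the line format are assumptions for illustration, and the sketch assumes the array elements are 32-bit integers, as in a fixed-point build.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format `n` values as "label: v0 v1 ..." into `out`.
 * Returns 0 on success, -1 if the buffer is too small. */
static int format_values(char *out, size_t out_sz, const char *label,
                         const int32_t *v, size_t n)
{
    size_t used;
    size_t i;
    int w = snprintf(out, out_sz, "%s:", label);

    if (w < 0 || (size_t)w >= out_sz)
        return -1;
    used = (size_t)w;
    for (i = 0; i < n; i++) {
        w = snprintf(out + used, out_sz - used, " %ld", (long)v[i]);
        if (w < 0 || (size_t)w >= out_sz - used)
            return -1;
        used += (size_t)w;
    }
    return 0;
}
```

Calling it after each of the four steps with a distinct label, printing the resulting line, and diffing the logs from the working and the failing build should show the first stage at which the values diverge.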

             