From: Dogan C. <dog...@us...> - 2014-12-04 01:10:56
Hi Dan,

It turns out this was due to a bug in our RandGauss2 function that accepts double arguments. I’m committing the following fix to kaldi trunk right away.

Cheers,
Dogan

diff --git a/src/base/kaldi-math.cc b/src/base/kaldi-math.cc
index 5b30320..0de73bf 100644
--- a/src/base/kaldi-math.cc
+++ b/src/base/kaldi-math.cc
@@ -147,7 +147,7 @@ void RandGauss2(double *a, double *b, RandomState *state)
   float a_float, b_float;
   // Just because we're using doubles doesn't mean we need super-high-quality
   // random numbers, so we just use the floating-point version internally.
-  RandGauss2(&a_float, &b_float);
+  RandGauss2(&a_float, &b_float, state);
   *a = a_float;
   *b = b_float;
 }

> On Dec 3, 2014, at 2:23 AM, Dogan Can <dog...@us...> wrote:
>
> Hi Dan
>
> I don’t have a Mavericks setup but I can replicate the failure on Yosemite with clang. When everything is compiled with gcc, matrix-lib-test finishes successfully.
>
> I looked a bit into the root cause of why the test was going into an infinite loop. The random 3x3 matrices generated during the failing test end up having the same first and third columns and fail the condition number test, hence the generation loop never terminates. I did some sleuthing to figure out how that was happening and it turns out the value of rstate.seed (see the snippet below from matrix/kaldi-matrix.cc) is the same before and after the call to RandGauss2 (note that the inner for loop executes only once). It seems like we are hitting a race condition in the system-provided rand_r implementation or a compiler bug.
>
> template<typename Real>
> void MatrixBase<Real>::SetRandn() {
>   kaldi::RandomState rstate;
>   for (MatrixIndexT row = 0; row < num_rows_; row++) {
>     Real *row_data = this->RowData(row);
>     MatrixIndexT nc = (num_cols_ % 2 == 1) ? num_cols_ - 1 : num_cols_;
>     for (MatrixIndexT col = 0; col < nc; col += 2) {
>       kaldi::RandGauss2(row_data + col, row_data + col + 1, &rstate);
>     }
>     if (nc != num_cols_) row_data[nc] = static_cast<Real>(kaldi::RandGauss(&rstate));
>   }
> }
>
> Cheers,
> Dogan
>
>> On Dec 2, 2014, at 10:13 PM, Daniel Povey <dp...@gm...> wrote:
>>
>> According to
>> http://stackoverflow.com/questions/19554439/gdb-missing-in-os-x-mavericks
>> gdb is no longer supported in mavericks and you have to use lldb. I've
>> never used that so I don't know what to tell you to do.
>> Dogan, do you have a mavericks setup, and can you see if you get the
>> infinite loop that he gets when you test?
>> Joaquin: I would just continue through the tutorial for now. Probably
>> it's not going to be a problem.
>> Dan
>>
>>
>> On Wed, Dec 3, 2014 at 1:09 AM, Daniel Povey <dp...@gm...> wrote:
>>> Hm. That makes me think there may be a problem with your gcc/gdb
>>> installation- maybe you could try uninstalling it and installing it
>>> again. Did you get it from MacPorts?
>>> Dan
>>>
>>>
>>> On Wed, Dec 3, 2014 at 1:05 AM, Joaquin Antonio Ruales
>>> <ja...@co...> wrote:
>>>> Now I get:
>>>>
>>>> Program received signal SIGINT, Interrupt.
>>>> 0x00007fff8c755e9a in ?? ()
>>>> (gdb) b matrix-lib-test.cc:39
>>>> Cannot access memory at address 0xab670
>>>>
>>>>
>>>> On Wed, Dec 3, 2014 at 12:57 AM, Daniel Povey <dp...@gm...> wrote:
>>>>>
>>>>> Probably when you did ctrl-c it was in the Atlas library, sometimes it
>>>>> can't get a proper backtrace. Instead, after doing ctrl-c do:
>>>>> (gdb) b matrix-lib-test.cc:39
>>>>> (gdb) c
>>>>>
>>>>> to continue, and when it breaks, show me a backtrace.
>>>>> Dan
>>>>>
>>>>>
>>>>> On Wed, Dec 3, 2014 at 12:48 AM, Joaquin Antonio Ruales
>>>>> <ja...@co...> wrote:
>>>>>> This is what I get:
>>>>>>
>>>>>> Program received signal SIGINT, Interrupt.
>>>>>> 0x00007fff8b2c6342 in ?? ()
>>>>>> (gdb) bt
>>>>>> #0 0x00007fff8b2c6342 in ?? ()
>>>>>> #1 0x00007fff5fbfdbc8 in ?? ()
>>>>>> #2 0x0000000000000001 in ?? ()
>>>>>> #3 0x00007fff5fbfdfdc in ?? ()
>>>>>> #4 0x00007fff5fbfdfe0 in ?? ()
>>>>>> #5 0x00007fff5fbfdbbe in ?? ()
>>>>>> #6 0x0000000100102672 in ?? ()
>>>>>> #7 0x00007fff5fbfdc10 in ?? ()
>>>>>> #8 0x00007fff8b00eac4 in ?? ()
>>>>>> #9 0x00007fff5fbfda90 in ?? ()
>>>>>> #10 0x00007fff8e2bf272 in ?? ()
>>>>>> #11 0x00000002001a0a00 in ?? ()
>>>>>> #12 0x00000001001a0a00 in ?? ()
>>>>>> #13 0x000000010019d000 in ?? ()
>>>>>> #14 0x00000001001a0a00 in ?? ()
>>>>>> #15 0x00000001003002ca in ?? ()
>>>>>> #16 0x0000000000000000 in ?? ()
>>>>>>
>>>>>>
>>>>>> On Wed, Dec 3, 2014 at 12:42 AM, Daniel Povey <dp...@gm...> wrote:
>>>>>>>
>>>>>>> That isn't normal and it doesn't happen when I compile on a Mac.
>>>>>>> Can you run it in gdb by doing
>>>>>>> gdb ./matrix-lib-test
>>>>>>> (gdb) r
>>>>>>>
>>>>>>> and then do ctrl-c when it gets in the loop and type "bt" and show me
>>>>>>> the backtrace?
>>>>>>> Dan
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Dec 3, 2014 at 12:36 AM, Joaquin Antonio Ruales
>>>>>>> <ja...@co...> wrote:
>>>>>>>> Hi Dan,
>>>>>>>>
>>>>>>>> Thanks for the quick reply. I have "svn up"ed and recompiled everything, but
>>>>>>>> now the same matrix test runs into an infinite loop. Would it be safe to
>>>>>>>> ignore the test results and continue with the tutorial? Here are the few
>>>>>>>> lines of output from the test:
>>>>>>>>
>>>>>>>> LOG (RandPosdefSpMatrix():matrix-lib-test.cc:39) Condition number of random matrix large 7.50727e+16, trying again (this is normal)
>>>>>>>> LOG (RandPosdefSpMatrix():matrix-lib-test.cc:39) Condition number of random matrix large 1.91265e+16, trying again (this is normal)
>>>>>>>> LOG (RandPosdefSpMatrix():matrix-lib-test.cc:39) Condition number of random matrix large 7.70738e+16, trying again (this is normal)
>>>>>>>> LOG (RandPosdefSpMatrix():matrix-lib-test.cc:39) Condition number of random matrix large 2.32488e+17, trying again (this is normal)
>>>>>>>> LOG (RandPosdefSpMatrix():matrix-lib-test.cc:39) Condition number of random matrix large 8.44565e+17, trying again (this is normal)
>>>>>>>> LOG (RandPosdefSpMatrix():matrix-lib-test.cc:39) Condition number of random matrix large 1.83557e+17, trying again (this is normal)
>>>>>>>> LOG (RandPosdefSpMatrix():matrix-lib-test.cc:39) Condition number of random matrix large 2.61736e+17, trying again (this is normal)
>>>>>>>> LOG (RandPosdefSpMatrix():matrix-lib-test.cc:39) Condition number of random matrix large 8.67803e+16, trying again (this is normal)
>>>>>>>> LOG (RandPosdefSpMatrix():matrix-lib-test.cc:39) Condition number of random matrix large 1.80053e+16, trying again (this is normal)
>>>>>>>> LOG (RandPosdefSpMatrix():matrix-lib-test.cc:39) Condition number of random matrix large 1.67101e+16, trying again (this is normal)
>>>>>>>> LOG (RandPosdefSpMatrix():matrix-lib-test.cc:39) Condition number of random matrix large 1.49485e+16, trying again (this is normal)
>>>>>>>> LOG (RandPosdefSpMatrix():matrix-lib-test.cc:39) Condition number of random matrix large 8.31813e+16, trying again (this is normal)
>>>>>>>> LOG (RandPosdefSpMatrix():matrix-lib-test.cc:39) Condition number of random matrix large 1.6285e+16, trying again (this is normal)
>>>>>>>> LOG (RandPosdefSpMatrix():matrix-lib-test.cc:39) Condition number of random matrix large 2.48172e+16, trying again (this is normal)
>>>>>>>> LOG (RandPosdefSpMatrix():matrix-lib-test.cc:39) Condition number of random matrix large 2.1569e+16, trying again (this is normal)
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Dec 2, 2014 at 4:35 PM, Daniel Povey <dp...@gm...> wrote:
>>>>>>>>>
>>>>>>>>> I think I had noticed this before, and I fixed the test by changing
>>>>>>>>> the threshold.
>>>>>>>>> If you do "svn up" and recompile, it should pass.
>>>>>>>>> Dan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Dec 2, 2014 at 4:20 PM, Joaquín Ruales <ja...@co...> wrote:
>>>>>>>>>> Hi Kaldi Team,
>>>>>>>>>>
>>>>>>>>>> I'm running into trouble when running the Kaldi tests (make test) described
>>>>>>>>>> in the Kaldi tutorial. I'm running it on a Mac (Mavericks) and the error is
>>>>>>>>>> in matrix-lib-test: KALDI_ASSERT: at
>>>>>>>>>> UnitTestLinearCgd:matrix-lib-test.cc:3118, failed: error < 1.0e-05 * b.Norm(2.0)
>>>>>>>>>>
>>>>>>>>>> Has anyone faced this problem before or have any suggestions?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Joaquín
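
For anyone reading this thread later, here is a minimal sketch of the kind of bug fixed in the diff above: a double-precision wrapper that accepts a state pointer but forgets to forward it, which still compiles because the inner call can be made with two arguments. This is not Kaldi code. ToyState, GlobalState, NextUniform, FillPair, FillPairBuggy, FillPairFixed and AdvancesCallerState are all hypothetical names, the generator is a plain linear congruential one rather than a Gaussian sampler, and the defaulted state parameter on the float version is an assumption inferred only from the fact that the pre-fix two-argument call compiled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a caller-owned random state (not Kaldi's RandomState).
struct ToyState {
  std::uint32_t seed = 12345u;
};

// Process-wide fallback state, used when the caller passes no state.
inline ToyState &GlobalState() {
  static ToyState global_state;
  return global_state;
}

// Advances a linear congruential generator and returns a value in [0, 1).
inline float NextUniform(ToyState *state) {
  state->seed = state->seed * 1664525u + 1013904223u;
  return static_cast<float>(state->seed >> 8) * (1.0f / 16777216.0f);
}

// Float version. The defaulted parameter (an assumption, see the lead-in)
// is what allows the buggy two-argument call below to compile.
inline void FillPair(float *a, float *b, ToyState *state = nullptr) {
  ToyState *s = (state != nullptr) ? state : &GlobalState();
  *a = NextUniform(s);
  *b = NextUniform(s);
}

// Double version with the bug: `state` is accepted but never forwarded,
// so the caller's state is left untouched.
inline void FillPairBuggy(double *a, double *b, ToyState *state) {
  (void)state;  // silently ignored
  float a_float, b_float;
  FillPair(&a_float, &b_float);
  *a = a_float;
  *b = b_float;
}

// Double version with the one-line fix: forward the caller's state.
inline void FillPairFixed(double *a, double *b, ToyState *state) {
  float a_float, b_float;
  FillPair(&a_float, &b_float, state);
  *a = a_float;
  *b = b_float;
}

// Returns true if a single call to `fill` changes the caller's seed.
inline bool AdvancesCallerState(void (*fill)(double *, double *, ToyState *)) {
  ToyState local;
  const std::uint32_t before = local.seed;
  double a = 0.0, b = 0.0;
  fill(&a, &b, &local);
  return local.seed != before;
}

// Self-check: only the fixed wrapper touches the caller's state.
inline void CheckSketch() {
  assert(!AdvancesCallerState(&FillPairBuggy));
  assert(AdvancesCallerState(&FillPairFixed));
}
```

Calling CheckSketch() from a main function exercises both wrappers. The unchanged seed in the buggy case corresponds to the symptom reported above, where rstate.seed was the same before and after the call to RandGauss2.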