From: Zoltan B. <zb...@du...> - 2005-06-12 22:09:12
Hi,

I created a two-query report with a <Part> tag. There is a bug somewhere in RLIB that overwrites the stack, and I get a "** NUTS.. WE CRASHED" message. If I filter the report's queries to return no rows, it doesn't crash. But if I filter the queries to return something and execute the report twice with the same result sets, it crashes; the first execution produces the correct PDF. I'll try to track it down. The developers have finally added x86-64 support to valgrind, so maybe it will work for me, too.

Best regards,
Zoltán Böszörményi
From: Bob D. <bd...@si...> - 2005-06-12 22:13:22
Can you send a stack trace or is it really nasty?

-- 
Bob Doan <bd...@si...>
From: Zoltan B. <zb...@du...> - 2005-06-23 20:56:56
Attachments:
rlib-1.3.4-number-conversion-crash-fix.patch
Hi!

Bob Doan wrote:
> Can you send a stack trace or is it really nasty?

Finally, here's the fix for the crash I was experiencing. It was not a 64-bit issue. It crashed because the number conversion str(field,N,M) read the digits of N and M backwards, e.g. str(field,18,2) was actually converted with "%81.2f", while the buffers used for converting the integer and decimal parts of the number were statically allocated as:

    gchar left_holding[20];
    gchar right_holding[20];

So anyone who used a two-digit width for the integer part immediately exceeded the allocated space and corrupted the stack. I fixed rlib_number_sprintf() and rlib_pcode_operator_str() to allocate their strings dynamically, and rlib_number_sprintf() to interpret the numbers correctly.

It was a tough one to find without valgrind. :-D

The patch also removes a bad cast. The macro it changes isn't used in RLIB yet, but it would also cause memory corruption on 64-bit systems: (long long) is 128 bits here, and the number_value member of struct rlib_value is gint64. This way the compiler can at least warn us.

Best regards,
Zoltán Böszörményi
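To illustrate the two failure modes described above, here is a minimal C sketch; it is not RLIB's actual code, and the helper names parse_width() and format_fixed() are hypothetical. parse_width() accumulates digits most-significant first, so "18" yields 18 rather than 81, and format_fixed() sizes its heap buffer with a measuring snprintf(NULL, 0, ...) call instead of writing into a fixed 20-byte array. It assumes a C99-conforming snprintf() that returns the required length.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: parse a decimal width such as "18".
 * Digits are accumulated most-significant first, so "18" yields 18,
 * not 81. Returns -1 for an empty or non-numeric string. */
static int parse_width(const char *s)
{
    int value = 0;

    if (s == NULL || *s == '\0')
        return -1;
    for (; *s != '\0'; s++) {
        if (!isdigit((unsigned char)*s))
            return -1;
        value = value * 10 + (*s - '0');
    }
    return value;
}

/* Hypothetical helper: format `number` like "%<width>.<precision>f"
 * into a heap buffer. The first snprintf() call only measures the
 * output, so no width or precision can overflow a fixed-size array.
 * Returns NULL on failure; the caller must free() the result. */
static char *format_fixed(double number, int width, int precision)
{
    int len = snprintf(NULL, 0, "%*.*f", width, precision, number);
    char *buf;

    if (len < 0)
        return NULL;
    buf = malloc((size_t)len + 1);
    if (buf == NULL)
        return NULL;
    snprintf(buf, (size_t)len + 1, "%*.*f", width, precision, number);
    return buf;
}
```

For example, format_fixed(1234.5, parse_width("18"), parse_width("2")) returns an 18-character, right-aligned string ending in "1234.50", which the caller must free(). In GLib-based code such as RLIB, g_strdup_printf() performs the same measure-then-allocate step in one call.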
From: Zoltan B. <zb...@du...> - 2005-06-24 05:07:32
Bad news. Running the same report in a for() loop, it still crashes on the second iteration. There are more stack-destroying bugs in heaven and earth, Horatio... :-)

Best regards,
Zoltán Böszörményi
From: William K. V. <wk...@us...> - 2005-06-24 15:01:57
On Thu, 2005-06-23 at 23:25, Zoltan Boszormenyi wrote:
> Bad news. Running the same report in a for() cycle, it still
> crashes on the second turn. There are more stack-destroying bugs
> in heaven and earth, Horatio... :-)

I also ran into this last night but haven't had a chance to investigate further. Are you using your two patches for the GCC 4 issues? My RLIB version differs a bit from the latest CVS: 3 of the last 5 patches I made are not in CVS yet. I might get some time this weekend to help you hunt down the other bugs.

Cheers,
William.
From: Zoltan B. <zb...@du...> - 2005-06-24 19:50:36
William K. Volkman wrote:
> I also ran into this last night, haven't had a chance to

Phew. So it's not just my bad luck. :-)

> investigate further. Are you using your two patches
> for the GCC 4 issues? My RLIB version is skewed a bit

Yes, I am using those two patches. I also cleaned up the "long long" usage, replacing it with the gint64 abstract type, to further reduce the number of warnings. It would be best to get rid of all the explicit casts so that the compiler's type checking can work.

> from latest CVS, 3 of the last 5 patches I made
> are not in CVS yet. I might get some time this weekend
> to help you hunt the other bugs.

I intend to install the final gcc-4.0.0 (the gcc4 update for FC3 is based on a snapshot) and the bounds-checking patch on top of it. It seems to be a highly useful feature; you can find it here:

http://sourceforge.net/projects/boundschecking/

With the -fbounds-checking option, the compiler generates code that checks its own array and pointer accesses at run time. Valgrind for x86-64 (from the daily Subversion repository) has started working again, but it does not catch stack corruption. :-(

Thanks, anyway.

Best regards,
Zoltán Böszörményi
From: Zoltan B. <zb...@du...> - 2005-06-26 17:58:02
Attachments:
gcc4a.spec
05-rlib-1.3.4-stack-stomping-fixes.patch
Zoltan Boszormenyi wrote:
> I intend to install the final gcc-4.0.0 (the gcc4 update for FC3
> is based on a snapshot) and the bounds checking patch on top of it.
> This seems to be a highly useful feature, you can find it here:
>
> http://sourceforge.net/projects/boundschecking/
>
> The compiler will generate code that guards against itself
> if you use option -fbounds-checking.

After compiling GCC 4 four times, here's the RPM spec file, if anyone is interested. It's based on the FC3 GCC 4 errata release. You will need:

- gcc4-4.0.0-0.41.fc3.src.rpm,
- the patch for GCC 4.0 from the above address, bounds-checking-gcc-4.0.0-1.00.patch.bz2,
- and this spec file.

Using -fbounds-checking didn't reveal any real bugs in RLIB, but the bounds-checking messages made two functions look suspicious, so here are fixes for them. One of the bugs was not obvious; the patch is also attached.

- gint64 tentothe(gint n) does not check whether the index into its local array is valid. I fixed it by using pow(10, n). This has two advantages: no more bad stack accesses, and it gives the correct answer for n=0...18, compared with n=0...11 before.
- gchar hextochar(gchar c) assigned an integer (32-bit) value internally to its character parameter, possibly overwriting something on the stack.

However, these fixes didn't cure the crash that occurs when the same report is run a second time.

Best regards,
Zoltán Böszörményi
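The tentothe() fix above replaces an unchecked table lookup with pow(). As an alternative sketch (not the attached patch), the same range check can be done in pure integer arithmetic, which avoids floating-point rounding and linking against libm. Here int64_t and int stand in for GLib's gint64 and gint, and returning -1 for an out-of-range n is an assumption of this sketch.

```c
#include <stdint.h>

/* Hypothetical bounds-checked power of ten (not RLIB's code).
 * Multiplies in integer arithmetic instead of indexing a table or
 * calling pow(). 10^18 is the largest power of ten that fits in a
 * signed 64-bit integer, so n outside 0..18 is rejected with -1. */
static int64_t tentothe(int n)
{
    int64_t result = 1;
    int i;

    if (n < 0 || n > 18)
        return -1;
    for (i = 0; i < n; i++)
        result *= 10;
    return result;
}
```

The upper limit of 18 matches the range the pow()-based fix is said to handle correctly, since 10^19 no longer fits in a signed 64-bit value.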
From: Zoltan B. <zb...@du...> - 2005-06-26 18:25:09
Zoltan Boszormenyi wrote:
> - gchar hextochar(gchar c) assigned an integer (32-bit) value
> internally to its character parameter, possibly overwriting
> something on the stack.

I hate myself when I keep forgetting that implicit type conversions work. :-( So this fix is actually a warning fix, not a bug fix.

Best regards,
Zoltán Böszörményi
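To make the point concrete, here is a hypothetical sketch of a hextochar()-style function. It uses plain char instead of gchar and assumes the function maps a hex digit to its value, since the original body isn't shown. Each expression such as c - '0' is evaluated as int and narrowed back into the char parameter when assigned, so nothing beyond c itself is written. The explicit (char) casts only make that narrowing visible, which is the kind of change that quiets conversion warnings.

```c
/* Hypothetical hextochar()-style decoder using plain char.
 * Expressions like c - '0' are computed as int (integer promotion);
 * assigning the result back to the char parameter narrows it to a
 * char value and writes only to c, never to adjacent stack memory.
 * The explicit (char) casts just make that narrowing visible. */
static char hextochar(char c)
{
    if (c >= '0' && c <= '9')
        c = (char)(c - '0');
    else if (c >= 'a' && c <= 'f')
        c = (char)(c - 'a' + 10);
    else if (c >= 'A' && c <= 'F')
        c = (char)(c - 'A' + 10);
    else
        c = (char)-1; /* not a hex digit */
    return c;
}
```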