Thread: [Cppcms-users] SIGABRT problem raised between SVN rev1474 and 1491
From: Julian P. <ju...@wh...> - 2010-10-26 17:05:10
|
Hello,

a SIGABRT problem occurred for me when I switched from cppcms rev. 1474 to 1491. This is the trace; the problem seems to be located in libbooster:

*** glibc detected *** bin/myserver.fcgi: malloc(): memory corruption: 0x0927c878 ***
======= Backtrace: =========
/lib/libc.so.6[0xf720b935]
/lib/libc.so.6[0xf720ded2]
/lib/libc.so.6(__libc_malloc+0x96)[0xf720f676]
/usr/lib/libstdc++.so.6(_Znwj+0x27)[0xf73cb2d7]
/usr/lib/libbooster.so.0(_ZNSt6vectorIPvSaIS0_EE14_M_fill_insertEN9__gnu_cxx17__normal_iteratorIPS0_S2_EEjRKS0_+0x24d)[0xf766627d]
/usr/lib/libcppcms.so.1(_ZN6cppcms4json14bad_value_castC1ERKSs+0x72)[0xf74f19f2]
bin/myserver.fcgi(_ZN7msrv14NetworkHandler10loadConfigERSt6vectorISsSaISsEE+0x72)[0x8069182]
bin/myserver.fcgi(main+0x57d)[0x80565fd]
/lib/libc.so.6(__libc_start_main+0xe5)[0xf71b7455]
bin/myserver.fcgi[0x8054c91]

The problem is always reproducible: when I switch between libcppcms SVN 1474 and 1491 simply by running make install in the corresponding build directory, the problem occurs or disappears.

The problem seems to be restricted to 32-bit machines: while the current SVN revision causes no problems on 64-bit, both 32-bit x86 and ARM machines are hit by it.

Thanks for your feedback,
Julian
|
From: Artyom <art...@ya...> - 2010-10-26 17:45:21
|
Hello,

In what case does it happen? Can you give me sample source code that reproduces the problem?

Artyom

> Hello,
>
> a SIGABRT problem occurred for me when I switched from cppcms rev. 1474 to
> 1491.
> This is the trace, the problem seems to be located in libbooster:
>
> *** glibc detected *** bin/myserver.fcgi: malloc(): memory corruption:
> 0x0927c878 ***

No, this does not mean that the problem is in booster: memory corruption can happen anywhere, and libc only detects it afterwards. Do you see anything unusual in valgrind (if you know how to use it and have it)?

Artyom
|
From: Julian P. <ju...@wh...> - 2010-10-26 18:23:38
Attachments:
valgrind-output.log
|
> Hello,
>
> In what case does it happen? Can you give me sample source code that
> reproduces the problem?
>

I think it happens if a cppcms::json::bad_value_cast exception is thrown. I try to read a JSON-formatted configuration file from several possible locations with cppcms::json. If a file is not formatted properly, a bad_value_cast exception should be thrown, which I catch and then try the next possible location.

In valgrind, the program runs without crashing (I don't understand this), but it outputs a lot of invalid reads related to the std::string member of a cppcms::json::bad_value_cast instance. However, both with and without gdb the program crashes as described.

Valgrind output is attached.

Thanks,
Julian
|
From: Artyom <art...@ya...> - 2010-10-26 21:51:10
|
> > Hello,
> >
> > In what case does it happen? Can you give me sample source code that
> > reproduces the problem?
> >
>
> I think it happens if a cppcms::json::bad_value_cast exception is thrown. I try
> to read a JSON-formatted configuration file from several possible
> locations with cppcms::json, and if it is not formatted properly, a
> bad_value_cast exception should be thrown which I catch and then try the next
> possible location.

I don't see any issues in the code and location valgrind is reporting.

I also tried to run a full build and test on an x86 machine: no issues.

1. Make sure that you fully recompile your own code after building and installing the latest CppCMS.

2. If this still happens, please provide a minimal code sample that reproduces the problem. Without it I can't do anything, and I can only assume that the memory corruption happens in your code.

Regards,
Artyom
|
From: Julian P. <ju...@wh...> - 2010-10-26 22:15:35
|
> I don't see any issues in the code and location valgrind is reporting.
>
> I also tried to run a full build and test on an x86 machine: no issues.
>
> 1. Make sure that you fully recompile your own code after building and
> installing the latest CppCMS.
>
> 2. If this still happens, please provide a minimal code sample that reproduces
> the problem. Without it I can't do anything, and I can only assume that the
> memory corruption happens in your code.
>

Hello,

I see your point and will try to provide a sample tomorrow. I nevertheless think that it must have at least something to do with your changes between the two revisions mentioned, because with the same code on my side (recompiled, of course, for the revision to be tested) everything works smoothly with the earlier revision while it doesn't with the later one.

Couldn't the invalid write mentioned in valgrind's output lead to the SIGABRT?

What I don't understand anyway is:

1. Why does the SIGABRT occur when running without any debugger attached or with gdb, but not with valgrind? Under valgrind the program runs (apart from the given log output) without any problems, though of course slower than normal. My application doesn't use more than one thread at this time and cppcms::service is not started at that point, yet it almost looks like a race condition. How could that be when only one thread is running?

2. Why is it a SIGABRT at all and not a SIGSEGV? According to the output it is a memory access violation and should then be a SIGSEGV, I think.

What I do in the method where the crash occurs is basically the following:

- A std::vector of std::strings (file paths) is passed in by reference.
- I iterate through the vector, and for each string I call a method which tries to open the given file path and, if successful, loads it via is >> v; into a json::value, where is is a std::ifstream.
- Then another method is invoked to parse this value. A bad_value_cast exception may be thrown if the file has an invalid format.
- If the first method is not successful in loading any of the given files, it throws a bad_value_cast exception like the following:
  throw(cppcms::json::bad_value_cast("None of the given files could be loaded."));

Could it be that you changed the implementation of bad_value_cast so that an instantiation like this is not possible anymore?

Thanks,
Julian
|
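The fallback loop described in this message can be sketched roughly as follows. This is only an illustration of the pattern, not the poster's code and not the CppCMS API: config_error is a hypothetical stand-in for cppcms::json::bad_value_cast, and parse_config is a hypothetical stand-in for the `is >> v` load plus the later validation step (it merely checks that the text starts with an opening brace).

```cpp
#include <fstream>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for cppcms::json::bad_value_cast.
struct config_error : std::runtime_error {
    explicit config_error(const std::string& msg) : std::runtime_error(msg) {}
};

// Hypothetical stand-in for "is >> v" followed by validation:
// reads the whole stream and rejects anything that is not a JSON object.
std::string parse_config(std::istream& is) {
    std::ostringstream buf;
    buf << is.rdbuf();
    const std::string text = buf.str();
    const std::string::size_type first = text.find_first_not_of(" \t\r\n");
    if (first == std::string::npos || text[first] != '{')
        throw config_error("configuration is not a JSON object");
    return text;
}

// Tries each path in order and returns the first configuration that parses.
// Throws config_error only after every candidate has failed.
std::string load_first_valid(const std::vector<std::string>& paths) {
    for (std::vector<std::string>::const_iterator p = paths.begin();
         p != paths.end(); ++p) {
        std::ifstream is(p->c_str());
        if (!is)
            continue;                 // file missing or unreadable: try the next one
        try {
            return parse_config(is);
        } catch (const config_error&) {
            // badly formatted file: fall through to the next candidate
        }
    }
    throw config_error("None of the given files could be loaded.");
}
```

The caller of load_first_valid still has to catch the final exception somewhere; otherwise the program ends through std::terminate.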
From: Artyom <art...@ya...> - 2010-10-27 05:12:11
|
> Hello,
>
> I see your point and will try to provide a sample tomorrow. I
> nevertheless think that it must have at least something to do with your
> changes between the two revisions mentioned, because with the same code
> on my side (of course, recompiled for the revision to be tested) with
> the earlier revision everything works smoothly while it doesn't with the
> later revision.
> Couldn't the invalid write mentioned in valgrind's output lead to the
> SIGABRT?
> What I don't understand anyway is
> 1. Why does the SIGABRT occur when running without any debugger attached
> or with gdb, but not with valgrind?

AFAIK valgrind replaces some basic functions like malloc to get better control over memory, so the heap is laid out differently there.

> The program runs (besides of the
> 2. Why, at all, is it a SIGABRT and not a SIGSEGV? According to the
> output it is a memory access violation and should be a SIGSEGV then, I
> think.

Because it is heap structure corruption rather than a memory access violation. libc detects the issue and aborts the program before it is "too late".

>
> Could it be that you changed the implementation of bad_value_cast so
> that an instantiation like this is not possible anymore?

It shouldn't be. In any case you may try the following:

Comment out lines 13 and 14 in the file booster/lib/backtrace/src/backtrace.cpp, so that it reads:

    #if defined(__linux) || defined(__APPLE__) || defined(__sun)
    //#define BOOSTER_HAVE_EXECINFO
    //#define BOOSTER_HAVE_DLADDR
    #endif

And tell me if the program still crashes.

Also tell me your Linux version and libc version, and how you compile your program and CppCMS.

In any case I need a crashing sample to debug the issue.

Artyom
|
From: Julian P. <ju...@wh...> - 2010-10-27 18:50:19
Attachments:
cppcms-test.cpp
|
Hello,

I wrote a code sample that consists only of a main function and a throw and reproduces the problem. It's attached to this mail.

>>
>> Could it be that you changed the implementation of bad_value_cast so
>> that an instantiation like this is not possible anymore?
>
> It shouldn't be. In any case you may try the following:
>
> Comment out lines 13 and 14 in the file
> booster/lib/backtrace/src/backtrace.cpp

This does not fix the problem for me.

> Also tell me your Linux version and libc version, and how you
> compile your program and CppCMS.
>

I use the standard Debian Lenny libraries, which means that libc is glibc 2.7. The problem occurs on Linux kernels 2.6.21 (ARM) and 2.6.32 (x86); I did not test any other kernel versions. It may be that the problem is not related to 32-bit at all, because all of my 64-bit capable systems run Ubuntu or Sabayon Linux and therefore use newer versions of glibc.

I do not compile with any C/CXXFLAGS other than the defaults used by cmake. The problem occurs with the build types RelWithDebInfo and Release; other build types were not tested. GCC is 4.3.2 on the 32-bit platforms using Lenny, and GCCs of the 4.4.x series on the 64-bit machines.

The sample program has been compiled with the CMake default C/CXXFLAGS for RelWithDebInfo and Release and linked to libcppcms and libbooster.

Thanks,
Julian
|
From: Artyom <art...@ya...> - 2010-10-27 19:43:17
|
>
> I wrote a code sample that consists only of a main function and a throw
> and reproduces the problem. It's attached to this mail.

    int main()
    {
        throw(cppcms::json::bad_value_cast("Loading any of the configuration files given failed."));
    }

If you throw an exception somewhere, you should catch it somewhere...

If an unhandled exception is thrown, std::terminate is called, which in turn calls abort... This is expected behavior.

You need to write:

    #include <cppcms/json.h>
    #include <iostream>

    int main()
    {
        try {
            throw cppcms::json::bad_value_cast("Loading any of the configuration files given failed.");
        }
        catch(std::exception const &e) {
            // Report the error and exit with a failure status
            std::cerr << "Caught exception " << e.what() << std::endl;
            return 1;
        }
        return 0;
    }

Actually it is good practice to put a try...catch block around all your code, catch all exceptions, and print what happened.

Artyom
|
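The advice to wrap all code in a try...catch block can be packaged as a small helper. The sketch below is an illustration only: run_guarded is a hypothetical name, nothing in it comes from CppCMS, and it assumes a C++11 compiler for the lambda in the usage note.

```cpp
#include <exception>
#include <iostream>
#include <stdexcept>

// Runs `body` and converts any escaping exception into a non-zero exit
// status, so that nothing propagates out of main() into std::terminate().
template <typename F>
int run_guarded(F body) {
    try {
        body();
        return 0;
    } catch (const std::exception& e) {
        // Exceptions derived from std::exception carry a message.
        std::cerr << "Caught exception: " << e.what() << std::endl;
        return 1;
    } catch (...) {
        // Anything else (e.g. `throw 42;`) is still reported.
        std::cerr << "Caught unknown exception" << std::endl;
        return 1;
    }
}
```

A main function would then reduce to something like `return run_guarded([] { /* application code */ });`. Note that such a guard only helps with exceptions that are thrown cleanly; it cannot recover from heap corruption that glibc detects and answers with abort().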
From: Julian P. <ju...@wh...> - 2010-10-27 20:45:35
|
I know that, but I also surrounded it with a fitting try ... catch block, and it happened nevertheless.
|
From: Artyom <art...@ya...> - 2010-10-27 21:28:13
|
>
> I know that, but I also surrounded it with a fitting try ... catch block,
> and it happened nevertheless.

I did so as well, on x86, and it works well. So please give me the exact code.

If it crashes, I think you are compiling against a different version of CppCMS than the one you actually link with, and you get a crash because they are ABI incompatible.

Artyom
|
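One plausible way such an ABI mismatch can corrupt the heap is a class that grew between revisions: code built against the old headers reserves space for the old layout, while the constructor inside the new shared library initializes the new, larger one. The sketch below only illustrates that general mechanism. Both layouts are hypothetical and are not taken from the CppCMS sources; the thread does not confirm what actually changed between the two revisions.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical layout as seen by an application compiled against OLD headers.
namespace old_headers {
struct error_object {
    std::string message;
};
}

// Hypothetical layout used inside the NEW shared library, which gained a
// member (here: a list of stack frame addresses).
namespace new_library {
struct error_object {
    std::string message;
    std::vector<void*> frames;
};
}

// Difference between what the library's constructor initializes and what the
// caller reserved. A non-zero value means the constructor touches memory
// beyond the caller's allocation.
constexpr std::size_t size_mismatch_bytes() {
    return sizeof(new_library::error_object) - sizeof(old_headers::error_object);
}

static_assert(sizeof(new_library::error_object) > sizeof(old_headers::error_object),
              "the newer layout is larger than the one the caller allocated");
```

Writes past the end of a heap block tend to damage the allocator's bookkeeping, which would match a "malloc(): memory corruption" abort reported on a later allocation rather than an immediate SIGSEGV. Rebuilding the application against the installed headers removes this kind of mismatch.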
From: Julian P. <ju...@wh...> - 2010-10-27 23:28:37
|
On 27.10.2010 23:28, Artyom wrote:
> I did so as well, on x86, and it works well.
> So please give me the exact code.
>
> If it crashes, I think you are compiling against a different version
> of CppCMS than the one you actually link with, and you get a crash
> because they are ABI incompatible.
>

I set up a clean Debian Lenny chroot, installed cppcms' dependencies and built the test program again. It works without problems there.

So your guess seems to be right: I am probably linking to an old libcppcms version, but the problem is that I don't know why. Do I have to remove the old libcppcms.so before running make install to install the new libcppcms? I have no copy of this library in the chroot other than the one in /usr/lib, and this is overwritten on make install; at least that is what cmake outputs. So there seems to be no way for my program to link against an old release of libcppcms accidentally. Are the header files updated reliably, too? Or should I try removing them from my /usr/include?

Additionally, I seem to have this problem on both of my build chroots: the one I use for ARM builds and the one I use for my 32-bit x86 builds.

I will investigate this problem further, try to clean my build root, and report back.

Julian
|
From: Artyom <art...@ya...> - 2010-10-28 05:44:13
|
> I set up a clean Debian Lenny chroot, installed cppcms' dependencies and
> built the test program again, it works without problems there.
> So your guess seems to be right, I am probably linking to an old
> libcppcms version, but the problem is, I don't know why. Do I have to
> remove the old libcppcms.so before running make install to install the
> new libcppcms? I have no other copy of this library in the chroot than
> in /usr/lib, and this is overwritten on make install, at least that is
> what cmake outputs. So there seems to be no possibility that my program
> links against an old release of libcppcms accidentally. Are the header
> files updated reliably, too? Or should I try and remove them from my
> /usr/include?
> Additionally, I seem to have this problem on the two of my build chroots
> - the one I use for ARM builds and the one I use for my 32bit x86 builds.
> I will further investigate into this problem and try to clean my build
> root and report back.
>
> Julian

Several notes:

1. Make sure that you haven't installed anything in /usr/local/(include|lib) and that you always install into the same prefix.

2. Generally make install solves the problem, but if you have issues, remove the following:

    $PREFIX/bin/cppcms_*
    $PREFIX/lib/libcppcms*
    $PREFIX/lib/libbooster*
    $PREFIX/include/cppcms/
    $PREFIX/include/booster/

   Here $PREFIX is the installation prefix, that is /usr, /usr/local or any other location. Generally this is not required, but if you have problems, it is a good thing to do.

3. Then build a clean version of CppCMS (remove the build directory and run a full build).

4. Run make test and then make install.

5. Fully rebuild your application: full clean and full build.

Note: this is the way to uninstall things with a generic CMake installation:
http://www.cmake.org/Wiki/CMake_FAQ#Can_I_do_.22make_uninstall.22_with_CMake.3F

Regards,
Artyom
|