From: Ashton <ash...@pa...> - 2005-03-30 23:47:11
Hi,

I have a program that loads dynamic/shared libraries at boot time (this can also be done during its life). The problem shows itself after a call to exit(), but only sometimes.

System details:

Gentoo Linux i686
Kernel 2.6.7 on an AMD Duron 1300 MHz
glibc version 2.3.4
ld version 2.15
gcc version 3.3.4
valgrind version 2.4.0

Here is what I do know about the code:

- The crash, when it occurs, always seems to happen after a call to exit().
- On loading of the libraries (dlopen etc) I make/add to a linked list so I can trace through them. See below for the report from valgrind.
- It does not happen every time (which makes me think I'm doing something very stupid).
- I'm compiling the shared libraries with 'gcc -fPIC -shared -Wl,-soname' as the options and linking them in to the main binary (the program that loads these) with the gcc option -rdynamic.
- If I run it under valgrind with --db-attach=yes, the debugger attaches in the function '_dl_rtld_di_serinfo()' upon the first error, which is also in the gdb backtrace below.
- Lastly, the library that was opened first is the one that always 'seems' to cause problems in the end.

So basically the program starts, reads in a text file with the file name, handle and such, and then loads the libraries. During runtime it is possible to load new modules, or unload others that were already loaded. But when I exit the program with a call to exit(), it (sometimes) segfaults.
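The load/track/unload scheme described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual code from the program: the names module_node, module_load, module_unload, module_unload_all and module_count are made up for the example, and RTLD_NOW is an assumed choice of dlopen flag.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical list node: one entry per successfully loaded module. */
struct module_node {
    void *handle;              /* value returned by dlopen() */
    char *path;                /* private copy of the file name */
    struct module_node *next;
};

static struct module_node *module_list = NULL;

/* Load a shared library and add it to the front of the list.
 * Returns the new node, or NULL if loading or allocation failed. */
struct module_node *module_load(const char *path)
{
    assert(path != NULL);

    void *handle = dlopen(path, RTLD_NOW);  /* assumed flag */
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return NULL;
    }

    struct module_node *node = malloc(sizeof *node);
    char *copy = malloc(strlen(path) + 1);
    if (node == NULL || copy == NULL) {
        free(node);
        free(copy);
        dlclose(handle);
        return NULL;
    }
    strcpy(copy, path);

    node->handle = handle;
    node->path = copy;
    node->next = module_list;
    module_list = node;
    return node;
}

/* Unlink the node first, then close the handle exactly once. */
static void module_release(struct module_node *node)
{
    if (dlclose(node->handle) != 0)
        fprintf(stderr, "dlclose failed: %s\n", dlerror());
    free(node->path);
    free(node);
}

/* Unload one module by file name. Returns 0 on success, -1 if not found. */
int module_unload(const char *path)
{
    struct module_node **link = &module_list;
    while (*link != NULL) {
        if (strcmp((*link)->path, path) == 0) {
            struct module_node *node = *link;
            *link = node->next;
            module_release(node);
            return 0;
        }
        link = &(*link)->next;
    }
    return -1;
}

/* Unload everything still in the list. Returns how many were unloaded. */
int module_unload_all(void)
{
    int count = 0;
    while (module_list != NULL) {
        struct module_node *node = module_list;
        module_list = node->next;
        module_release(node);
        count++;
    }
    return count;
}

/* Number of modules currently tracked. */
int module_count(void)
{
    int count = 0;
    for (struct module_node *n = module_list; n != NULL; n = n->next)
        count++;
    return count;
}
```

In this sketch each handle is closed at most once, and a node is taken off the list before its library is closed, so the list never refers to a library that has already been unmapped. A caller could run module_unload_all() just before exit(); whether that has any bearing on the crash described here is not something the sketch shows. On older glibc versions the program has to be linked with -ldl.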
A GDB backtrace looks something like this (there has been more than one):

(gdb) bt
#0  0x1bb9148b in __do_global_dtors_aux () from ../bin/lib/test_special.so
#1  0x1bb915c6 in _fini () from ../bin/lib/test_special.so
#2  0x1b8efcdd in _dl_rtld_di_serinfo () from /lib/ld-linux.so.2
#3  0x1b98c410 in exit () from /lib/libc.so.6
#4  0x080568dc in init_socket (port=5001) at comm.c:307
#5  0x08056794 in init_game (port=3) at comm.c:238
#6  0x1b976488 in __libc_start_main () from /lib/libc.so.6
#7  0x08056381 in call_gmon_start ()

Right before the crash, Valgrind writes to the log file:

==31369==
==31369== Invalid read of size 4
==31369==    at 0x1BB9148B: (within /mud/fs/fs.test/bin/lib/test_special.so)
==31369==    by 0x1BB915C5: (within /mud/fs/fs.test/bin/lib/test_special.so)
==31369==    by 0x1B8EFCDC: (within /lib/ld-2.3.4.so)
==31369==    by 0x1B98C40F: exit (in /lib/libc-2.3.4.so)
==31369==    by 0x80568DB: init_game (comm.c:280)
==31369==    by 0x8056793: main (comm.c:228)
==31369==  Address 0x15DC is not stack'd, malloc'd or (recently) free'd
==31369==
==31369== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==31369==  Access not within mapped region at address 0x15DC
==31369==    at 0x1BB9148B: (within /mud/fs/fs.test/bin/lib/test_special.so)
==31369==    by 0x1BB915C5: (within /mud/fs/fs.test/bin/lib/test_special.so)
==31369==    by 0x1B8EFCDC: (within /lib/ld-2.3.4.so)
==31369==    by 0x1B98C40F: exit (in /lib/libc-2.3.4.so)
==31369==    by 0x80568DB: init_game (comm.c:280)
==31369==    by 0x8056793: main (comm.c:228)

And when the module is loaded at program start:

==31369== TRANSLATE: 0x1B9C2A50 redirected to 0x1B904D09
==31369== Reading syms from /mud/fs/fs.test/bin/lib/test_special.so (0x1BB91000)
==31369==
==31369== Conditional jump or move depends on uninitialised value(s)
==31369==    at 0x1B8ECCCA: (within /lib/ld-2.3.4.so)
==31369==    by 0x1BA471E2: (within /lib/libc-2.3.4.so)
==31369==    by 0x1B8EF735: (within /lib/ld-2.3.4.so)
==31369==    by 0x1BA47526: _dl_open (in /lib/libc-2.3.4.so)
==31369==    by 0x1B95DF6A: (within /lib/libdl-2.3.4.so)
==31369==    by 0x1B8EF735: (within /lib/ld-2.3.4.so)
==31369==    by 0x1B95E480: (within /lib/libdl-2.3.4.so)
==31369==    by 0x1B95DFB3: dlopen (in /lib/libdl-2.3.4.so)
==31369==    by 0x80BAC11: assign_dynamic_procs (spec_procs.c:201)
==31369==    by 0x808AE4B: boot_db (db.c:447)
==31369==    by 0x805680B: init_game (comm.c:247)
==31369==    by 0x8056793: main (comm.c:228)

The library does no allocation (malloc etc) whatsoever. I even tried a simple 'return 0' in it and it still happens at times. Admittedly, I have no idea what the functions in frame 0 and frame 2 are or what they do.

Does anyone have an idea of what I'm missing? While this only happens at exit, it still annoys me, because surely something is wrong?

One more thing that might be of interest: there is another (the same?) bug that shows up after a call to gethostbyaddr. I don't have a valgrind trace of it (I was using dmalloc at the time), though it has happened several times:

(gdb) bt
#0  0x4001180a in realloc () from /lib/ld-linux.so.2
#1  0x4014af11 in getutmpx () from /lib/libc.so.6
#2  0x4000b686 in _dl_rtld_di_serinfo () from /lib/ld-linux.so.2
#3  0x4014b857 in _dl_open () from /lib/libc.so.6
#4  0x4014cb23 in _dl_mcount_wrapper_check () from /lib/libc.so.6
#5  0x4000b686 in _dl_rtld_di_serinfo () from /lib/ld-linux.so.2
#6  0x4014cad5 in _dl_mcount_wrapper_check () from /lib/libc.so.6
#7  0x4014cbeb in __libc_dlopen_mode () from /lib/libc.so.6
#8  0x4012b435 in __nss_lookup_function () from /lib/libc.so.6
#9  0x4012afff in __nss_database_lookup () from /lib/libc.so.6
#10 0x4012cc37 in __nss_hosts_lookup () from /lib/libc.so.6
#11 0x4012df0b in gethostbyaddr_r () from /lib/libc.so.6
#12 0x4012dc85 in gethostbyaddr () from /lib/libc.so.6
#13 0x08056b07 in new_descriptor (s=3) at comm.c:1192
#14 0x080550f1 in game_loop (mother_desc=3) at comm.c:490
#15 0x08054c4c in init_game (port=5000) at comm.c:234
#16 0x08054bd9 in main (argc=3, argv=0xbffff7b4) at comm.c:207

I think that's all. If I missed anything, I'll be happy to provide more details. Any help, or even a pointer in the right direction, would be appreciated.