From: Michael A. P. <mp...@ma...> - 2005-09-12 22:21:36
Attachments:
failure.txt
test.gnuplot
|
I think I have found a bug in gnuplot CVS (or a bug in xorg-x11?).

The problem only seems to occur in the x11 terminal. The pslatex terminal (what I use most) works fine. The png terminal also works fine. Any use of the x11 terminal fails.

I have cc'd the Fedora maintainer of gnuplot just in case he might know what the issue is.

The failure:
http://mpeters.us/gnuplot/failure.txt

A script that demonstrates the failure:
http://mpeters.us/gnuplot/test.gnuplot

(both of those are attached to this message as well)

Config log from building gnuplot:
http://mpeters.us/gnuplot/config.log

The failure also happens when I specify
export GNUPLOT_DRIVER_DIR=/opt/gnuplot/libexec/gnuplot/4.1
(the path to where gnuplot_x11 is installed)

Platform:
Fedora Core 4 (x86 32 bit, gcc 4.0.1)

Build Environment:
I'm building gnuplot as an rpm in mock (a build system that builds an rpm inside a clean chroot; it is the standard way of building packages for Fedora Extras - see http://fedoraproject.org/wiki/Projects/Mock )

The spec file I am using is based on the spec file the distribution provides for Fedora Core 4. The differences are that I changed the install location to /opt/gnuplot and the package name, so that I wouldn't have conflicts etc. with my distro's packaging of gnuplot 4.0, and of course made the other changes needed for building the CVS version (such as BuildRequires gd-devel).

The spec file I am using:
http://mpeters.us/gnuplot/gnuplot-cvs.spec

The src.rpm I am using:
http://mpeters.us/gnuplot/gnuplot-cvs-4.1-0.cvs.20050912.fc4.src.rpm
(that includes a tarball of the cvs co used - date 20050912)

Binary rpm:
http://mpeters.us/gnuplot/gnuplot-cvs-4.1-0.cvs.20050912.fc4.i386.rpm

If it is of use, here is the build log from the mock build:
http://mpeters.us/gnuplot/build.log

-=-

Probably not needed, but:

Hardware details of my headless build machine:

[mpeters@utility ~]$ cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 8
model name      : AMD Duron(tm) processor
stepping        : 1
cpu MHz         : 1596.367
cache size      : 64 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow
bogomips        : 3162.11

Hardware details of the machine I'm running it on (Thinkpad T20):

[mpeters@laptop ~]$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 8
model name      : Pentium III (Coppermine)
stepping        : 3
cpu MHz         : 547.679
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips        : 1081.34
|
From: Ethan M. <merritt@u.washington.edu> - 2005-09-12 22:53:44
|
On Monday 12 September 2005 03:20 pm, Michael A. Peters wrote:
> [mpeters@laptop upload]$ /opt/gnuplot/bin/gnuplot test.gnuplot
> *** buffer overflow detected ***: gnuplot_x11 terminated
> ======= Backtrace: =========
> /lib/libc.so.6(__chk_fail+0x41)[0x586c45]
> gnuplot_x11[0x804aaa2]
> gnuplot_x11[0x80576b6]
> gnuplot_x11[0x8059319]

Please send me the output from

  readelf -s /opt/gnuplot/libexec/gnuplot/4.1/gnuplot_x11

That should hopefully give a clue as to which nested routines that stack trace corresponds to.

Ethan

--
Ethan A Merritt       merritt@u.washington.edu
Biomolecular Structure Center
Mailstop 357742
University of Washington, Seattle, WA 98195
|
From: Michael A. P. <mp...@ma...> - 2005-09-12 23:44:45
Attachments:
gnuplot_x11.output
|
On Mon, 2005-09-12 at 15:53 -0700, Ethan Merritt wrote:
> Please send me the output from
> readelf -s /opt/gnuplot/libexec/gnuplot/4.1/gnuplot_x11
>
> That should hopefully give a clue as to what nested routines
> that stack trace corresponds to.

Attached as gnuplot_x11.output
|
From: Michael A. P. <mp...@ma...> - 2005-09-19 03:42:41
Attachments:
gnuplot-bug2.txt
|
Is this more revealing? I ran it inside of gdb WITH the debugging symbols intact, as suggested by http://sourceforge.net/mailarchive/forum.php?thread_id=8181544&forum_id=37059 If it is not, what can I do to make it more revealing? attached |
From: Ethan A M. <merritt@u.washington.edu> - 2005-09-19 04:10:49
|
On Sunday 18 September 2005 08:41 pm, you wrote:
> Is this more revealing?

Well, no.
That's a trace of gnuplot itself. But it's the auxiliary program gnuplot_x11 that is actually crashing. So all this trace tells us is that gnuplot is unhappy that gnuplot_x11 crashed.

> If it is not, what can I do to make it more revealing?

Here is what you can do to isolate gnuplot_x11 and trace its behaviour.

First save the stream of commands that would normally go from gnuplot to gnuplot_x11. Put them in a separate file:

  gnuplot> set term xlib
  gnuplot> set output "bug.x"
  gnuplot> plot x,x
  gnuplot> quit

Now feed that file to gnuplot_x11 to confirm the problem:

  gnuplot_x11 --noevents < bug.x

Assuming that still exhibits the problem, run it again using the valgrind toolset:

  valgrind --tool=addrcheck --leak-check=yes --leak-resolution=med \
           --log-file=valgrind ./gnuplot_x11 < bug.x

That should produce a log file "valgrind.pid????", where ???? is the process number, that contains an analysis of various sorts of memory problems, illegal accesses, and so on.

I warn you that I have done this on several machines (but not on Fedora Core 4, because I don't have a machine set up with it). Under Mandrake 10.0, valgrind finds several leaks and uninitialized memory accesses inside the XFree86 libraries themselves, but nothing of note in gnuplot_x11. Under Mandrake 10.1, which uses X.org rather than XFree86, valgrind finds no errors of note.

So I'm inclined to guess that you have either an X error or a compiler error. As I understand it, Fedora 4 ships with gcc 4.something_bleeding_edge. Depending on what valgrind finds or doesn't find, you might want to try installing gcc 3.4 or whatever Fedora provides as a fallback. If that fixes it, then report it through Red Hat as a compiler bug.

Otherwise, I'd be very interested to hear what valgrind finds.

--
Ethan A Merritt
Biomolecular Structure Center
University of Washington, Seattle 98195-7742
|
From: Michael A. P. <mp...@ma...> - 2005-09-19 07:16:40
|
On Sun, 2005-09-18 at 21:10 -0700, Ethan A Merritt wrote:
> On Sunday 18 September 2005 08:41 pm, you wrote:
> > Is this more revealing?
>
> Well, no.
> That's a trace of gnuplot itself. But it's the auxiliary program
> gnuplot_x11 that is actually crashing. So all this trace tells us is that
> gnuplot is unhappy that gnuplot_x11 crashed.
>
> > If it is not, what can I do to make it more revealing?
>
> Here is what you can do to isolate gnuplot_x11, and trace
> its behaviour.
>
> First save the stream of commands that would normally go from
> gnuplot to gnuplot_x11. Put them in a separate file:
>
> gnuplot> set term xlib

Ouch - that results in a problem too.

gnuplot> set term xlib
Terminal type set to 'xlib'
Options are '0'
gnuplot> *** buffer overflow detected ***: gnuplot_x11 terminated
======= Backtrace: =========
/lib/libc.so.6(__chk_fail+0x41)[0x1eec45]
gnuplot_x11[0x804aaa2]
gnuplot_x11[0x80576b6]
gnuplot_x11[0x8059319]
/lib/libc.so.6(__libc_start_main+0xdf)[0x125d5f]
gnuplot_x11[0x804a851]
======= Memory map: ========
00111000-00234000 r-xp 00000000 fd:00 1637568 /lib/libc-2.3.5.so
00234000-00236000 r-xp 00123000 fd:00 1637568 /lib/libc-2.3.5.so
00236000-00238000 rwxp 00125000 fd:00 1637568 /lib/libc-2.3.5.so
00238000-0023a000 rwxp 00238000 00:00 0
0038e000-00397000 r-xp 00000000 fd:00 1637574 /lib/libgcc_s-4.0.1-20050727.so.1
00397000-00398000 rwxp 00009000 fd:00 1637574 /lib/libgcc_s-4.0.1-20050727.so.1
0048b000-004a5000 r-xp 00000000 fd:00 1637567 /lib/ld-2.3.5.so
004a5000-004a6000 r-xp 00019000 fd:00 1637567 /lib/ld-2.3.5.so
004a6000-004a7000 rwxp 0001a000 fd:00 1637567 /lib/ld-2.3.5.so
00595000-00596000 r-xp 00595000 00:00 0
005d4000-005f7000 r-xp 00000000 fd:00 1637569 /lib/libm-2.3.5.so
005f7000-005f8000 r-xp 00022000 fd:00 1637569 /lib/libm-2.3.5.so
005f8000-005f9000 rwxp 00023000 fd:00 1637569 /lib/libm-2.3.5.so
005fb000-005fd000 r-xp 00000000 fd:00 1637570 /lib/libdl-2.3.5.so
005fd000-005fe000 r-xp 00001000 fd:00 1637570 /lib/libdl-2.3.5.so
005fe000-005ff000 rwxp 00002000 fd:00 1637570 /lib/libdl-2.3.5.so
00616000-006e6000 r-xp 00000000 fd:00 1283972 /usr/X11R6/lib/libX11.so.6.2
006e6000-006ea000 rwxp 000cf000 fd:00 1283972 /usr/X11R6/lib/libX11.so.6.2
006ec000-006fa000 r-xp 00000000 fd:00 1284197 /usr/X11R6/lib/libXext.so.6.4
006fa000-006fb000 rwxp 0000e000 fd:00 1284197 /usr/X11R6/lib/libXext.so.6.4
008eb000-008f2000 r-xp 00000000 fd:00 1284362 /usr/X11R6/lib/libXrender.so.1.2.2
008f2000-008f3000 rwxp 00007000 fd:00 1284362 /usr/X11R6/lib/libXrender.so.1.2.2
00c89000-00c92000 r-xp 00000000 fd:00 1286027 /usr/X11R6/lib/libXcursor.so.1.0.2
00c92000-00c93000 rwxp 00008000 fd:00 1286027 /usr/X11R6/lib/libXcursor.so.1.0.2
00d02000-00d03000 r-xp 00000000 fd:00 1441346 /usr/X11R6/lib/X11/locale/lib/common/xlcUTF8Load.so.2
00d03000-00d04000 rwxp 00000000 fd:00 1441346 /usr/X11R6/lib/X11/locale/lib/common/xlcUTF8Load.so.2
08048000-0805e000 r-xp 00000000 fd:00 196093 /opt/gnuplot/libexec/gnuplot/4.1/gnuplot_x11
0805e000-08060000 rw-p 00015000 fd:00 196093 /opt/gnuplot/libexec/gnuplot/4.1/gnuplot_x11
08060000-08063000 rw-p 08060000 00:00 0
0885d000-08893000 rw-p 0885d000 00:00 0 [heap]
b7d7b000-b7d81000 r--s 00000000 fd:00 1339777 /usr/lib/gconv/gconv-modules.cache
b7d81000-b7f81000 r--p 00000000 fd:00 1285084 /usr/lib/locale/locale-archive
b7f81000-b7f83000 rw-p b7f81000 00:00 0
bf879000-bf88e000 rw-p bf879000 00:00 0 [stack]
gnuplot>

-=-

I'm going to ask for help on the Fedora Devel list; it very well may be a gcc 4.0.1 problem. Hopefully a Fedora developer who knows a little bit more about tracing these kinds of issues will be able to figure out how to isolate the issue.

There was a huge bug in xorg-x11 that was caused by a bug in gcc, and it took a while for the gcc team to acknowledge that it was their bug. I will compile using an older version of gcc and see if that works out better.

Thanks for your time, and thanks for the effort on gnuplot. I don't qualify for the student version of Mathematica, and I certainly can't afford the non-student version. gnuplot has met my needs exceedingly well (pslatex is virtually all I use; this bug, yours or gcc's, doesn't really affect me).
|
From: Hans-Bernhard B. <br...@ph...> - 2005-09-19 08:33:56
|
Ethan A Merritt wrote:
> On Sunday 18 September 2005 08:41 pm, you wrote:
>
>>Is this more revealing?

> Well, no.
> That's a trace of gnuplot itself. But it's the auxiliary program
> gnuplot_x11 that is actually crashing. So all this trace tells us is that
> gnuplot is unhappy that gnuplot_x11 crashed.

Exactly. To avoid that, it would be necessary to have gdb start debugging gnuplot_x11 as soon as gnuplot fork()s and executes it. There's a "set follow-fork-mode" command in gdb for that.
|
From: Ethan A M. <merritt@u.washington.edu> - 2005-09-13 03:23:59
|
On Monday 12 September 2005 05:46 pm, you wrote:
> > > > readelf -s -W /opt/gnuplot/libexec/gnuplot/4.1/gnuplot_x11

OK. The stack trace is:

*** buffer overflow detected ***: gnuplot_x11 terminated
======= Backtrace: =========
/lib/libc.so.6(__chk_fail+0x41)[0x586c45]
gnuplot_x11[0x804aaa2]   Find_Plot_In_Linked_List_By_Number
gnuplot_x11[0x80576b6]   mainloop (gplt_x11.c line 1326, I think)
gnuplot_x11[0x8059319]   main

I can't see how Find_Plot_In_Linked_List_By_Number() would ever cause a buffer overflow, but all eyes are invited to stare at the code.

I wonder, could it possibly be a malloc() problem?

--
Ethan A Merritt
Biomolecular Structure Center
University of Washington, Seattle 98195-7742
|
From: Hans-Bernhard B. <br...@ph...> - 2005-09-13 08:02:51
|
Ethan A Merritt wrote:
> /lib/libc.so.6(__chk_fail+0x41)[0x586c45]
> gnuplot_x11[0x804aaa2] Find_Plot_In_Linked_List_By_Number
> gnuplot_x11[0x80576b6] mainloop (gplt_x11.c line 1326, I think)

That traceback appears garbled. mainloop() doesn't call Find_..._By_Number() directly; it only does so indirectly, through record(), which is definitely too complex for the compiler to have dared to inline it.

> gnuplot_x11[0x8059319] main

> I can't see how Find_Plot_In_Linked_List_By_Number() would ever
> cause a buffer overflow,

Well, it's operating on a linked list. Linked list manipulation *can* go wrong, e.g. by killing the NULL pointer at the end, or by leaving behind nodes that have been free()d but not removed from the list. I'm not at all convinced that this __chk_fail stuff is able to distinguish genuine buffer overflows from other rogue pointers.

As I told the OP before, I'm quite sure that this can't be resolved meaningfully without running the test inside a debugger, where data structures can be inspected.
|