From: <don...@is...> - 2009-12-04 22:48:40

What could cause this:

Break 1 AP5[7]> (sss:sstream *)
#<IO INPUT-BUFFERED SOCKET-STREAM (UNSIGNED-BYTE 8) 0.0.0.0:5210>
Break 1 AP5[7]> (close *)

*** - UNIX error 9 (EBADF): Bad file number
The following restarts are available:
...

Also what can I do about it?

Unfortunately I don't have the object to experiment with any more
since shortly after I saved it and tried to resume I got this:

READ-BYTE on #<CLOSED INPUT BUFFERED SOCKET-STREAM (UNSIGNED-BYTE 8) 0.0.0.0:5210> is illegal

*** - handle_fault error2 ! address = 0x2034480 not in [0x21c94004,0x22616f84) !
SIGSEGV cannot be cured. Fault address = 0x2034480.
GC count: 3640
Space collected by GC: 3 1800020696
Run time: 1265 287075
Real time: 790495 60114
GC time: 56 515516
Permanently allocated: 111880 bytes.
Currently in use: 18277496 bytes.
Free space: 2442400 bytes.
Segmentation fault
From: Sam S. <sd...@gn...> - 2009-12-05 23:35:19

> * Don Cohen <qba...@vf...3-vap.pbz> [2009-12-04 14:48:35 -0800]:
>
> What could cause this:
> Break 1 AP5[7]> (sss:sstream *)
> #<IO INPUT-BUFFERED SOCKET-STREAM (UNSIGNED-BYTE 8) 0.0.0.0:5210>
> Break 1 AP5[7]> (close *)
>
> *** - UNIX error 9 (EBADF): Bad file number
> The following restarts are available:
> ...
>
> Also what can I do about it?

I think EBADF means that the fd has already been closed.
how did you create this stream? what did you do with it?
can you reproduce this error?

--
Sam Steingold (http://sds.podval.org/) on Ubuntu 9.04 (jaunty)
http://pmw.org.il http://memri.org http://truepeace.org
http://jihadwatch.org http://palestinefacts.org http://dhimmi.com
All generalizations are wrong. Including this.
From: <don...@is...> - 2009-12-06 21:40:49

Sam Steingold writes:
> > What could cause this:
> > Break 1 AP5[7]> (sss:sstream *)
> > #<IO INPUT-BUFFERED SOCKET-STREAM (UNSIGNED-BYTE 8) 0.0.0.0:5210>
> > Break 1 AP5[7]> (close *)
> > *** - UNIX error 9 (EBADF): Bad file number
> > The following restarts are available:
> > ...
> I think EBADF means that the fd has already been closed.

But then I expect the stream to print as "closed" - as in this:

[1]> (setf socket (SOCKET:SOCKET-CONNECT 80 "google.com"))
#<IO INPUT-BUFFERED SOCKET-STREAM CHARACTER google.com:80>
[2]> (close socket)
T
[3]> socket
#<CLOSED IO INPUT-BUFFERED SOCKET-STREAM CHARACTER google.com:80>
[4]>

> how did you create this stream? what did you do with it?
> can you reproduce this error?

It was created by accepting a connection, essentially like this:

[1]> (setf server (socket:SOCKET-SERVER 1234))
#<SOCKET-SERVER 0.0.0.0:1234>
[2]> (setf stream (socket:SOCKET-ACCEPT server))
#<IO INPUT-BUFFERED SOCKET-STREAM CHARACTER 0.0.0.0:1234>

The element type of the stream changes during operation.

I don't yet know how to reproduce the error at will. I was hoping for
some more info that would help me figure that out.
Even worse, I don't know how to avoid producing it against my will,
which is what I really want.
From: Sam S. <sd...@gn...> - 2009-12-07 15:19:34

Don Cohen wrote:
> Sam Steingold writes:
> > > What could cause this:
> > > Break 1 AP5[7]> (sss:sstream *)
> > > #<IO INPUT-BUFFERED SOCKET-STREAM (UNSIGNED-BYTE 8) 0.0.0.0:5210>
> > > Break 1 AP5[7]> (close *)
> > > *** - UNIX error 9 (EBADF): Bad file number
> > > The following restarts are available:
> > > ...
> > I think EBADF means that the fd has already been closed.
> But then I expect the stream to print as "closed" - as in this:

yes, if it has been closed by clisp.
imagine two streams sharing a socket.
then closing one of them will make the other one un-close-able.
this, however, should never happen as we dup() the fds as necessary.

> I don't yet know how to reproduce the error at will. I was hoping for
> some more info that would help me figure that out.
> Even worse, I don't know how to avoid producing it against my will,
> which is what I really want.

the only way there is to reproduce it and fix it.
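
The shared-descriptor scenario described above can be sketched in plain C, outside CLISP. This is only an illustration assuming a POSIX system; the function names are made up for the example and nothing here is taken from stream.d. It shows that closing the same descriptor number a second time fails with EBADF, and that giving each owner its own dup()'d descriptor lets both closes succeed.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: close the same descriptor number twice and return
 * the errno reported by the second close() (EBADF on POSIX systems). */
int errno_of_second_close(void)
{
    int p[2];
    if (pipe(p) != 0)
        return -1;
    close(p[1]);                 /* the write end is not needed here */
    int rc = close(p[0]);        /* first close succeeds */
    assert(rc == 0);
    errno = 0;
    rc = close(p[0]);            /* same number again: descriptor is gone */
    return rc == -1 ? errno : 0;
}

/* Hypothetical helper: give each "stream" its own descriptor via dup().
 * Returns 1 when both closes succeed, 0 otherwise, -1 on setup failure. */
int dup_allows_two_closes(void)
{
    int p[2];
    if (pipe(p) != 0)
        return -1;
    close(p[1]);
    int copy = dup(p[0]);        /* second descriptor for the same pipe end */
    if (copy < 0) {
        close(p[0]);
        return -1;
    }
    int first = close(p[0]);
    int second = close(copy);
    return first == 0 && second == 0;
}
```

In a multithreaded program the second close() is more dangerous than this sketch suggests: if another thread has opened something in between, the number may already refer to a different object.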
From: <don...@is...> - 2009-12-07 15:44:33

Sam Steingold writes:
> > > I think EBADF means that the fd has already been closed.
> > But then I expect the stream to print as "closed" - as in this:
> yes, if it has been closed by clisp.
> imagine two streams sharing a socket.
> then closing one of them will make the other one un-close-able.
> this, however, should never happen as we dup() the fds as necessary.

Is there a way to test whether it has been closed (either by clisp or
otherwise)? Or a way to distinguish an ebadf error from any other?
(I don't have the error object to examine at the moment.)
From: Sam S. <sd...@gn...> - 2009-12-07 16:33:46

Don Cohen wrote:
> Sam Steingold writes:
> > > > I think EBADF means that the fd has already been closed.
> > > But then I expect the stream to print as "closed" - as in this:
> > yes, if it has been closed by clisp.
> > imagine two streams sharing a socket.
> > then closing one of them will make the other one un-close-able.
> > this, however, should never happen as we dup() the fds as necessary.
> Is there a way to test whether it has been closed (either by clisp or

fcntl(fd,F_GETFL,0)
see stream.d:handle_direction_compatible

> otherwise)? Or a way to distinguish an ebadf error from any other?

examine errno
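
A rough C sketch of the fcntl(fd,F_GETFL) test mentioned above, assuming a POSIX system. The names fd_is_open, fd_is_readable and fd_is_open_demo are hypothetical and are not CLISP functions; handle_direction_compatible in stream.d may do this differently. The second helper is included because read() reports EBADF not only for a closed descriptor but also for one that is open without read access.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if fd names an open descriptor, 0 if the kernel reports EBADF,
 * and -1 for any other fcntl() failure. */
int fd_is_open(int fd)
{
    if (fcntl(fd, F_GETFL) != -1)
        return 1;
    return errno == EBADF ? 0 : -1;
}

/* Returns 1 if fd is open with read access (O_RDONLY or O_RDWR), else 0. */
int fd_is_readable(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return 0;
    int mode = flags & O_ACCMODE;
    return mode == O_RDONLY || mode == O_RDWR;
}

/* Usage example: a pipe end counts as open until it has been closed. */
void fd_is_open_demo(void)
{
    int p[2];
    int rc = pipe(p);
    assert(rc == 0);
    assert(fd_is_open(p[0]) == 1);
    close(p[0]);
    close(p[1]);
    assert(fd_is_open(p[0]) == 0);
}
```

The check only says whether the number is currently valid. It cannot tell whether the number still refers to the same socket it referred to when the stream was created.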
From: <don...@is...> - 2009-12-10 02:59:06

Sam Steingold writes:
> Don Cohen wrote:
> > Sam Steingold writes:
> > > > > I think EBADF means that the fd has already been closed.
> > > > But then I expect the stream to print as "closed" - as in this:
> > > yes, if it has been closed by clisp.
> > > imagine two streams sharing a socket.
> > > then closing one of them will make the other one un-close-able.
> > > this, however, should never happen as we dup() the fds as necessary.
> > Is there a way to test whether it has been closed (either by clisp or
> fcntl(fd,F_GETFL,0)
> see stream.d:handle_direction_compatible
> > otherwise)? Or a way to distinguish an ebadf error from any other?
> examine errno

Does any of this help?

*** - UNIX error 9 (EBADF): Bad file number
...
Break 1 AP5[12]> where
<1/492> #<SYSTEM-FUNCTION EXT:READ-BYTE-SEQUENCE>
[486] EVAL frame for form
(EXT:READ-BYTE-SEQUENCE SSS::*BYTE-IO-VECTOR* (SSS:SSTREAM SSS::C) :NO-HANG T :END SSS::LIMIT)
...
Break 3 AP5[16]> (multiple-value-list (ignore-errors (EXT:READ-BYTE-SEQUENCE SSS::*BYTE-IO-VECTOR* (SSS:SSTREAM SSS::C) :NO-HANG T :END SSS::LIMIT)))
(NIL #<SYSTEM::SIMPLE-STREAM-ERROR #x20DAB02E>)
Break 3 AP5[16]> (setf e (cadr *))
#<SYSTEM::SIMPLE-STREAM-ERROR #x20DAB02E>
Break 3 AP5[16]> (describe e)

#<SYSTEM::SIMPLE-STREAM-ERROR #x20DAB02E> is an instance of the CLOS class #1=#<STANDARD-CLASS SYSTEM::SIMPLE-STREAM-ERROR>.
Slots:
SYSTEM::$STREAM = #<INPUT BUFFERED SOCKET-STREAM (UNSIGNED-BYTE 8) 0.0.0.0:5210>
SYSTEM::$FORMAT-CONTROL = "UNIX error ~S (EBADF): Bad file number "
SYSTEM::$FORMAT-ARGUMENTS = (9)

#<INPUT BUFFERED SOCKET-STREAM (UNSIGNED-BYTE 8) 0.0.0.0:5210> is an input-stream.

"UNIX error ~S (EBADF): Bad file number " is a simple 1 dimensional array (vector) of CHARACTERs, of size 39 (a ISO-8859-1 string).

(9) is a list of length 1.

Break 3 AP5[16]> (apropos "ERRNO")
POSIX:ERRNO function
Break 3 AP5[16]> (POSIX:ERRNO)
:EINVAL
...
*** - handle_fault error2 ! address = 0x1f5d170 not in [0x20633004,0x20c2e3b4) !
SIGSEGV cannot be cured. Fault address = 0x1f5d170.

What would you like me to try the next time?
From: Sam S. <sd...@gn...> - 2009-12-10 14:56:47

Don Cohen wrote:
>
> Does any of this help?

not really.
unless you can get a reproducible test case, there is very little I can do.

however, you can run under gdb and put a breakpoint in low_close_handle:

break stream.d:4527
commands
print fd
continue
end

and gdb will print a message whenever it closes an FD.
however, FDs are reused, so you need to also print it when it is created, i.e.,

break make_socket_stream
commands
print handle
continue
end

note that this will only serve to test my conjecture that the socket is
closed twice.
if it turns out to be correct, we would still have to hunt down the bug,
which might not be easy.
a reproducible test case would help :-)

however, this is not necessarily the case - my unix expertise is insufficient.
I urge you to discuss this on a unix forum
(e.g., http://groups.google.com/group/comp.unix.programmer/)
and post the link to the thread (or, better yet, the summary thereof) here.

> #<INPUT BUFFERED SOCKET-STREAM (UNSIGNED-BYTE 8) 0.0.0.0:5210>

why is host = 0.0.0.0?
is this normal?

> Break 3 AP5[16]> (POSIX:ERRNO)

I meant the C variable.
clisp has already told you this is an ebadf
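
The remark that FDs are reused can be illustrated with a small C sketch, assuming a POSIX system and a single-threaded process. The function name is hypothetical. POSIX hands out the lowest free descriptor number, so a number freed by close() comes straight back from the next call that allocates a descriptor, which is why a log of closed numbers alone cannot show a double close.

```c
#include <assert.h>
#include <unistd.h>

/* Returns 1 if a descriptor number freed by close() is handed out again by
 * the very next dup(), 0 if not, and -1 on setup failure. */
int closed_fd_number_is_reused(void)
{
    int p[2];
    if (pipe(p) != 0)
        return -1;
    int first = dup(p[0]);       /* lowest free number at this point */
    if (first < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    int rc = close(first);       /* the number becomes free again */
    assert(rc == 0);
    int second = dup(p[0]);      /* lowest free number once more */
    int reused = (second == first);
    if (second >= 0)
        close(second);
    close(p[0]);
    close(p[1]);
    return reused;
}
```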
From: <don...@is...> - 2009-12-10 18:06:25

Sam Steingold writes:
> however, you can run under gdb and put a breakpoint in low_close_handle:
> break stream.d:4527
> commands
> print fd
> continue
> end
>
> and gdb will print a message whenever it closes an FD.
> however, FDs are reused, so you need to also print it when it is created, i.e.,
>
> break make_socket_stream
> commands
> print handle
> continue
> end

I'm afraid I need more help in running gdb.
The first problem is that when I do the above and then run I get

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 2336992 (LWP 29957)]
0x080b76ba in closed_buffered (stream=0x21457166) at ../src/stream.d:8326
8326 BufferedStream_channel(stream) = NIL; /* Handle becomes invalid */
(gdb)

My guess is that I need the stuff in .gdbinit, but when I copy that
into the directory where I'm trying to run I see

.gdbinit:203: Error in sourced command file:
No symbol "byteptr" in current context.
(gdb)

which seems to come from the end of .gdbinit

# cut and paste when you stop in interpret_bytecode_()

This leads me to believe that I shouldn't be loading that.
But just above I see

#ifdef MULTITHREAD
# you need this when debugging multithreaded CLISP
# because SIGUSR1 is used by WITH-TIMEOUT
#handle SIGUSR1 noprint nostop
#end

Should I include the handle and end lines?
I tried uncommenting those, removing the cut and paste section at the
end and adding your commands above.

(gdb) run
Starting program: /tmp/ap5-2.48+MT
Reading symbols from shared object read from target memory...done.
Loaded system supplied DSO at 0x6dc000
[Thread debugging using libthread_db enabled]
[New Thread 1120480 (LWP 30061)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1120480 (LWP 30061)]
0x080b76ba in closed_buffered (stream=0x21a5e166) at ../src/stream.d:8326
8326 BufferedStream_channel(stream) = NIL; /* Handle becomes invalid */
(gdb)

What should I do to get the printing you describe but not break?

> > #<INPUT BUFFERED SOCKET-STREAM (UNSIGNED-BYTE 8) 0.0.0.0:5210>
> why is host = 0.0.0.0?
> is this normal?

I thought it was odd too. I guessed that it meant that I didn't
specify an address for the socket server.
From: Sam S. <sd...@gn...> - 2009-12-10 18:32:10

Don Cohen wrote:
> Sam Steingold writes:
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 2336992 (LWP 29957)]
> 0x080b76ba in closed_buffered (stream=0x21457166) at
> ../src/stream.d:8326
> 8326 BufferedStream_channel(stream) = NIL; /* Handle becomes
> invalid */

if you have generational GC enabled (test with "clisp --version")
you need

break sigsegv_handler_failed
handle SIGSEGV noprint nostop
handle SIGBUS noprint nostop
From: <don...@is...> - 2009-12-10 18:47:13

Sam Steingold writes:
> Don Cohen wrote:
> > Sam Steingold writes:
> > Program received signal SIGSEGV, Segmentation fault.
> > [Switching to Thread 2336992 (LWP 29957)]
> > 0x080b76ba in closed_buffered (stream=0x21457166) at
> > ../src/stream.d:8326
> > 8326 BufferedStream_channel(stream) = NIL; /* Handle becomes
> > invalid */
>
> if you have generational GC enabled (test with "clisp --version")
> you need
> break sigsegv_handler_failed
> handle SIGSEGV noprint nostop
> handle SIGBUS noprint nostop

Breakpoint 2, low_close_handle (stream=0x216a9036, handle=0xc0000120,
abort=0 '\0') at ../src/stream.d:4527
4527 begin_blocking_system_call();
No symbol "fd" in current context.
(gdb)

I have to rebuild with some debug flag?
From: Sam S. <sd...@gn...> - 2009-12-10 19:12:51

Don Cohen wrote:
> Sam Steingold writes:
> > Don Cohen wrote:
> > > Sam Steingold writes:
> > > Program received signal SIGSEGV, Segmentation fault.
> > > [Switching to Thread 2336992 (LWP 29957)]
> > > 0x080b76ba in closed_buffered (stream=0x21457166) at
> > > ../src/stream.d:8326
> > > 8326 BufferedStream_channel(stream) = NIL; /* Handle becomes
> > > invalid */
> >
> > if you have generational GC enabled (test with "clisp --version")
> > you need
> > break sigsegv_handler_failed
> > handle SIGSEGV noprint nostop
> > handle SIGBUS noprint nostop
>
> Breakpoint 2, low_close_handle (stream=0x216a9036, handle=0xc0000120,
> abort=0 '\0') at ../src/stream.d:4527
> 4527 begin_blocking_system_call();
> No symbol "fd" in current context.
> (gdb)
>
> I have to rebuild with some debug flag?

probably (configure --with-debug).

did you already try to ask the unix people about what ebadf might mean?
From: <don...@is...> - 2009-12-10 19:39:32

Sam Steingold writes:
> probably (configure --with-debug).

ok, might take a while
BTW what cost does this have in time or space at run time?

> did you already try to ask the unix people about what ebadf might mean?

not yet, still trying to figure out what to ask

It occurs to me that this might be related to threads.
I've only started to run this code in mt and I've only seen it there.
From: <don...@is...> - 2009-12-10 22:05:44

Don Cohen writes:
> Sam Steingold writes:
> > probably (configure --with-debug).
> ok, might take a while

before --with-debug:

$ gdb /tmp/ap5-2.48+MT
...
(gdb) break sigsegv_handler_failed
Breakpoint 1 at 0x8071d9e: file ../src/spvw_sigsegv.d, line 64.
(gdb)

after:

$ gdb /tmp/ap5-2.48+MT
...
(gdb) break sigsegv_handler_failed
Function "sigsegv_handler_failed" not defined.
Make breakpoint pending on future shared library load? (y or [n])

Is this expected or at least reasonable?

The rest seemed to go as I would have hoped.
I see multiple closes of the same fd but these are files that I'm not
seeing get opened.

I run another process that connects to the server and get as expected:

Breakpoint 3, make_socket_stream (handle=12, eltype=0x1da6e5c,
buffered=BUFFERED_DEFAULT, host={one_o = 554374622}, port=
{one_o = 3221392192}) at ../src/stream.d:13884
13884 pushSTACK(host);
$62 = 12

then close it and see

Breakpoint 2, low_close_handle (stream={one_o = 554464470}, handle=
{one_o = 3221225856}, abort=0 '\0') at ../src/stream.d:4527
4527 begin_blocking_system_call();
$73 = 12

then start another

Breakpoint 3, make_socket_stream (handle=12, eltype=0x1da6e5c,
buffered=BUFFERED_DEFAULT, host={one_o = 554374622}, port=
{one_o = 3221392192}) at ../src/stream.d:13884
13884 pushSTACK(host);
$74 = 12

...
Program received signal SIGPIPE, Broken pipe.
0x006dc402 in __kernel_vsyscall ()
(gdb) continue
Continuing.

[../src/stream.d:13063]

Is this where the sigpipe came from?
How can I automatically continue it?
handle SIGPIPE noprint nostop ?
what others are likely to stop?
Is it only the single thread that stops?

It now occurs to me that this may not be enough data to see
whether the same fd is closed more than once. At least the fd's are
not enough, since I see later on ...

...

Breakpoint 3, make_socket_stream (handle=14, eltype=0x1da6e5c,
buffered=BUFFERED_DEFAULT, host={one_o = 554024606}, port=
{one_o = 3221487872}) at ../src/stream.d:13884
13884 pushSTACK(host);
$90 = 14

... no other 14's

Breakpoint 2, low_close_handle (stream={one_o = 554059598}, handle=
{one_o = 3221225920}, abort=0 '\0') at ../src/stream.d:4527
4527 begin_blocking_system_call();
$106 = 14

[no output between previous and next]

Breakpoint 2, low_close_handle (stream={one_o = 555392830}, handle=
{one_o = 3221225920}, abort=0 '\0') at ../src/stream.d:4527
4527 begin_blocking_system_call();
$107 = 14

But the second close could be for a file that was opened after the
first close. I can't tell from the args displayed above which close
corresponds to which open. If the same fd is closed twice can I
tell from the args shown in the close call?
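
One generic way to pair each close with its open is to tag every descriptor number with a generation counter. The C sketch below is purely hypothetical: note_open, tracked_close and MAX_TRACKED are invented for illustration, CLISP has no such helpers as far as this thread shows, and the tables are not thread-safe. It assumes every open and close in the program goes through these two functions.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

#define MAX_TRACKED 1024                  /* illustrative table size */

static unsigned generation[MAX_TRACKED];  /* bumped on every recorded open */
static int is_open[MAX_TRACKED];          /* 1 while fd is believed open */

/* Record that fd was just created; returns its generation number. */
unsigned note_open(int fd)
{
    assert(fd >= 0 && fd < MAX_TRACKED);
    is_open[fd] = 1;
    return ++generation[fd];
}

/* Close fd and log which generation it belonged to.
 * Returns close()'s result, or -1 without calling close() when fd was not
 * recorded as open, which is how a double close shows up. */
int tracked_close(int fd)
{
    assert(fd >= 0 && fd < MAX_TRACKED);
    if (!is_open[fd]) {
        fprintf(stderr, "fd %d: close without matching open (last generation %u)\n",
                fd, generation[fd]);
        return -1;
    }
    is_open[fd] = 0;
    fprintf(stderr, "fd %d: closing generation %u\n", fd, generation[fd]);
    return close(fd);
}
```

With such a log, two closes of fd 14 that carry the same generation would indicate a double close, while different generations would mean the number was simply reused.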
From: Sam S. <sd...@gn...> - 2009-12-14 23:43:13

Don Cohen wrote:
> Don Cohen writes:
> > Sam Steingold writes:
> > > probably (configure --with-debug).
> > ok, might take a while
> before --with-debug:
>
> $ gdb /tmp/ap5-2.48+MT
> ...
> (gdb) break sigsegv_handler_failed
> Breakpoint 1 at 0x8071d9e: file ../src/spvw_sigsegv.d, line 64.
> (gdb)
> after:
> $ gdb /tmp/ap5-2.48+MT
> ...
> (gdb) break sigsegv_handler_failed
> Function "sigsegv_handler_failed" not defined.
> Make breakpoint pending on future shared library load? (y or [n])
>
> Is this expected or at least reasonable?

I think .gdbinit (and the source code) make it clear that
sigsegv_handler_failed is only present when compiled with
GENERATIONAL_GC and the FAQ says that --with-debug disables it.

> I run another process that connects to the server and get as expected:
>
> Breakpoint 3, make_socket_stream (handle=12, eltype=0x1da6e5c,
> buffered=BUFFERED_DEFAULT, host={one_o = 554374622}, port=
> {one_o = 3221392192}) at ../src/stream.d:13884
> 13884 pushSTACK(host);
> $62 = 12
> then close it and see
> Breakpoint 2, low_close_handle (stream={one_o = 554464470}, handle=
> {one_o = 3221225856}, abort=0 '\0') at ../src/stream.d:4527
> 4527 begin_blocking_system_call();
> $73 = 12
> then start another
> Breakpoint 3, make_socket_stream (handle=12, eltype=0x1da6e5c,
> buffered=BUFFERED_DEFAULT, host={one_o = 554374622}, port=
> {one_o = 3221392192}) at ../src/stream.d:13884
> 13884 pushSTACK(host);
> $74 = 12
>
> ...
> Program received signal SIGPIPE, Broken pipe.
> 0x006dc402 in __kernel_vsyscall ()
> (gdb) continue
> Continuing.
>
> [../src/stream.d:13063]
> Is this where the sigpipe came from?
> How can I automatically continue it?
> handle SIGPIPE noprint nostop ?

yes.

> what others are likely to stop?
> Is it only the single thread that stops?

I think so.

> It now occurs to me that this may not be enough data to see
> whether the same fd is closed more than once. At least the fd's are
> not enough, since I see later on ...
>
> ...
>
> Breakpoint 3, make_socket_stream (handle=14, eltype=0x1da6e5c,
> buffered=BUFFERED_DEFAULT, host={one_o = 554024606}, port=
> {one_o = 3221487872}) at ../src/stream.d:13884
> 13884 pushSTACK(host);
> $90 = 14
>
> ... no other 14's
>
> Breakpoint 2, low_close_handle (stream={one_o = 554059598}, handle=
> {one_o = 3221225920}, abort=0 '\0') at ../src/stream.d:4527
> 4527 begin_blocking_system_call();
> $106 = 14
>
> [no output between previous and next]
>
> Breakpoint 2, low_close_handle (stream={one_o = 555392830}, handle=
> {one_o = 3221225920}, abort=0 '\0') at ../src/stream.d:4527
> 4527 begin_blocking_system_call();
> $107 = 14
>
> But the second close could be for a file that was opened after the
> first close. I can't tell from the args displayed above which close
> corresponds to which open. If the same fd is closed twice can I
> tell from the args shown in the close call?

you are right.
I think you need to stop in add_to_open_streams too.
alas, it will not tell you what the handle number is.
for that you need to add a breakpoint at allocate_handle
From: <don...@is...> - 2009-12-18 09:22:29

I've been distracted from this problem lately but am now trying to get
back to it.

Sam Steingold writes:
> > Function "sigsegv_handler_failed" not defined.
> > Make breakpoint pending on future shared library load? (y or [n])
> >
> > Is this expected or at least reasonable?
>
> I think .gdbinit (and the source code) make it clear that
> sigsegv_handler_failed is only present when compiled with
> GENERATIONAL_GC and the FAQ says that --with-debug disables it.

I don't understand what --with-debug disables and why.
But since you don't seem too worried about it I guess I don't really
have to know.

I'm still building everything with debug but not actually setting any
breakpoints - I find they make it hard for me to debug in the presence
of multiple threads. I've not figured out exactly what's going on but
the feel is similar to the typical case where you don't know what
thread you're typing to, and in fact different characters on the same
line end up being read by different threads.

All of that is just background. Here's the important point:
I notice that just before my ebadf is caught I see this:
[../src/stream.d:6195]
which I gather only appears in debug builds, since I don't see it
in the output before I started building with debug. BTW, under what
circumstances are such messages printed in general?

That line of stream.d appears in the OS_filestream_error line of this:

local maygc uintL low_fill_buffered_handle (object stream,
                                            perseverance_t persev) {
  ...
  GC_SAFE_SYSTEM_CALL(result=,
    fd_read(handle,buff,strm_buffered_bufflen,persev));
  stream = popSTACK();
  unpin_varobject(BufferedStream_buffer(stream));
  if (result<0) /* error occurred? */
    OS_filestream_error(stream);      <========= this line
  if (result==0 && error_eof_p())
    BufferedStream_have_eof_p(stream) = true;
  return result;
}

I also see evidence that the error is coming from
EXT:READ-BYTE-SEQUENCE and that this function is returning NIL.
My understanding is that it is supposed to return an integer.
Note that this is not where I get the ebadf. I think that the
return of NIL is causing an error that I catch and that I later
try to close the stream, and that's what causes the ebadf.
So perhaps the OS_filestream_error closes the stream?

Where is fd_read defined and when does it return <0 ?
From: Sam S. <sd...@gn...> - 2009-12-18 19:49:55

Don Cohen wrote:
>
> I'm still building everything with debug but not actually setting any
> breakpoints - I find they make it hard for me to debug in the presence
> of multiple threads. I've not figured out exactly what's going on but
> the feel is similar to the typical case where you don't know what
> thread you're typing to, and in fact different characters on the same
> line end up being read by different threads.
>
> All of that is just background. Here's the important point:
> I notice that just before my ebadf is caught I see this:
> [../src/stream.d:6195]
> which I gather only appears in debug builds, since I don't see it
> in the output before I started building with debug. BTW, under what
> circumstances are such messages printed in general?

see lispbibl.d and search for "fprintf(stderr.*__FILE__"

> That line of stream.d appears in the OS_filestream_error line of this:
> local maygc uintL low_fill_buffered_handle (object stream,
>                                             perseverance_t persev) {
>   ...
>   GC_SAFE_SYSTEM_CALL(result=,
>     fd_read(handle,buff,strm_buffered_bufflen,persev));
>   stream = popSTACK();
>   unpin_varobject(BufferedStream_buffer(stream));
>   if (result<0) /* error occurred? */
>     OS_filestream_error(stream);      <========= this line
>   if (result==0 && error_eof_p())
>     BufferedStream_have_eof_p(stream) = true;
>   return result;
> }
>
> I also see evidence that the error is coming from
> EXT:READ-BYTE-SEQUENCE and that this function is returning NIL.
> My understanding is that it is supposed to return an integer.
> Note that this is not where I get the ebadf. I think that the
> return of NIL is causing an error that I catch and that I later
> try to close the stream, and that's what causes the ebadf.
> So perhaps the OS_filestream_error closes the stream?

it does not.

> Where is fd_read defined and when does it return <0 ?

unixaux.d
From: <don...@is...> - 2009-12-20 09:22:55

Sam Steingold writes:
> Don Cohen wrote:
...
> > I notice that just before my ebadf is caught I see this:
> > [../src/stream.d:6195]
...
> > That line of stream.d appears in the OS_filestream_error line of this:
> > local maygc uintL low_fill_buffered_handle (object stream,
> >                                             perseverance_t persev) {
> >   ...
> >   GC_SAFE_SYSTEM_CALL(result=,
> >     fd_read(handle,buff,strm_buffered_bufflen,persev));
> >   stream = popSTACK();
> >   unpin_varobject(BufferedStream_buffer(stream));
> >   if (result<0) /* error occurred? */
> >     OS_filestream_error(stream);      <========= this line
> >   if (result==0 && error_eof_p())
> >     BufferedStream_have_eof_p(stream) = true;
> >   return result;
> > }

It's interesting that the OS_Filestream_error above is passed the
stream but only prints the line number.
Couldn't we find out what the error was there?
Since I'm doing a nightly build it should be easy enough for me to
make whatever change you like to the code above (or elsewhere) to show
more useful info. Then just wait for the nightly build and restart my
server.

Does OS_Filestream_error return from low_fill_buffered_handle?
If so, what value does it return?
If not then I guess low_fill_buffered_handle will return a negative
value. But I don't know where it will return TO.

At this point I'm assuming that the [../src/stream.d:6195] occurs
inside the EXT:READ-BYTE-SEQUENCE. Does that seem reasonable to you?
Does EXT:READ-BYTE-SEQUENCE call low_fill_buffered_handle ?

Even if this is correct I still don't know whether the problem there
is ebadf. If it is then I don't know how it gets to be ebadf and if
not then I don't know how the error later on gets to be ebadf.

Do you believe me that EXT:READ-BYTE-SEQUENCE then returns NIL or
do you want some evidence? Do you see how that happens?

I hope I mentioned this: I'm currently running one server with mt and
one without on the same machine and so far only the one with mt is
getting ebadf's .
From: Sam S. <sd...@gn...> - 2009-12-20 21:02:31

> * Don Cohen <qba...@vf...3-vap.pbz> [2009-12-20 01:22:58 -0800]:
>
> > > That line of stream.d appears in the OS_filestream_error line of this:
> > > local maygc uintL low_fill_buffered_handle (object stream,
> > >                                             perseverance_t persev) {
> > >   ...
> > >   GC_SAFE_SYSTEM_CALL(result=,
> > >     fd_read(handle,buff,strm_buffered_bufflen,persev));
> > >   stream = popSTACK();
> > >   unpin_varobject(BufferedStream_buffer(stream));
> > >   if (result<0) /* error occurred? */
> > >     OS_filestream_error(stream);      <========= this line
> > >   if (result==0 && error_eof_p())
> > >     BufferedStream_have_eof_p(stream) = true;
> > >   return result;
> > > }
>
> It's interesting that the OS_Filestream_error above is passed the
> stream but only prints the line number.

line number is only printed for debug purposes.

> Couldn't we find out what the error was there?

break OS_Filestream_error

also, OS_Filestream_error will print something like
*** - UNIX error 9 (EBADF): Bad file number
as you reported originally.

> Does OS_Filestream_error return from low_fill_buffered_handle?

OS_Filestream_error raises a signal (non-local exit).

> Does EXT:READ-BYTE-SEQUENCE call low_fill_buffered_handle ?

yes.

> Do you believe me that EXT:READ-BYTE-SEQUENCE then returns NIL or
> do you want some evidence?

EBADF means that READ-BYTE-SEQUENCE does not return at all.

--
Sam Steingold (http://sds.podval.org/) on Ubuntu 9.10 (karmic)
http://thereligionofpeace.com http://mideasttruth.com http://ffii.org
http://honestreporting.com http://palestinefacts.org http://dhimmi.com
cogito cogito ergo cogito sum
From: <don...@is...> - 2009-12-20 22:42:29

Sam Steingold writes:
> > It's interesting that the OS_Filestream_error above is passed the
> > stream but only prints the line number.
> line number is only printed for debug purposes.
> > Couldn't we find out what the error was there?
> break OS_Filestream_error
> also, OS_Filestream_error will print something like
> *** - UNIX error 9 (EBADF): Bad file number
> as you reported originally.

That's not what's happening to me.
I get
[../src/stream.d:6195]
and no further message until other stuff happens in lisp.

> > Does OS_Filestream_error return from low_fill_buffered_handle?
> OS_Filestream_error raises a signal (non-local exit).
> > Does EXT:READ-BYTE-SEQUENCE call low_fill_buffered_handle ?
> yes.
>
> > Do you believe me that EXT:READ-BYTE-SEQUENCE then returns NIL or
> > do you want some evidence?
> EBADF means that READ-BYTE-SEQUENCE does not return at all.

Ok, I think this helps.
Here's what I see in an older transcript:

[../src/stream.d:6195]
2009/12/10 23:57:38
ignore error from wait
UNIX error 104 (ECONNRESET): Connection reset by peer

Would you believe that this error causes the next attempt to use the
stream to return ebadf ? Only in mt ?

The code generating the output above is doing something like
(ignore-errors (ext:socket-status ...))
After that I get another error trying to do read-byte-sequence

[../src/stream.d:6195]
2009/12/10 23:57:38 ebadf!

However, the stream at that point does not print as a closed stream.
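
On the question of whether one I/O error can turn a descriptor "bad": on POSIX systems a failed read() or write() does not by itself close the descriptor. The C sketch below shows this with EPIPE on a pipe rather than ECONNRESET on a socket, since a reset is harder to provoke deterministically; the function name is hypothetical and the analogy to the socket case is an assumption. It does not explain where the EBADF in this thread comes from.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

/* Provoke EPIPE on a pipe's write end, then ask the kernel whether the
 * descriptor is still open. Returns 1 if write() failed with EPIPE and the
 * descriptor stayed valid, 0 otherwise, and -1 on setup failure. */
int fd_stays_open_after_epipe(void)
{
    int p[2];
    if (pipe(p) != 0)
        return -1;
    /* Ignore SIGPIPE so write() returns -1/EPIPE instead of killing us. */
    void (*old)(int) = signal(SIGPIPE, SIG_IGN);
    close(p[0]);                          /* no reader is left */
    errno = 0;
    ssize_t n = write(p[1], "x", 1);
    int got_epipe = (n == -1 && errno == EPIPE);
    int still_open = (fcntl(p[1], F_GETFL) != -1);
    int rc = close(p[1]);                 /* first and only close */
    assert(rc == 0);
    if (old != SIG_ERR)
        signal(SIGPIPE, old);             /* restore previous disposition */
    return got_epipe && still_open;
}
```

If the same holds for a reset socket, an EBADF following ECONNRESET would point at something in the process closing the descriptor, or at the number being reused, rather than at the reset itself.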
From: Sam S. <sd...@gn...> - 2009-12-21 01:58:58

> * Don Cohen <qba...@vf...3-vap.pbz> [2009-12-20 14:42:26 -0800]:
>
> Sam Steingold writes:
> > > It's interesting that the OS_Filestream_error above is passed the
> > > stream but only prints the line number.
> > line number is only printed for debug purposes.
> > > Couldn't we find out what the error was there?
> > break OS_Filestream_error

did you try this?

> > also, OS_Filestream_error will print something like
> > *** - UNIX error 9 (EBADF): Bad file number
> > as you reported originally.
>
> That's not what's happening to me.
> I get
> [../src/stream.d:6195]
> and no further message until other stuff happens in lisp.

I guess you are ignoring the error.

> > > Does OS_Filestream_error return from low_fill_buffered_handle?
> > OS_Filestream_error raises a signal (non-local exit).
> > > Does EXT:READ-BYTE-SEQUENCE call low_fill_buffered_handle ?
> > yes.
> >
> > > Do you believe me that EXT:READ-BYTE-SEQUENCE then returns NIL or
> > > do you want some evidence?
> > EBADF means that READ-BYTE-SEQUENCE does not return at all.
>
> Ok, I think this helps.
> Here's what I see in an older transcript:
>
> [../src/stream.d:6195]
> 2009/12/10 23:57:38
> ignore error from wait
> UNIX error 104 (ECONNRESET): Connection reset by peer
>
> Would you believe that this error causes the next attempt to use the
> stream to return ebadf ? Only in mt ?

this is a fine question for a unix guru.
I am not one.

> [../src/stream.d:6195]
> 2009/12/10 23:57:38 ebadf!
>
> However, the stream at that point does not print as a closed stream.

well, as I said long ago, you should ask a unix guru why you would ever
get an ebadf.

--
Sam Steingold (http://sds.podval.org/) on Ubuntu 9.10 (karmic)
http://mideasttruth.com http://camera.org http://pmw.org.il
http://www.memritv.org http://honestreporting.com
(let((a'(list'let(list(list'a(list'quote a)))a)))`(let((a(quote ,a))),a))
From: <don...@is...> - 2009-12-21 02:17:30

Sam Steingold writes:
> > > break OS_Filestream_error
> did you try this?

No, I think the question was answered below.

> > > also, OS_Filestream_error will print something like
> > > *** - UNIX error 9 (EBADF): Bad file number
> > > as you reported originally.

I think what we see below is that OS_Filestream_error does not
actually print this, but generates a lisp error with this as its
text. Which I was catching later on when looking for ebadf's.

> > Ok, I think this helps.
> > Here's what I see in an older transcript:
> >
> > [../src/stream.d:6195]
> > 2009/12/10 23:57:38
> > ignore error from wait
> > UNIX error 104 (ECONNRESET): Connection reset by peer
> >
> > Would you believe that this error causes the next attempt to use the
> > stream to return ebadf ? Only in mt ?
>
> this is a fine question for a unix guru.
> I am not one.

But before asking a unix guru could we find and understand the code
that is generating this error and see how it differs between mt and
non-mt? Given that the error seems to happen in a call to
socket-status I'd be interested in how a call to socket-status reaches
line 6195 and what the difference is between mt and non-mt in that
case.
From: Sam S. <sd...@gn...> - 2009-12-21 04:29:29

> * Don Cohen <qba...@vf...3-vap.pbz> [2009-12-20 18:17:27 -0800]:
>
> > this is a fine question for a unix guru.
> > I am not one.
>
> But before asking a unix guru could we find and understand the code
> that is generating this error and see how it differs between mt and
> non-mt? Given that the error seems to happen in a call to
> socket-status I'd be interested in how a call to socket-status reaches
> line 6195 and what the difference is between mt and non-mt in that
> case.

are you saying that both builds are running the same code, i.e.,
the mt build does not create threads?
this is interesting.
maybe comparing the output of strace would help.

socket-status calls listen which tries to read from the fd.

--
Sam Steingold (http://sds.podval.org/) on Ubuntu 9.10 (karmic)
http://ffii.org http://www.memritv.org http://camera.org
http://truepeace.org http://palestinefacts.org http://memri.org
http://mideasttruth.com
Our business is run on trust. We trust you will pay in advance.
From: <don...@is...> - 2009-12-22 07:48:44

I think it's time to try a different approach to this problem.
I was considering a binary search in cvs checkout times, but first
I thought I'd look at cvs diff output from before the first ebadf
report (12/4). This patch from Wed Nov 25 20:46:44 2009 UTC seems
like a candidate to me:

RCS file: /cvsroot/clisp/clisp/src/stream.d,v
retrieving revision 1.668
retrieving revision 1.669
diff -r1.668 -r1.669
5301a5302,5303
>   UnbufferedStream_status(stream) = 0; /* forget about past EOF & bytebuf */
>   TheStream(stream)->strm_rd_ch_last = NIL; /* forget last char */
5304d5305
<   TheStream(stream)->strm_rd_ch_last = NIL; /* forget about past EOF */
6747a6749,6758
> /* UP: discard already entered input from a Buffered Stream.
>  clear_input_buffered(stream);
>  > stream: Buffered Stream
>  < result: true if Input was deleted, else false */
> local maygc bool clear_input_buffered (object stream) {
>   bool ret = BufferedStream_have_eof_p(stream);
>   BufferedStream_have_eof_p(stream) = false;
>   return ret;
> }
>
16474c16485
<       result = false;
---
>       result = clear_input_buffered(stream);

Do you think it's plausible that it could cause
UNIX error 9 (EBADF): Bad file number
appearing after
UNIX error 104 (ECONNRESET): Connection reset by peer
?
From: Sam S. <sd...@gn...> - 2009-12-22 14:39:33

Don Cohen wrote:
> I think it's time to try a different approach to this problem.
> I was considering a binary search in cvs checkout times, but first
> I thought I'd look at cvs diff output from before the first ebadf
> report (12/4). This patch from Wed Nov 25 20:46:44 2009 UTC seems
> like a candidate to me:
>
> RCS file: /cvsroot/clisp/clisp/src/stream.d,v
> retrieving revision 1.668
> retrieving revision 1.669
> diff -r1.668 -r1.669
> 5301a5302,5303
> >   UnbufferedStream_status(stream) = 0; /* forget about past EOF & bytebuf */
> >   TheStream(stream)->strm_rd_ch_last = NIL; /* forget last char */
> 5304d5305
> <   TheStream(stream)->strm_rd_ch_last = NIL; /* forget about past EOF */
> 6747a6749,6758
> > /* UP: discard already entered input from a Buffered Stream.
> >  clear_input_buffered(stream);
> >  > stream: Buffered Stream
> >  < result: true if Input was deleted, else false */
> > local maygc bool clear_input_buffered (object stream) {
> >   bool ret = BufferedStream_have_eof_p(stream);
> >   BufferedStream_have_eof_p(stream) = false;
> >   return ret;
> > }
> >
> 16474c16485
> <       result = false;
> ---
> >       result = clear_input_buffered(stream);
>
> Do you think it's plausible that it could cause
> UNIX error 9 (EBADF): Bad file number
> appearing after
> UNIX error 104 (ECONNRESET): Connection reset by peer

no.
the change above only changes clisp bookkeeping, not fd's relationships
with the OS.