Menu

#720 Flaky test on MacOS: socket-server with run-program

lisp error
closed-fixed
None
5
2019-06-28
2017-11-30
No

All but one test from socket.tst pass. Alas, failure causes configure -cbx to skip remaining tests from the Makefile:
# Perform self-tests. check : check-recompile check-fresh-line check-script check-tests check-sacla-tests bench

The flaky tests involves socket-server and run-program. It randomly produces one of 4 outcomes:
CORRECT: (:OUTPUT "foo" :OUTPUT T NIL T NIL END-OF-FILE)
1. OK, test is successfull, or
2. CLISP : ERROR OS-STREAM-ERROR(54): Connection reset by peer OUT: "[OS-STREAM-ERROR]: OS-STREAM-ERROR(54): Connection reset by peer "
3. CLISP : ERROR OS-ERROR(32): Broken pipe OUT: "[OS-ERROR]: OS-ERROR(32): Broken pipe "
4. `CLISP : ERROR
OS-ERROR(32): Broken pipe
OUT:
"READ-CHAR
[SIMPLE-END-OF-FILE]: READ: input stream #1=#<IO INPUT-BUFFERED="" SOCKET-STREAM="" CHARACTER="" 0.0.0.0:49996=""> has reached its end

[OS-ERROR]: OS-ERROR(32): Broken pipe
"`

If you look at the source, you see that there have been a few attempts at MacOSX specific fixes already. Not enough, or not the right ones, apparently.

  (let ((se (socket:socket-server)))
    (ext:run-program run :arguments (append args (list "-q" "-q" "-x" (format nil "(close (socket:socket-connect ~D))" (socket:socket-serve\
r-port se))))
                     :wait nil :input nil :output nil)
    (unwind-protect
         (with-open-stream (so (socket:socket-accept se))
           (list
            (socket:socket-status so)
            (write-line "foo" so)
            (socket:socket-status so)
            #+macos (handler-case (read-char so)
                      (end-of-file (c)
                        (princ 'read-char) (princ-error c) t))
            #-macos (check-os-error (read-char so) (:ECONNRESET 104))
        (null (member (socket:socket-status so) '(:EOF :APPEND)))
            #+macos (string= (write-line "bar" so) "bar")
            #-macos (check-os-error (write-line "bar" so) (:EPIPE 32))
            (null (member (socket:socket-status so) '(:EOF :APPEND)))
            (handler-case (read-char so)
              (end-of-file (c)
                (princ 'read-char) (princ-error c) 'end-of-file))))
      (socket:socket-server-close se))))

I believe this is the test that causes one of two requests "Allow incoming connections?" to popup during execution. I don't understand how that works in OSX, because the program is not halted while the request is up. Perhaps OSX saves the answer for the next execution? Sometimes these popups disappear after a fraction of a second, sometimes they stay for much longer.

Discussion

  • Jörg Höhle

    Jörg Höhle - 2017-11-30

    For instance, with two interactive shells, (close(socket-accept #)) causes the first shell to see :APPEND from a following socket-status (then EPIPE 32 from write-line). IOW, among the expected result list, even the first :OUTPUT is a bogus expectation. There's too much concurrency.
    I don't know wheter (close(prog1 (socket-connect d)(sleep 1)) would help, that's still heuristic.

     
  • Jörg Höhle

    Jörg Höhle - 2017-12-07

    The best I've come up with so far is

    (multiple-value-bind (run args) (cmd-args)
      (let ((se (socket:socket-server)))
        ;; Sleep between socket-connect and close prevents (:APPEND "foo" :APPEND T ...) most of the time (on MacOS)
        (ext:run-program run :arguments (append args (list "-q" "-q" "-x" (format nil "(close (progn (socket:socket-connect ~D) (sleep .01s0)))" (socket:socket-server-port se))))
                         :wait nil :input nil :output nil)
        (unwind-protect
             (with-open-stream (so (socket:socket-accept se))
               (list
                (check-os-error (socket:socket-status so) ())
                (write-line "foo" so)
                ;; Wrap to catch transient "Connection reset by peer" (on MacOS)
                (check-os-error (socket:socket-status so) ())
                #+macos (handler-case (read-char so)
                          (end-of-file (c)
                            (princ 'read-char) (princ-error c) t))
                #-macos (check-os-error (read-char so) (:ECONNRESET 104))
                (null (member (socket:socket-status so) '(#-macos :EOF :APPEND)))
                ;; Sleep before write #2 "bar" most often causes it to succeed, rather than OS-ERROR(32): Broken pipe
                #+macos (check-os-error (progn ;; (sleep .1s0) ;; map both paths to T
                                               (string= (write-line "bar" so) "bar")) (:EPIPE 32))
                #-macos (check-os-error (write-line "bar" so) (:EPIPE 32))
                (null (member (socket:socket-status so) '(#-macos :EOF :APPEND)))
                (handler-case (read-char so)
                  (end-of-file (c)
                    (princ 'read-char) (princ-error c) 'end-of-file))))
          (socket:socket-server-close se))))
    (:OUTPUT "foo" :OUTPUT T NIL T NIL END-OF-FILE)
    

    With it, twenties of iterations are usually possible (even hundreds with a compiled testcase). Originally, approx. every second one would fail. Occasional remaining failures are:

    ;; (:APPEND 32 :APPEND T :APPEND T :APPEND END-OF-FILE) sometimes                 
    ;; (:APPEND "foo" :APPEND T :APPEND T :APPEND END-OF-FILE)
    ;; (:OUTPUT "foo" :APPEND T :APPEND T :APPEND END-OF-FILE) very rare
    ;; (:OUTPUT "foo" 54 T :APPEND T :APPEND END-OF-FILE) Connection reset by peer
    

    Sleep between socket-connect and close prevents (:APPEND "foo" :APPEND T ...) most of the time.
    Sleep before write #2 "bar" most often causes it to succeed, rather than OS-ERROR(32): Broken pipe
    However, the busier the machine, the more likely write #2 "bar" leads to OS-ERROR(32).

     
  • Sam Steingold

    Sam Steingold - 2018-01-23

    I think we need to fix the core, not the test.

     
  • Sam Steingold

    Sam Steingold - 2018-01-23

    also, in your patch, you want to replace the first progn with prog1

     
  • Sam Steingold

    Sam Steingold - 2019-06-28
    • status: open --> closed-fixed
    • assigned_to: Sam Steingold
     
  • Sam Steingold

    Sam Steingold - 2019-06-28

    fixed in git

     

Log in to post a comment.