CLISP - an ANSI Common Lisp / Bugs / #720 Flaky test on MacOS: socket-server with run-program

#720 Flaky test on MacOS: socket-server with run-program

Milestone: lisp error

Status: closed-fixed

Owner: Sam Steingold

Labels: None

Priority: 5

Updated: 2019-06-28

Created: 2017-11-30

Creator: Jörg Höhle

Private: No

All but one test from socket.tst pass. Alas, failure causes configure -cbx to skip remaining tests from the Makefile:
# Perform self-tests. check : check-recompile check-fresh-line check-script check-tests check-sacla-tests bench

The flaky tests involves socket-server and run-program. It randomly produces one of 4 outcomes:
CORRECT: (:OUTPUT "foo" :OUTPUT T NIL T NIL END-OF-FILE)
1. OK, test is successfull, or
2. CLISP : ERROR OS-STREAM-ERROR(54): Connection reset by peer OUT: "[OS-STREAM-ERROR]: OS-STREAM-ERROR(54): Connection reset by peer "
3. CLISP : ERROR OS-ERROR(32): Broken pipe OUT: "[OS-ERROR]: OS-ERROR(32): Broken pipe "
4. `CLISP : ERROR
OS-ERROR(32): Broken pipe
OUT:
"READ-CHAR
[SIMPLE-END-OF-FILE]: READ: input stream #1=#<IO INPUT-BUFFERED="" SOCKET-STREAM="" CHARACTER="" 0.0.0.0:49996=""> has reached its end

[OS-ERROR]: OS-ERROR(32): Broken pipe
"`

If you look at the source, you see that there have been a few attempts at MacOSX specific fixes already. Not enough, or not the right ones, apparently.

  (let ((se (socket:socket-server)))
    (ext:run-program run :arguments (append args (list "-q" "-q" "-x" (format nil "(close (socket:socket-connect ~D))" (socket:socket-serve\
r-port se))))
                     :wait nil :input nil :output nil)
    (unwind-protect
         (with-open-stream (so (socket:socket-accept se))
           (list
            (socket:socket-status so)
            (write-line "foo" so)
            (socket:socket-status so)
            #+macos (handler-case (read-char so)
                      (end-of-file (c)
                        (princ 'read-char) (princ-error c) t))
            #-macos (check-os-error (read-char so) (:ECONNRESET 104))
        (null (member (socket:socket-status so) '(:EOF :APPEND)))
            #+macos (string= (write-line "bar" so) "bar")
            #-macos (check-os-error (write-line "bar" so) (:EPIPE 32))
            (null (member (socket:socket-status so) '(:EOF :APPEND)))
            (handler-case (read-char so)
              (end-of-file (c)
                (princ 'read-char) (princ-error c) 'end-of-file))))
      (socket:socket-server-close se))))

I believe this is the test that causes one of two requests "Allow incoming connections?" to popup during execution. I don't understand how that works in OSX, because the program is not halted while the request is up. Perhaps OSX saves the answer for the next execution? Sometimes these popups disappear after a fraction of a second, sometimes they stay for much longer.

Discussion

Jörg Höhle - 2017-11-30

For instance, with two interactive shells, (close(socket-accept #)) causes the first shell to see :APPEND from a following socket-status (then EPIPE 32 from write-line). IOW, among the expected result list, even the first :OUTPUT is a bogus expectation. There's too much concurrency.
I don't know wheter (close(prog1 (socket-connect d)(sleep 1)) would help, that's still heuristic.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

The best I've come up with so far is

(multiple-value-bind (run args) (cmd-args)
  (let ((se (socket:socket-server)))
    ;; Sleep between socket-connect and close prevents (:APPEND "foo" :APPEND T ...) most of the time (on MacOS)
    (ext:run-program run :arguments (append args (list "-q" "-q" "-x" (format nil "(close (progn (socket:socket-connect ~D) (sleep .01s0)))" (socket:socket-server-port se))))
                     :wait nil :input nil :output nil)
    (unwind-protect
         (with-open-stream (so (socket:socket-accept se))
           (list
            (check-os-error (socket:socket-status so) ())
            (write-line "foo" so)
            ;; Wrap to catch transient "Connection reset by peer" (on MacOS)
            (check-os-error (socket:socket-status so) ())
            #+macos (handler-case (read-char so)
                      (end-of-file (c)
                        (princ 'read-char) (princ-error c) t))
            #-macos (check-os-error (read-char so) (:ECONNRESET 104))
            (null (member (socket:socket-status so) '(#-macos :EOF :APPEND)))
            ;; Sleep before write #2 "bar" most often causes it to succeed, rather than OS-ERROR(32): Broken pipe
            #+macos (check-os-error (progn ;; (sleep .1s0) ;; map both paths to T
                                           (string= (write-line "bar" so) "bar")) (:EPIPE 32))
            #-macos (check-os-error (write-line "bar" so) (:EPIPE 32))
            (null (member (socket:socket-status so) '(#-macos :EOF :APPEND)))
            (handler-case (read-char so)
              (end-of-file (c)
                (princ 'read-char) (princ-error c) 'end-of-file))))
      (socket:socket-server-close se))))
(:OUTPUT "foo" :OUTPUT T NIL T NIL END-OF-FILE)

With it, twenties of iterations are usually possible (even hundreds with a compiled testcase). Originally, approx. every second one would fail. Occasional remaining failures are:

;; (:APPEND 32 :APPEND T :APPEND T :APPEND END-OF-FILE) sometimes                 
;; (:APPEND "foo" :APPEND T :APPEND T :APPEND END-OF-FILE)
;; (:OUTPUT "foo" :APPEND T :APPEND T :APPEND END-OF-FILE) very rare
;; (:OUTPUT "foo" 54 T :APPEND T :APPEND END-OF-FILE) Connection reset by peer

Sleep between socket-connect and close prevents (:APPEND "foo" :APPEND T ...) most of the time.
Sleep before write #2 "bar" most often causes it to succeed, rather than OS-ERROR(32): Broken pipe
However, the busier the machine, the more likely write #2 "bar" leads to OS-ERROR(32).

bug-socket-720.patch

Sam Steingold - 2018-01-23

I think we need to fix the core, not the test.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Sam Steingold - 2018-01-23

also, in your patch, you want to replace the first progn with prog1

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Sam Steingold - 2019-06-28

status: open --> closed-fixed

assigned_to: Sam Steingold
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Sam Steingold - 2019-06-28

fixed in git

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Flaky test on MacOS: socket-server with run-program

Group

Searches

Help

#720 Flaky test on MacOS: socket-server with run-program

Discussion