#378 thru TCP without pseudo-tty => UNIX error 14 bad address

open
Jörg Höhle
clisp (525)
5
2014-08-22
2006-11-08
No

When clisp is invoked through bare ssh, it fails. We
need to pass the -t
option to ssh to force pseudo-tty allocation for clisp.
I see no good
reason why clisp couldn't detect that it isn't
connected to a pty and
avoid using an inapplicable API; after all, it gets this
right with
files and pipes:

# (not localhost)
$ ssh janus-1 clisp --version
stty: standard input: Invalid argument
GNU CLISP 2.39 (2006-07-16) (built 3365477833) (memory
3365480464)
Software: GNU C 3.3 20030226 (prerelease) (SuSE Linux)
gcc -g -O2 -W -Wswitch -Wcomment -Wpointer-arith
-Wimplicit -Wreturn-type -Wmissing-declarations
-Wno-sign-compare -O2 -fexpensive-optimizations
-falign-functions=4 -DUNICODE -DDYNAMIC_FFI -I. -x none
libcharset.a libavcall.a libcallback.a -lreadline
-lncurses -ldl -lsigsegv -L/usr/X11R6/lib
SAFETY=0 HEAPCODES LINUX_NOEXEC_HEAPCODES
GENERATIONAL_GC SPVW_BLOCKS SPVW_MIXED TRIVIALMAP_MEMORY
libsigsegv 2.2
libreadline 4.3
Features:
(READLINE REGEXP SYSCALLS I18N LOOP COMPILER CLOS MOP
CLISP ANSI-CL COMMON-LISP LISP=CL INTERPRETER
SOCKETS GENERIC-STREAMS LOGICAL-PATHNAMES SCREEN FFI
GETTEXT UNICODE BASE-CHAR=CHARACTER PC386
UNIX)
C Modules: (clisp i18n syscalls regexp readline)
Installation directory:
/usr/local/languages/clisp-2.39-pjb1-regexp/lib/clisp/
User language: ENGLISH
Machine: I686 (I686) janus-1.janus.afaa.asso.fr
[195.114.85.145]
[1]>
*** - UNIX error 14 (EFAULT): Bad address

(ext:quit)

[pjb@thalassa httpd]$ ssh -t janus-1 clisp --version
GNU CLISP 2.30 (released 2002-09-15) (built on
bragg.suse.de [127.0.0.2])
Features:
(CLOS LOOP COMPILER CLISP ANSI-CL COMMON-LISP LISP=CL
INTERPRETER SOCKETS
GENERIC-STREAMS LOGICAL-PATHNAMES SCREEN FFI UNICODE
BASE-CHAR=CHARACTER
SYSCALLS PC386 UNIX)
Connection to janus-1 closed.

[pjb@thalassa httpd]$ ls -l | clisp --version 2> /tmp/errs | cat
GNU CLISP 2.39 (2006-07-16) (built 3364813332) (memory
3364813914)
Software: GNU C 3.3 20030226 (prerelease) (SuSE Linux)
gcc -g -O2 -W -Wswitch -Wcomment -Wpointer-arith
-Wimplicit -Wreturn-type -Wmissing-declarations
-Wno-sign-compare -O2 -fexpensive-optimizations
-falign-functions=4 -DUNICODE -DDYNAMIC_FFI -I. -x none
libcharset.a libavcall.a libcallback.a -lreadline
-lncurses -ldl -lsigsegv -L/usr/X11R6/lib
SAFETY=0 HEAPCODES LINUX_NOEXEC_HEAPCODES
GENERATIONAL_GC SPVW_BLOCKS SPVW_MIXED TRIVIALMAP_MEMORY
libsigsegv 2.4
libreadline 4.3
Features:
(READLINE REGEXP SYSCALLS I18N LOOP COMPILER CLOS MOP
CLISP ANSI-CL COMMON-LISP
LISP=CL INTERPRETER SOCKETS GENERIC-STREAMS
LOGICAL-PATHNAMES SCREEN FFI
GETTEXT UNICODE BASE-CHAR=CHARACTER PC386 UNIX)
C Modules: (clisp i18n syscalls regexp readline)
Installation directory:
/usr/local/languages/clisp-2.39-pjb1-regexp/lib/clisp/
User language: ENGLISH
Machine: I686 (I686) thalassa.informatimago.com
[62.93.174.79]

Discussion

  • Jörg Höhle
    2006-11-08

    Logged In: YES
    user_id=377168

    Confirmed.
    Two things are remarkable about the gdb backtrace (items 1
    and 2 below).
    A third observation: stream.d:finish_tty_output() looks
    strange. It does both fsync() and ioctl().
    I'd have expected either one or the other.
    Maybe the #ifdefs are broken? Or is a return; on success
    missing before trying the next method in sequence?

    1. a link to [ Bug #1220548 ] -- See how reset(count=<broken
    value>) is found on the stack.
    #11 0x08065b70 in interpret_bytecode_ (closure=0x20385cfe, codeptr=0x202aa5dc, byteptr_in=0x42 <Address 0x42 out of bounds>) at eval.d:7077
    #12 0x080667e8 in funcall_closure (closure=0xb7b24084, args_on_stack=<value optimized out>) at eval.d:5771
    #13 0x080dab73 in driver () at debug.d:477
    #14 0x0805b3ec in reset (count=3081912564) at eval.d:517
    #15 0x080665b7 in interpret_bytecode_ (closure=0x20387996, codeptr=0x20343fc4, byteptr_in=0x20344004 "\022\002\031\005") at eval.d:7608

    2. the problem itself:
    #25 0x0805d111 in funcall_subr (fun=0x81a0e06, args_on_stack=0) at eval.d:5325
    #26 0x080dd8e6 in signal_and_debug (condition=0xbfbb3e10) at error.d:204
    #27 0x080ddb4b in end_error (stackptr=<value optimized out>, start_driver_p=true) at error.d:317
    #28 0x080dfa7b in OS_error () at errunix.d:688
    #29 0x080851f7 in low_finish_output_unbuffered_handle (stream=0xbfbb3e10) at stream.d:3493
    #30 0x08085c7d in finish_output_unbuffered (stream=0x2038709e) at stream.d:5678
    #31 0x08096954 in fresh_line (stream_=0xb7b24008) at stream.d:16607
    #32 0x080a1b0d in C_fresh_line () at io.d:10610
    #33 0x0805d18c in funcall_subr (fun=0x81a19e6, args_on_stack=1) at eval.d:5330
    #34 0x08052884 in quit () at spvw.d:3423
    #35 0x08068c61 in C_exit () at control.d:15

    stream.d:3493 contains
    local void finish_tty_output (Handle handle)
    ...
    #if defined(UNIX_TERM_TERMIOS) && defined(TCGETS) && defined(TCSETSW)
      {
        var struct termios term_parameters;
        if (!(   (ioctl(handle,TCGETS,&term_parameters) == 0)
              && (ioctl(handle,TCSETSW,&term_parameters) == 0))) {
          if (!((errno==ENOTTY)||(errno==EINVAL)))
            { OS_error(); } # no TTY: OK, report other Error
        }
      }
    Actually, the call that fails is tcdrain() (the gdb line number
    must be out of sync): it is unsupported here and yields (-1,14).

    4. It's curious that this only appears with --version.
    (finish-output *terminal-io*) also raises the error within ssh.
    Obviously, finish-output is almost never called.

    5. Maybe a bug should be reported against ssh: tcdrain()
    yields (-1,14).
    Could you please investigate other situations (inside a socket
    etc.) using this code?
    (use-package "FFI")
    (def-call-out tcdrain (:arguments (fd int)) (:return-type int) (:library :default))
    (locally (declare (compile)) (setf errno 0) (values (tcdrain 1) errno))
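
For situations where no CLISP image is at hand, the same probe can be written directly in C. The following is a minimal sketch; the struct and the function name `probe_tcdrain` are invented here, and the pair it returns mirrors the (return value, errno) notation used in this thread.

```c
#include <errno.h>
#include <termios.h>

/* Result pair in the thread's (return value, errno) notation. */
struct drain_result { int ret; int err; };

/* Call tcdrain() on fd and capture errno right away. */
struct drain_result probe_tcdrain(int fd)
{
    struct drain_result r;
    errno = 0;
    r.ret = tcdrain(fd);
    r.err = (r.ret == 0) ? 0 : errno;
    return r;
}
```

Printing the result for descriptors 0, 1 and 2 under a pipe, a redirect, and ssh with and without -t would show which errno each situation produces.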

     
  • Jörg Höhle
    2006-11-08

    Logged In: YES
    user_id=377168

    thank you for your bug report.
    the bug has been fixed in the CVS tree.
    you can either wait for the next release (recommended)
    or check out the current CVS tree (see http://clisp.cons.org)
    and build CLISP from the sources (be advised that between
    releases the CVS tree is very unstable and may not even build
    on your platform).

     
  • Jörg Höhle
    2006-11-08

    • assigned_to: haible --> hoehle
    • status: open --> closed-fixed
     
  • Sam Steingold
    2006-11-08

    • status: closed-fixed --> open-fixed
     
  • Sam Steingold
    2006-11-08

    Logged In: YES
    user_id=5735

    there are two separate issues here:
    1. under SSH, make_terminal_io() should call
    make_twoway_stream() instead of make_terminal_stream().
    this requires replacing regular_handle_p() calls with
    isatty() there.
    the risk is that in some obscure cases (cygwin xterm &c)
    this would lose us readline.
    2. FINISH-OUTPUT does not list exceptional situations due to
    OS interaction, so it makes sense to ignore all errors in
    finish_tty_output() et al.
    the risk is that we may miss some bugs this way.

     
  • Sam Steingold
    2006-11-08

    • assigned_to: hoehle --> haible
    • status: open-fixed --> open
     
  • Sam Steingold
    2006-11-08

    • assigned_to: haible --> hoehle
    • status: open --> closed-fixed
     
  • Sam Steingold
    2006-11-08

    Logged In: YES
    user_id=5735

    sorry Jörg, my comment crossed yours.
    also, please avoid '# ' comments in new code.

     
  • Jörg Höhle
    2006-11-09

    • status: closed-fixed --> open-fixed
     
  • Jörg Höhle
    2006-11-09

    Logged In: YES
    user_id=377168

    1. Indeed, I was wondering why finish_tty_output would be
    concerned when obviously there's no tty.
    Actually, this may be the reason:
    local void low_finish_output_unbuffered_handle (object stream) {

    finish_tty_output(TheHandle(TheStream(stream)->strm_ochannel));

    2. I've learnt not to deduce anything from missing
    exceptional situations in the CLHS. E.g. WRITE does not
    mention exceptional I/O situations. Is that an argument to
    ignore them?

    My first patch was broken, I've a fix pending -- but that
    does not solve the ssh issue.

    Another solution path is to explore why only --version is
    affected, i.e. why that invokes (finish-output
    *terminal-io*) while a normal interactive session does not.

    Note that even after applying my patch, within ssh:
    [1]> (finish-output *terminal-io*)
    *** - UNIX error 14 (EFAULT): Bad address
    The following restarts are available:
    ABORT :R1 ABORT
    Break 1 [2]> (finish-output *terminal-io*)

    *** - UNIX error 14 (EFAULT): Bad address

    *** - UNIX error 14 (EFAULT): Bad address

    *** - UNIX error 14 (EFAULT): Bad address

    *** - UNIX error 14 (EFAULT): Bad address

    Weird, isn't it?

     
  • Jörg Höhle
    2006-11-09

    Logged In: YES
    user_id=377168

    Some facts:
    ;ssh: fsync: (-1,22) EINVAL no error
    ;ssh: tcdrain: (-1,14) EFAULT error
    ;pipe: tcdrain: (-1,22) EINVAL error ; raises no error in CLISP
    ;socket: tcdrain (-1,14) EFAULT

    However, (finish-output #<socket stream>) does not raise an
    error, because that goes to
    local void low_finish_output_unbuffered_pipe (object stream)
    {} # do nothing
    instead of finish_tty_output.

    I'd say there's no bug in ssh, it just returns what a socket
    yields.

    Open issues:
    1. is it normal that finish_tty_output gets called in that
    situation?
    2a. if it is, consider adding EFAULT to the list of
    acceptable returns
    2b. if it is, should it try all three methods in turn, or
    should it rather implement
    #if HAVE_FSYNC # try only that
    #elif HAVE_TERMIOS # or only that
    #elif HAVE_IOCTL
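
One possible reading of open issue 2b is sketched below: try the methods in sequence, stop at the first success, and treat only "does not apply to this descriptor" errno values as benign. This is not CLISP's code. The names `drain_fd` and `not_applicable`, the order of the two calls, and the exact set of tolerated errno values are all assumptions made for illustration; EFAULT is deliberately left out of that set.

```c
#include <errno.h>
#include <termios.h>
#include <unistd.h>

/* Errno values taken to mean "this method does not apply here".
   The exact set is an assumption; EFAULT is deliberately not in it. */
static int not_applicable(int err)
{
    return err == ENOTTY || err == EINVAL || err == EOPNOTSUPP;
}

/* Hypothetical helper: wait until pending output on fd has been delivered.
   Returns 0 on success or when no method applies, -1 (errno set) otherwise. */
int drain_fd(int fd)
{
    if (tcdrain(fd) == 0)
        return 0;               /* terminal drained, stop here */
    if (!not_applicable(errno))
        return -1;              /* real error, e.g. EBADF or EFAULT */
    if (fsync(fd) == 0)
        return 0;               /* regular file synced */
    if (not_applicable(errno))
        return 0;               /* pipe or socket: nothing to wait for */
    return -1;
}
```

With this shape, a descriptor that yields EFAULT from tcdrain() still surfaces as an error, which keeps the question from issue 2a open rather than silently answering it.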

     
  • Sam Steingold
    2006-11-09

    Logged In: YES
    user_id=5735

    if you follow my suggestion in "1", are the i/o streams
    created unbuffered?
    if yes, this is a valid avenue. you will need to send the
    patch to our cygwin maintainer Reini Urban and ask him to
    check whether this change makes readline unavailable on cygwin
    console, xterm, rxvt &c.
    actually, it would be nice to check that for all "exotic"
    consoles, but we have access to only these: linux xterm,
    console, rxvt; woe32 console; cygwin console, xterm, rxvt;
    and cygwin is the only one affected.
    (there is also an issue of clisp under gdb on rxvt &c :-)

     
  • Sam Steingold
    2006-11-09

    • status: open-fixed --> open
     
  • Sam Steingold
    2006-11-09

    Logged In: YES
    user_id=5735

    another option is to add a note to FAQ that clisp under ssh
    requires "-t" and forget the whole issue.

     
  • Sam Steingold
    2006-11-09

    Logged In: YES
    user_id=5735

    ... especially since this way readline is indeed available
    under ssh which is a good thing.
    but this does not solve the "clisp --version" issue because
    it is kind of hard to sell that --version requires a tty.
    here the "interactive" vs "non-interactive" distinction
    should help.
    alas, init_streamvars batch_p argument does not prevent the
    creation of a terminal tty stream because a batch (shell
    script) job may still want to interact with the user.
    maybe we should make batch_p take 3 values: 0: normal, 1:
    shell script (like now); 2: non-interactive (no tty), for
    --version.

     
  • Sam Steingold
    2006-11-09

    Logged In: YES
    user_id=5735

    I started to think along the lines of this patch:
    --- stream.d 09 Nov 2006 06:01:30 -0500 1.570
    +++ stream.d 09 Nov 2006 10:13:17 -0500
    @@ -14911,22 +14911,32 @@
       return O(standard_error_file_stream);
     }

    -# UP: Returns the default value for *terminal-io*.
    -# can trigger GC
    -local maygc object make_terminal_io (void) {
    -  # If stdin or stdout is a file, use a buffered stream instead of an
    -  # unbuffered terminal stream. For the ud2cd program used as filter,
    -  # this reduces the runtime on Solaris from 165 sec to 47 sec.
    -  var bool stdin_file = regular_handle_p(stdin_handle);
    -  var bool stdout_file = regular_handle_p(stdout_handle);
    +/* UP: Returns the default value for *terminal-io*.
    + > batch_p : is this an interactive session?
    + can trigger GC */
    +local maygc object make_terminal_io (bool batch_p) {
    +  /* If stdin or stdout is a file, use a buffered stream instead of an
    +     unbuffered terminal stream. For the ud2cd program used as filter,
    +     this reduces the runtime on Solaris from 165 sec to 47 sec. */
    +  var bool stdin_file;
    +  var bool stdout_file;
    +  if (batch_p) {
    +    begin_system_call();
    +    stdin_file = !isatty(stdin_handle);
    +    stdout_file = !isatty(stdout_handle);
    +    end_system_call();
    +  } else {
    +    stdin_file = regular_handle_p(stdin_handle);
    +    stdout_file = regular_handle_p(stdout_handle);
    +  }
       if (stdin_file || stdout_file) {
         /* Input side: */
    -    var object istream =
    -      (stdin_file ? get_standard_input_file_stream() : make_terminal_stream());
    +    var object istream = (stdin_file ? get_standard_input_file_stream()
    +                          : make_terminal_stream());
         pushSTACK(istream);
         /* Output side: */
    -    var object ostream =
    -      (stdout_file ? get_standard_output_file_stream() : make_terminal_stream());
    +    var object ostream = (stdout_file ? get_standard_output_file_stream()
    +                          : make_terminal_stream());
         /* Build a two-way-stream: */
         return make_twoway_stream(popSTACK(),ostream);
       }
    @@ -15034,7 +15044,7 @@
       end_call();
     #endif
       {
    -    var object stream = make_terminal_io();
    +    var object stream = make_terminal_io(batch_p);
         define_variable(S(terminal_io),stream); # *TERMINAL-IO*
       }
       {

    but it does not help - I get a stream of

    [stream.d:3479]
    [stream.d:3479]
    *** - UNIX error 14 (EFAULT): Bad address

    [stream.d:3479]
    [stream.d:3479]
    *** - UNIX error 14 (EFAULT): Bad address

    [stream.d:3479]
    [stream.d:3479]
    *** - UNIX error 14 (EFAULT): Bad address

    [stream.d:3479]
    [stream.d:3479]
    *** - UNIX error 14 (EFAULT): Bad address

    [stream.d:3479]
    [stream.d:3479]
    *** - UNIX error 14 (EFAULT): Bad address

    [stream.d:3479]
    [stream.d:3479]
    *** - UNIX error 14 (EFAULT): Bad address

    instead of a single error.
    looks like you are right -- we need to fix tty flushing.

     
  • Sam Steingold
    2006-11-09

    Logged In: YES
    user_id=5735

    Bruno writes:
    > it makes sense to ignore all errors in finish_tty_output() et al.
    > the risk is that we may miss some bugs this way.

    The risk is high: EFAULT in particular means that you/we/the
    tcdrain function has passed to the kernel an invalid memory
    address. EFAULT is a friendly warning. On other kernels / in
    other situations, it could overwrite arbitrary regions of
    memory and make the program crash later on.
    => NEVER ignore EFAULT, unless you have investigated it in
    depth and are 200% convinced that it's a kernel bug.
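
What EFAULT normally signals can be shown in isolation with a deliberately bad buffer address. The sketch below is an illustration only and has nothing to do with tcdrain() itself: the function name is made up, and it assumes that address 1 is not mapped in the process, which holds on typical Linux setups.

```c
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/* Ask the kernel to read one byte from an address that is not mapped.
   Returns the errno of the failed write(), or 0 if it unexpectedly succeeded. */
int errno_for_bad_buffer(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return errno;
    /* Address 1 is assumed to be unmapped. */
    volatile uintptr_t bogus = 1;
    errno = 0;
    ssize_t n = write(fds[1], (const void *)bogus, 1);
    int err = (n < 0) ? errno : 0;   /* save errno before close() can change it */
    close(fds[0]);
    close(fds[1]);
    return err;
}
```

Here the kernel only reads from the bad address, so the failure is harmless; the warning above concerns calls where the kernel would write through such a pointer.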

     
  • Jörg Höhle
    2006-11-10

    Logged In: YES
    user_id=377168

    I don't like Sam's suggestion of adding even more cases to
    the terminal IO detection. It makes for less reliable software.
    Why? Because programmers try out their software in only a few
    situations, using "clisp" or "ssh localhost clisp" or "clisp
    | cat". The more distinctions there are, the more likely
    somebody other than the programmer will run into a case where
    something fails.
    The more understandable the distinctions, the better. We
    should be able to explain the behaviour to the user, and
    s/he should understand it.

    I observed the following surprising situation: clisp | tee foo
    *error-output* == *standard-output* = #<terminal-io>
    (error output goes to the pipe)
    which may be fine in itself, but is not what a UNIX guy expects.

    About stream.d:fresh_line
    I'd like to ask whether FRESH-LINE needs to call
    FINISH-OUTPUT instead of FORCE-OUTPUT. The reasoning is that
    "fresh-line is similar to TERPRI" (21.2). It's output only.
    For output, all that counts to me is that sometimes I want
    to empty internal buffers. But there's no reason to wait
    until output is reported complete before continuing.
    Waiting for completion is a potentially expensive operation.
    It runs counter to the idea of streaming.
    (This is not off topic, because fresh-line is what causes
    the many EFAULT errors via finish-output.)
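
The distinction between the two operations can be illustrated with C stdio. This is only an analogy, not how CLISP implements them: the function names are borrowed from the Lisp operators, and treating EINVAL from fsync() as "nothing to wait for" is an assumption of this sketch.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* FORCE-OUTPUT analogue: hand buffered data to the OS and return at once. */
int force_output(FILE *out)
{
    return fflush(out) == 0 ? 0 : -1;
}

/* FINISH-OUTPUT analogue: flush, then wait for the OS to report completion.
   EINVAL from fsync() is taken to mean "cannot be synced" (pipe, socket, tty). */
int finish_output(FILE *out)
{
    if (fflush(out) != 0)
        return -1;
    if (fsync(fileno(out)) == 0 || errno == EINVAL)
        return 0;
    return -1;
}
```

Only `finish_output` makes a second system call that can block or fail on the descriptor, which is the cost and the error source discussed above.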

    I agree with Bruno that fsync() and tcdrain() must not
    depend on each other (unless one succeeds, of course, which
    the current patch implements).
    One could argue that if tcdrain() returns ENOTTY, there's no
    need to call ioctl() to try the same in other words. But
    that's a minor thing now.

    I'd like to know whether finish_tty_output() is specified
    to handle all kinds of streams, or
    whether it should be restricted to (pseudo) ttys only. In
    the latter case, that would mean there is a bug in CLISP which
    causes this function to be used even on sockets and pipes.

    For now, it looks like I have to report a kernel (or glibc?)
    bug about tcdrain() returning EFAULT on a socket (and also
    ssh, probably for the same reason).
    Yet we could work around the bug in some way (e.g. via
    fresh-line -> force-output instead of finish-output, even
    though direct calls to finish-output would still fail).

    If nobody complains, I'll have FRESH-LINE use FORCE-OUTPUT.
    That will make the --version symptom disappear (and
    possibly provide some speed-up).

     
  • Jörg Höhle
    2006-11-17

    Logged In: YES
    user_id=377168
    Originator: NO

    For the record, when I tried changing force-output to finish-output in FRESH-LINE, clisp randomly(!) produced one of two outputs for the following command:
    ./lisp1.run -M lispinit.mem -x '(progn (format *standard-output* "~&line1~.") (format *error-output* "~&line2~.") *error-output*)' | cat

    Either:
    line2 ;<<<----!!!
    i i i i i i i ooooo o ooooooo ooooo ooooo
    [...banner...]
    Copyright (c) Sam Steingold, Bruno Haible 2001-2006

    line1
    #<OUTPUT UNBUFFERED FILE-STREAM CHARACTER #P"/dev/fd/2">
    or:
    i i i i i i i ooooo o ooooooo ooooo ooooo
    [...banner...]
    Copyright (c) Sam Steingold, Bruno Haible 2001-2006

    line1
    line2#<OUTPUT UNBUFFERED FILE-STREAM CHARACTER #P"/dev/fd/2">

    These results were incredible at first. They tell a lot about scheduling in Linux.

    One might raise the question about whether it's CLISP's job to automatically FINISH-OUTPUT in such a case when the programmer didn't bother. I.e. is the first form of output really unacceptable?

    $ date --foo | date --version
    produces very similar symptoms on my Linux box: randomly, the complaint about foo appears before or after the version text.

    In summary, the change is not in CVS and I'm not really satisfied with the current handling of stdin/stderr or *terminal-io* in clisp.

    In any case, I think I should now report a kernel bug and ask for harmonization of errno codes or the origin of EFAULT.

    My findings are:
    $ ~/Bugs/tcdrain
    tcdrain(0)=(0,0)
    tcdrain(1)=(0,0)
    tcdrain(2)=(0,0)
    $ ~/Bugs/tcdrain | cat
    tcdrain(0)=(0,0)
    tcdrain(1)=(-1,22) # EINVAL = Invalid argument
    tcdrain(2)=(0,0)
    $ ~/Bugs/tcdrain 2>/dev/null
    tcdrain(0)=(0,0)
    tcdrain(1)=(0,0)
    tcdrain(2)=(-1,25) # ENOTTY = Inappropriate ioctl for device
    $ echo foo | ~/Bugs/tcdrain
    tcdrain(0)=(-1,22)
    tcdrain(1)=(0,0)
    tcdrain(2)=(0,0)
    # Now I'd expect an error similar to the previous ones,
    # or possibly EOPNOTSUPP, but not EFAULT:
    $ ssh localhost ~/Bugs/tcdrain
    tcdrain(0)=(-1,14) # EFAULT = Bad address
    tcdrain(1)=(-1,14) # EFAULT
    tcdrain(2)=(-1,14) # EFAULT
    $ ssh -t localhost ~/Bugs/tcdrain
    tcdrain(0)=(0,0)
    tcdrain(1)=(0,0)
    tcdrain(2)=(0,0)
    Connection to localhost closed.

    Does anybody know how to redirect I/O to/from a socket on the command-line, without intervening ssh or installing inetd?
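
One way to sidestep the command-line question is to create the socket inside the test program. The sketch below uses `socketpair()` and calls tcdrain() on one end; the function name is invented, and it assumes that a local AF_UNIX socket is a fair stand-in for whatever descriptor a bare ssh session provides, which this thread has not established.

```c
#include <errno.h>
#include <sys/socket.h>
#include <termios.h>
#include <unistd.h>

/* Create a local socket pair, call tcdrain() on one end, and report the
   errno it fails with (0 if it unexpectedly succeeds, -1 if setup fails). */
int tcdrain_errno_on_socket(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    errno = 0;
    int err = (tcdrain(sv[0]) == 0) ? 0 : errno;   /* saved before close() */
    close(sv[0]);
    close(sv[1]);
    return err;
}
```

The errno returned may differ between kernels and socket types, so the value is best recorded alongside the kernel version when comparing systems.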

     
  • Sam Steingold
    2006-11-17

    Logged In: YES
    user_id=5735
    Originator: NO

    >I think I should now report a kernel bug

    please do.
    this way we will learn from the true source of knowledge about the subject.

    BTW, SF CF also gives you solaris, *bsd, and macos, did you try testing there?