#160 errors-test hangs

Milestone: autogen
Status: closed
Owner: Bruce Korb
Labels: None
Priority: 1
Updated: 2014-06-28
Created: 2014-04-27
Creator: Andreas Metzler
Private: No

Hello,

In about 2 out of 3 tries, autogen gets stuck at errors.test when running the test suite. lt-autogen needs a kill -9 to terminate. This is with 5.18.3pre19. Attached is the output of both a successful and a killed run of this test.

cu Andreas

2 Attachments

Discussion

  • Bruce Korb
    2014-05-03

    Those files do not contain enough information to diagnose the issue.
    I run these over and over and over countless times and do not see
    any issue, so I need additional information.

    tar -cJf /tmp/ag-test-failure.txz /tmp/AUTOGEN/autogen-5.18.3/autoopts/test
    
     
  • Bruce Korb
    2014-05-03

    • assigned_to: Bruce Korb
     
  • Hello,

    Find attached the requested tarball. I have added two versions: one generated while errors.test was still running, and a second one after kill -9 `pgrep lt-autogen`.

    These are from pre34.

    hth, cu Andreas

     
  • Bruce Korb
    2014-05-10

    Thank you so much! It is still not quite enough, though. Indeed, autogen is hung on something in the process of shooting itself down. This text appears in both errors.log files:

    creating errors-sh.samp
    Killing AutoGen 2554
    FAILURE REASON:  duplicate option value characters: X
    

    after you kill -9 it:

    Killed
    ./errors.test: 302: kill: No such process
    errors done
    

    and it proceeds to the next test. Running my version of autogen, it shoots itself down just fine. Thank you so much for your patience, 'cuz here's the next thing to try while hung:

    gdb /path/to/autogen $(ps --no-headers -o pid -C autogen)
    

    and then get a stack trace (^C should get GDB's attention). Maybe poke around a bit. Just FYI, running autogen on errors-2.def gets me:

    $ autogen errors-2.def
    Killing AutoGen 7045
    FAILURE REASON:  duplicate option value characters: X
    AutoGen aborting on signal 15 (Terminated) in state EMITTING
    processing template /u/ROOT/usr/local/share/autogen/optlib.tlib
                on line 29
           for function EXPR (14)
    Aborted
    

    with no need of "kill -9" on it.

    A little more background, just in case it helps. The code that detects multiple identical option value characters is a shell script:

        list=`echo '%s' | sort`
        ulst=`echo \"${list}\" | sort -u`
        test `echo \"${ulst}\" | wc -l` -ne %d && {
          echo \"${list}\" > ${tmp_dir}/sort
          echo \"${ulst}\" > ${tmp_dir}/uniq
          df=`diff ${tmp_dir}/sort ${tmp_dir}/uniq | sed -n 's/< *//p'`
          die 'duplicate option value characters:' ${df}
        }
    

    where "%s" is replaced by the list of value characters, one per line. The "die" function sends the autogen process several signals:

    die() {
      echo "Killing AutoGen ${AG_pid}"
      echo "FAILURE REASON:  $*"
      kill -15 ${AG_pid}
      kill -1  ${AG_pid}
      kill -2  ${AG_pid}
      exit 1
    }
    

    viz. SIGTERM, SIGHUP and SIGINT. Those signals should all get caught and handled with catch_sig_and_bail(). I guess I can change die() to wait one second and then send "kill -9", but it's never been a problem before. (Look for SHELL_INIT_STR in the ag-text.def file.)

    Now that I think about it, my guess is autogen is hanging on a read from a dead shell process. That might take a while.
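
For illustration, the duplicate check and the "wait one second and then send kill -9" idea described above can be sketched as a standalone POSIX shell script. This is only a sketch under assumptions: find_dup_chars and die_escalating are hypothetical names, the check uses "uniq -d" in place of the sort / sort -u / diff sequence in the template, and the function returns instead of calling exit so it can be tried interactively. The ticket does not say what the actual fix was, so this should not be read as the change that went into AutoGen.

```shell
#!/bin/sh
# Sketch only: find_dup_chars and die_escalating are hypothetical names,
# not functions taken from the AutoGen sources.

# Print every value character that appears more than once.
# $1 holds the characters, one per line. "uniq -d" stands in for the
# sort / sort -u / diff sequence used in the real template.
find_dup_chars() {
  printf '%s\n' "$1" | sort | uniq -d
}

# Variant of die(): send SIGTERM, SIGHUP and SIGINT as before, wait up
# to $3 seconds (default 1), then send SIGKILL if the process remains.
# $1 is the target pid, $2 the failure reason.
# Returns 1 instead of exiting (an assumption made for this sketch).
die_escalating() {
  pid=$1
  reason=$2
  grace=${3:-1}
  echo "Killing AutoGen ${pid}"
  echo "FAILURE REASON:  ${reason}"
  kill -15 "${pid}" 2>/dev/null
  kill -1  "${pid}" 2>/dev/null
  kill -2  "${pid}" 2>/dev/null
  # "kill -0" delivers no signal; it only checks that the pid exists.
  while [ "${grace}" -gt 0 ] && kill -0 "${pid}" 2>/dev/null; do
    sleep 1
    grace=$((grace - 1))
  done
  if kill -0 "${pid}" 2>/dev/null; then
    kill -9 "${pid}" 2>/dev/null
  fi
  return 1
}

# Example: the character X is listed twice, so this prints X.
chars=$(printf 'a\nX\nb\nX')
find_dup_chars "${chars}"
```

In the real template the check would call something like die_escalating "${AG_pid}" "duplicate option value characters: ${dups}" and then exit 1. The SIGKILL fallback only guarantees that the test suite moves on; it does not explain why the process ignored the first three signals.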

     
  • Bruce Korb
    2014-06-28

    • status: open --> closed
     
  • Bruce Korb
    2014-06-28

    Fixed.