
#1209 cobc enters infinite loop when compiling source files containing Ctrl+Z (ASCII 26) EOF marker on Windows

GC 3.x
accepted
5 - default
2026-03-23
2026-03-23
No

Description:
COBOL source files that contain a Ctrl+Z character (ASCII 26, SUB) as an EOF marker cause cobc to enter an infinite loop during compilation on Windows.

These files originate from the book:
"SAMS - Teach Yourself COBOL in 24 Hours"
which includes example source code on a CD-ROM.

The files use Ctrl+Z as an EOF marker, which was common in DOS and CP/M environments.

Observed behavior:
When compiling such a file, cobc does not terminate and appears to hang indefinitely.
The process must be manually interrupted (e.g., via Ctrl+C).

Tested environments:
1) MSYS2 with self-compiled GnuCOBOL 3.3-dev
2) SuperBOL AIO package (June 2024)
3) Arnold Trembley GnuCOBOL 3.2 build (August 2023)

Expected behavior:
cobc should correctly handle Ctrl+Z EOF markers (possibly issuing a warning) and terminate compilation normally without entering an infinite loop.

Additional information:

  • Sample source files attached
  • Screenshots attached
6 Attachments

Discussion

  • Simon Sobisch

    Simon Sobisch - 2026-03-23
    • labels: --> cobc, win32
    • status: open --> accepted
    • assigned_to: Simon Sobisch
    • Group: GC 3.2 --> GC 3.x
     
  • Simon Sobisch

    Simon Sobisch - 2026-03-23

    Hm, recent cobc on GNU/Linux says:

    cobc HELLO.COB
    HELLO.COB:1: warning: ignoring unknown directive: '@OPTIONS MAIN' [-Wothers]
        1 > 000020 @OPTIONS MAIN
        2 | 000021 Identification Division.
        3 | 000030 Program-Id.  Hello.
    HELLO.COB:12: warning: line not terminated by a newline [-Wmissing-newline]
       10 | 000073 Hello-Start.
       11 | 000083     Display "Hello World".
    <EOF>> 000084     Stop Run.
    HELLO.COB: in paragraph 'Hello-Start':
    HELLO.COB:12: error: invalid symbol '0xd' - skipping word
       10 | 000073 Hello-Start.
       11 | 000083     Display "Hello World".
    <EOF>> 000084     Stop Run.
    

    COBOL compilers on godbolt seem to just ignore the data as "in columns 1-6 -> ignored" - the result is identical if the file is first converted to Unix LF on GNU/Linux; if it is then converted back, there's no issue on Windows compilers either...

    But compiling directly from the file as-is with native Windows builds runs into that error - both with old and new versions of cobc.

    Checking further: this applies only to the final processing -> cobc -E -o hello.i hello.cob does not lead to that error but a following cobc hello.i does.

    ... and because of the "special" debug handling (you can't interrupt a MinGW / DWARF-generated binary and see something reasonable [not with GDB; LLDB seems to only support minimal DWARF...]), I can't just go "up" to see where the issue is.

    I think that:

    • the preparser should remove that already (it could also be there multiple times in case of copybooks)
    • the parser (lexer) should handle that - for now I'd just ignore it there (which may break some UTF-8 source files)

    I'll have a further look.

     
    • Michael Del Solio

      Thank you very much.

      Additional finding with Hex Editor-Plugin (VSCode) after converting/reconverting:

      The issue seems to occur specifically when Ctrl+Z (0x1A) follows a CR (0x0D) without a trailing LF (0x0A).

      Working file ending:
      CR LF SUB

      Failing file ending:
      CR SUB
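The two endings can be told apart programmatically. The following is a minimal Python sketch, not part of cobc or of the attached files: `classify_ending` and `make_variant` are hypothetical helper names, and the two-line source passed to `make_variant` is a placeholder rather than the real HELLO.COB.

```python
SUB = 0x1A          # Ctrl+Z, the DOS / CP/M end-of-file marker
CR, LF = 0x0D, 0x0A


def classify_ending(data):
    """Describe how a file ends, with a focus on a trailing SUB byte."""
    if not data or data[-1] != SUB:
        return "no SUB"
    body = data[:-1]
    if body.endswith(bytes([CR, LF])):
        return "CR LF SUB"   # the ending reported as working
    if body.endswith(bytes([CR])):
        return "CR SUB"      # the ending reported as failing
    if body.endswith(bytes([LF])):
        return "LF SUB"
    return "SUB"


def make_variant(lines, ending):
    """Join lines with CRLF and append the given final bytes (hypothetical helper)."""
    return "\r\n".join(lines).encode("ascii") + ending


if __name__ == "__main__":
    # Placeholder source lines, assumed for illustration only.
    lines = ['000083     Display "Hello World".', "000084     Stop Run."]
    for ending in (b"\r\n\x1a", b"\r\x1a"):
        print(classify_ending(make_variant(lines, ending)))
```

Writing the returned bytes to a file in binary mode would give a reproducer for each case without needing a hex editor.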

      c:\_Share\Bug-Report-SUB-EOF>cobc -x HELLO-OK.cob
      HELLO-OK.cob:1: warning: ignoring unknown directive: '@OPTIONS MAIN' [-Wothers]
          1 > 000020 @OPTIONS MAIN
          2 | 000021 Identification Division.
          3 | 000030 Program-Id.  Hello.
      
      c:\_Share\Bug-Report-SUB-EOF>cobc -x HELLO-NOK.cob
      HELLO-NOK.cob:1: warning: ignoring unknown directive: '@OPTIONS MAIN' [-Wothers]
          1 > 000020 @OPTIONS MAIN
          2 | 000021 Identification Division.
          3 | 000030 Program-Id.  Hello.
      HELLO-NOK.cob:12: warning: line not terminated by a newline [-Wmissing-newline]
         10 | 000073 Hello-Start.
         11 | 000083     Display "Hello World".
         12 > 000084     Stop Run.<EOF>
      HELLO-NOK.cob: in paragraph 'Hello-Start':
      ' - skipping word error: invalid symbol '
         10 | 000073 Hello-Start.
         11 | 000083     Display "Hello World".
         12 > 000084     Stop Run.<EOF>
      
      unknown (signal^C)
      
      cobc:
      aborting cc:\_Share\Bug-Report-SUB-EOF>ompile of HELLO-NOK.cob at line 12 (PROGRAM-ID: Hello)
      
       

      Last edit: Michael Del Solio 2026-03-23
  • Simon Sobisch

    Simon Sobisch - 2026-03-23

    The difference between the preparser and the parser is that the preparser's scanner reads single characters from the stream (getc), builds up a buffer and handles 0x1a on its own, while the parser's scanner reads until a newline or until the buffer is full (using fgets). That buffer has 32k, a limit I think may only be reached for internal directives [like reserved word specifications or source/line references], as the preparser's scanner buffer has a much smaller limit.

    And if fgets on Windows sees 0x1a it returns 0x00; in the case of an unexpected symbol (here: a spare 0x0d) we read until the end of the word (newline or EOF) - and never reach that, because 0x00 was not explicitly checked (this check is needed because of fgets and 0x1a = 0x00 on Windows).
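The non-terminating loop can be modelled in a few lines. This is a simplified Python sketch of the logic described in this comment, not cobc's actual C code: `make_reader` and `skip_word` are hypothetical names, the reader is assumed to hand out 0x00 forever once the buffer is exhausted, and a step limit stands in for "hangs forever".

```python
NUL, LF, EOF = 0x00, 0x0A, -1


def make_reader(buffer):
    """Return a callable that yields one byte per call, then NUL forever.

    Assumption: this mimics the input layer handing 0x00 to the scanner
    after 0x1a has been seen, as described above.
    """
    pos = 0

    def read():
        nonlocal pos
        if pos < len(buffer):
            ch = buffer[pos]
            pos += 1
            return ch
        return NUL

    return read


def skip_word(read, check_nul, max_steps=1000):
    """Skip to the end of the word; return the steps used, or None if the
    step limit is hit (standing in for an infinite loop)."""
    for step in range(1, max_steps + 1):
        ch = read()
        if ch == LF or ch == EOF:
            return step
        if check_nul and ch == NUL:
            return step  # the explicit 0x00 check
    return None


if __name__ == "__main__":
    # A spare CR followed by nothing but NULs, as in the failing file.
    print(skip_word(make_reader(b"\r"), check_nul=False))
    print(skip_word(make_reader(b"\r"), check_nul=True))
```

Without the NUL check the loop only stops at the artificial step limit; with it, the loop ends on the first 0x00.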

    If we ever wanted to read past 0x1a, we'd need to change the parser's scanner to read byte-wise from the stream and create the buffer itself - similar to what we do in the preparser.
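A byte-wise reader of that kind could look roughly like the Python sketch below. It only illustrates the idea of building the line buffer yourself and deciding explicitly what 0x1a means; `read_lines_bytewise` and its `stop_at_sub` parameter are assumptions made for this example and do not correspond to anything in cobc.

```python
import io

SUB, LF = 0x1A, 0x0A


def read_lines_bytewise(stream, stop_at_sub=True):
    """Yield lines read one byte at a time from a binary stream.

    stop_at_sub=True treats 0x1a as end of input (DOS-style);
    stop_at_sub=False drops the byte and keeps reading past it.
    """
    line = bytearray()
    while True:
        chunk = stream.read(1)
        if not chunk:          # real end of the stream
            break
        byte = chunk[0]
        if byte == SUB:
            if stop_at_sub:
                break
            continue
        line.append(byte)
        if byte == LF:
            yield bytes(line)
            line.clear()
    if line:                   # last line without a trailing newline
        yield bytes(line)


if __name__ == "__main__":
    data = b"A\r\nB\r\x1aC\n"
    print(list(read_lines_bytewise(io.BytesIO(data))))
    print(list(read_lines_bytewise(io.BytesIO(data), stop_at_sub=False)))
```

Opening the file in binary mode matters here: the reader then sees 0x1a as an ordinary byte, so the decision stays in the scanner instead of in the C runtime.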

    ... the sole reason the scanner has seen it at all is that the explicit handling of 0x1a was never used, because it was placed after a catch-all rule (it needs to come first: both rules match a single character, and for matches of the same size the position in the scanner definition decides the order), so I've fixed that as well.
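The ordering effect follows from how flex-style scanners pick a rule: the longest match wins, and for matches of equal length the rule listed first is used. The Python sketch below models only that selection policy with the standard `re` module; `match_token` and the rule names are hypothetical and are not taken from cobc's scanner definition.

```python
import re


def match_token(rules, text, pos=0):
    """Pick a rule at pos: longest match wins, ties go to the earliest rule.

    Returns the rule name, or None if no rule matches.
    """
    best_name, best_len = None, 0
    for name, pattern in rules:
        m = pattern.match(text, pos)
        # Strict '>' keeps the earlier rule when lengths are equal.
        if m and len(m.group(0)) > best_len:
            best_name, best_len = name, len(m.group(0))
    return best_name


CATCH_ALL = ("catch_all", re.compile(r".", re.DOTALL))
SUB_RULE = ("sub_eof", re.compile(r"\x1a"))

if __name__ == "__main__":
    # Catch-all first: the dedicated 0x1a rule is shadowed.
    print(match_token([CATCH_ALL, SUB_RULE], "\x1a"))
    # Dedicated rule first: it is selected.
    print(match_token([SUB_RULE, CATCH_ALL], "\x1a"))
```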

    I'll check all those adjustments when I get back to GC and need to think about how best to add that to the testsuite (most likely a normal source + printf on the command line, if we use that in other places already)...

     
