GnuCOBOL / Bugs / #1209 cobc enters infinite loop when compiling source files containing Ctrl+Z (ASCII 26) EOF marker on Windows

Simon Sobisch - 2026-03-23

labels: --> cobc, win32

status: open --> accepted

assigned_to: Simon Sobisch

Group: GC 3.2 --> GC 3.x

Simon Sobisch - 2026-03-23

Hm, recent cobc on GNU/Linux says:

cobc HELLO.COB
HELLO.COB:1: warning: ignoring unknown directive: '@OPTIONS MAIN' [-Wothers]
    1 > 000020 @OPTIONS MAIN
    2 | 000021 Identification Division.
    3 | 000030 Program-Id. Hello.
HELLO.COB:12: warning: line not terminated by a newline [-Wmissing-newline]
   10 | 000073 Hello-Start.
   11 | 000083 Display "Hello World".
␦<EOF>> 000084 Stop Run.
HELLO.COB: in paragraph 'Hello-Start':
HELLO.COB:12: error: invalid symbol '0xd' - skipping word
   10 | 000073 Hello-Start.
   11 | 000083 Display "Hello World".
␦<EOF>> 000084 Stop Run.

COBOL compilers on godbolt seem to just ignore the data as "in column 1-6 -> ignored", which is identical to what happens on GNU/Linux if the file is converted to Unix LF line endings first; if it is converted back afterwards, there's no issue on Windows compilers either...

But compiling directly from the file as-is with native Windows builds runs into that error - both with old and new versions of cobc.

Checking further: this applies only to the final processing -> cobc -E -o hello.i hello.cob does not lead to that error but a following cobc hello.i does.

... and debugging this is awkward because of the "special" debug handling: you can't interrupt a MinGW / DWARF-generated binary and see something reasonable (not with GDB; LLDB seems to support only minimal DWARF), so I can't just go "up" to see where the issue is.

I think that:

- the preparser should remove that already (it could also be there multiple times in case of copybooks)
- the parser (lexer) should handle that - for now I'd just ignore it there (which may break some UTF8 source files)

I'll have a further look.
- Michael Del Solio - 2026-03-23
  
  Thank you very much.
  
  Additional finding with Hex Editor-Plugin (VSCode) after converting/reconverting:
  
  The issue seems to occur specifically when Ctrl+Z (0x1A) follows a CR (0x0D) without a trailing LF (0x0A).
  
  Working file ending:
  CR LF SUB
  
  Failing file ending:
  CR SUB
  
  c:\_Share\Bug-Report-SUB-EOF>cobc -x HELLO-OK.cob
  HELLO-OK.cob:1: warning: ignoring unknown directive: '@OPTIONS MAIN' [-Wothers]
      1 > 000020 @OPTIONS MAIN
      2 | 000021 Identification Division.
      3 | 000030 Program-Id. Hello.
  
  c:\_Share\Bug-Report-SUB-EOF>cobc -x HELLO-NOK.cob
  HELLO-NOK.cob:1: warning: ignoring unknown directive: '@OPTIONS MAIN' [-Wothers]
      1 > 000020 @OPTIONS MAIN
      2 | 000021 Identification Division.
      3 | 000030 Program-Id. Hello.
  HELLO-NOK.cob:12: warning: line not terminated by a newline [-Wmissing-newline]
     10 | 000073 Hello-Start.
     11 | 000083 Display "Hello World".
     12 > 000084 Stop Run.<EOF>
  HELLO-NOK.cob: in paragraph 'Hello-Start':
  ' - skipping word error: invalid symbol '
     10 | 000073 Hello-Start.
     11 | 000083 Display "Hello World".
     12 > 000084 Stop Run.<EOF>
  unknown (signal^C)
  cobc: aborting cc:\_Share\Bug-Report-SUB-EOF>ompile of HELLO-NOK.cob at line 12 (PROGRAM-ID: Hello)
  
  Last edit: Michael Del Solio 2026-03-23
  
  HELLO-OK-NOK.COB.zip
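
The two endings described above can also be told apart without a hex editor. A minimal sketch in Python, assuming the SUB byte is the very last byte of the file; `classify_tail` is a hypothetical helper name and not part of cobc:

```python
SUB = 0x1A  # Ctrl+Z, the legacy DOS end-of-file marker


def classify_tail(data: bytes) -> str:
    """Describe how a source file ends with respect to a trailing SUB byte."""
    if not data or data[-1] != SUB:
        return "no trailing SUB"
    body = data[:-1]
    if body.endswith(b"\r\n"):
        return "CR LF SUB"  # the ending reported as working
    if body.endswith(b"\r"):
        return "CR SUB"  # the ending reported as failing
    return "SUB after other data"


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        with open(path, "rb") as f:  # binary mode so no byte is translated
            print(path, "->", classify_tail(f.read()))
```

Saved under any file name, the script takes the source files to check as command-line arguments and prints one classification per file.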
  

Simon Sobisch - 2026-03-23

The difference between the preparser and the parser lies in their scanners: the scanner of the preparser reads single characters from the stream (getc), builds up a buffer and handles 0x1a on its own, while the scanner of the parser reads until a newline or until the buffer is full (using fgets). That buffer has 32k, a limit I think may only be reached for internal directives (like reserved word specifications or source/line references), as the preparser's scanner buffer has a much smaller limit.
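
To make the two reading strategies concrete, here is a rough model in Python. It is not cobc's code: the function names are hypothetical, `io.BytesIO` stands in for the C stream, and because everything is read as raw bytes it does not reproduce what the Windows C runtime does with 0x1a in text mode. It only shows which layer gets to decide about the byte.

```python
import io

SUB = 0x1A


def read_bytewise(stream: io.BufferedIOBase) -> bytes:
    """getc-style: pull one byte at a time and stop at SUB ourselves."""
    buf = bytearray()
    while True:
        chunk = stream.read(1)
        if not chunk or chunk[0] == SUB:  # real EOF or explicit SUB handling
            break
        buf.append(chunk[0])
    return bytes(buf)


def read_linewise(stream: io.BufferedIOBase, bufsize: int = 32768) -> list[bytes]:
    """fgets-style: read up to a newline or until the buffer is full."""
    lines = []
    while True:
        line = stream.readline(bufsize - 1)  # fgets stores at most n-1 bytes
        if not line:
            break
        lines.append(line)
    return lines


if __name__ == "__main__":
    source = b'Display "Hello World".\r\nStop Run.\r\x1a'
    print(read_bytewise(io.BytesIO(source)))   # SUB never reaches the caller
    print(read_linewise(io.BytesIO(source)))   # last chunk still carries CR SUB
```

In the byte-wise variant the reader itself drops the marker, so later stages never see it; in the line-wise variant whatever the library call returns is handed on unchanged.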

And if fgets on Windows sees 0x1a it returns 0x00; in the case of an unexpected symbol (here: a spare 0x0d) we read until the end of the word (newline or EOF), and never reach that because 0x00 was not explicitly checked (which needs to be done because of fgets and 0x1a = 0x00 on Windows).
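
The endless loop can be modelled in a few lines of Python. This is only a sketch under the assumption stated above, namely that the scanner keeps seeing 0x00 instead of a newline or EOF once the buffer is exhausted; `make_input`, `skip_word` and the `max_steps` guard are hypothetical and exist only for the demonstration.

```python
NUL = 0x00
NEWLINE = 0x0A
EOF = -1


def make_input(buffer: bytes):
    """Return a reader that yields the buffer's bytes, then NUL forever.

    Assumption for this model: after the buffer is used up the scanner
    keeps receiving 0x00 and never an EOF value.
    """
    pos = 0

    def next_char() -> int:
        nonlocal pos
        if pos < len(buffer):
            c = buffer[pos]
            pos += 1
            return c
        return NUL

    return next_char


def skip_word(next_char, check_nul: bool, max_steps: int = 1000) -> bool:
    """Skip to the end of the current word; return True if the loop ended.

    max_steps exists only so the demonstration itself terminates.
    """
    for _ in range(max_steps):
        c = next_char()
        if c == NEWLINE or c == EOF:
            return True
        if check_nul and c == NUL:
            return True
    return False  # gave up: an unguarded loop would spin forever


if __name__ == "__main__":
    print(skip_word(make_input(b"\r"), check_nul=False))  # False: never ends
    print(skip_word(make_input(b"\r"), check_nul=True))   # True: NUL stops it
```

With the additional check for 0x00 the loop ends as soon as the buffer runs out, which corresponds to the explicit check described above.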

If we ever wanted to read past 0x1a, we'd need to change the parser's scanner to read byte-wise from the stream and build the buffer itself, similar to what we do with the preparser.

... the sole reason that the scanner has seen it is that the explicit handling of 0x1a was never used because it was placed after a catch-all rule. It needs to come first: both rules match a single character, and for matches of the same size the position in the scanner definition decides the order. I've fixed that as well.
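
The ordering effect is easy to reproduce outside of the scanner generator. The following Python sketch imitates the matching policy described above (longest match wins, ties go to the rule listed first); the rule names and patterns are hypothetical and are not taken from cobc's scanner definition.

```python
import re


def scan(text: str, rules):
    """Tokenize with longest-match semantics; ties go to the earliest rule."""
    pos, tokens = 0, []
    while pos < len(text):
        best_len, best_name = 0, None
        for name, pattern in rules:
            m = pattern.match(text, pos)
            # Strict '>' keeps the earlier rule when two matches tie in length.
            if m and len(m.group()) > best_len:
                best_len, best_name = len(m.group()), name
        if best_name is None:
            raise ValueError(f"no rule matches at offset {pos}")
        tokens.append(best_name)
        pos += best_len
    return tokens


WORD = ("WORD", re.compile(r"[A-Za-z]+"))
CATCH_ALL = ("UNKNOWN", re.compile(r"."))  # any single character except newline
SUB_RULE = ("SUB", re.compile(r"\x1a"))    # explicit handling of Ctrl+Z

if __name__ == "__main__":
    print(scan("Run\x1a", [WORD, CATCH_ALL, SUB_RULE]))  # ['WORD', 'UNKNOWN']
    print(scan("Run\x1a", [WORD, SUB_RULE, CATCH_ALL]))  # ['WORD', 'SUB']
```

With the catch-all listed first, the dedicated 0x1a rule is shadowed and never fires; moving it in front of the catch-all is enough to activate it.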

I'll check all those adjustments when I'm getting back to GC and need to think about how to add that best to the testsuite (most likely a normal source + printf on the command line, if we use that in other places already)...
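
As an alternative to printf on the command line, the two inputs could also be generated by a small script. This is only a sketch in Python: the program text is a hypothetical minimal fixed-format example (not the attached HELLO.COB), and the file names `ok.cob` and `nok.cob` are made up for illustration.

```python
import os
import tempfile

SUB = b"\x1a"

# Hypothetical minimal program, not the attached HELLO.COB.
PROGRAM_LINES = [
    b"       IDENTIFICATION DIVISION.",
    b"       PROGRAM-ID. HELLO.",
    b"       PROCEDURE DIVISION.",
    b'           DISPLAY "Hello World".',
    b"           STOP RUN.",
]


def build_source(lines, tail: bytes) -> bytes:
    """Join lines with CR LF and end the last line with the given tail."""
    return b"\r\n".join(lines) + tail


def write_cases(directory: str) -> dict:
    """Write one file per ending and return a mapping of name -> path."""
    cases = {
        "ok.cob": b"\r\n" + SUB,  # CR LF SUB, reported as working
        "nok.cob": b"\r" + SUB,   # CR SUB, reported as failing
    }
    paths = {}
    for name, tail in cases.items():
        path = os.path.join(directory, name)
        with open(path, "wb") as f:  # binary mode: no newline translation
            f.write(build_source(PROGRAM_LINES, tail))
        paths[name] = path
    return paths


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp(prefix="sub-eof-")
    for name, path in write_cases(out_dir).items():
        print(name, "->", path)
```

Writing in binary mode matters here, since any newline translation would alter exactly the bytes the test is about.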
