Hi, I'm trying to read a (legacy dos) print file and output a CSV file to view the print file in Excel.
I'm processing the input byte by byte taking only printable characters and outputing comma seperated fields - enclosing text in quotes.
Processing byte by byte how do I check that I've arrived at the end of the print line - it terminates hex '0d0a'.?
TheFile Control is
::cobol
SELECT PRN-IN
ASSIGN TO 'C:\GCcobsrc\S041IBUD.PRN'
ORGANIZATION IS LINE SEQUENTIAL.
Did try and read it but its a code page type that I do not have in my system so cannot pass it to my wife who is reasonably fluent but really this is not important. Just as well as I use Linux.
So back to your problem : - before read each record flood the 01 record with say with high-values (you might need to change this value to a ALL X"FF" or similar).
Then read each RPN-CHAR (A : 1) testing for high-value and if so break out of perform i.e.,
As the file contains LS records the EOL is immaterial as pre-loading high-value before each read you only need to check for that. You just have to increase the record size a few bytes longer.
Warning if you are using any of the Ivrit variants that are double byte then you have to double the size of the record but these are fairly obscure ie not easy to find anyway.
Easy.
Last edit: Simon Sobisch 2021-12-27
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Anonymous
Anonymous
-
2021-12-27
A DOS printer file contains printer control characters other than CR and LF.
Control characters such as new page (NP x'0C'), tabs (HT x'09') and SUB x'1A'.
So ORGANIZATION IS LINE SEQUENTIAL is cumbersome and not really suitable to parse the sample file.
Perhaps it would be better to read the whole file in to storage and then parse the data.
This would make scraping the required data simpler.
In the sample file is defined as follows:
each new page begins with CR,NP
max line length is 150 + 2 (CR,LF)
each page contains 7 header lines delimited by CR,LF
no page footer
some lines contain tabs
detail lines, except last, are double spaced delimited by CR,LF,CR,LF
report end is delimited by CR,x'1A'
Some sample code:
::cobolfree
SELECT PRN-IN ASSIGN TO file-in-prn ORGANIZATION IS SEQUENTIAL.
FD PRN-IN
01 PRN-IN-data X(65536).
01 data-buffer.
data-buffer-char X occurs 65536 times.
move X"FF" to data-buffer.
open PRN-IN.
read PRN-IN into data-buffer.
close PRN-IN.
move 1 to index1
move 2 to index2
perform a100-parse-data until end-of-data or index1 > 65535
a100-parse-data.
*> check for end of reportif data-buffer-char(index1) equals CR and data-buffer-char(index2) equals x'1A'then move true to end-of-dataelse*> check new page
...
if index2 > 65536
then
move true to end-of-data
move true to data-overflow
end-if
end-if
Hope this helps.
👍
1
Last edit: Simon Sobisch 2021-12-27
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Hi, I'm trying to read a (legacy dos) print file and output a CSV file to view the print file in Excel.
I'm processing the input byte by byte taking only printable characters and outputing comma seperated fields - enclosing text in quotes.
Processing byte by byte how do I check that I've arrived at the end of the print line - it terminates hex '0d0a'.?
TheFile Control is
The File Section
The file is attached - contains Hebrew text.
Thanks for any help.
Last edit: Simon Sobisch 2021-12-27
Did try and read it but its a code page type that I do not have in my system so cannot pass it to my wife who is reasonably fluent but really this is not important. Just as well as I use Linux.
So back to your problem : - before read each record flood the 01 record with say with high-values (you might need to change this value to a
ALL X"FF"or similar).Then read each
RPN-CHAR (A : 1)testing for high-value and if so break out of perform i.e.,As the file contains LS records the EOL is immaterial as pre-loading high-value before each read you only need to check for that. You just have to increase the record size a few bytes longer.
Warning if you are using any of the Ivrit variants that are double byte then you have to double the size of the record but these are fairly obscure ie not easy to find anyway.
Easy.
Last edit: Simon Sobisch 2021-12-27
A DOS printer file contains printer control characters other than CR and LF.
Control characters such as new page (NP x'0C'), tabs (HT x'09') and SUB x'1A'.
So
ORGANIZATION IS LINE SEQUENTIALis cumbersome and not really suitable to parse the sample file.Perhaps it would be better to read the whole file in to storage and then parse the data.
This would make scraping the required data simpler.
In the sample file is defined as follows:
Some sample code:
Hope this helps.
Last edit: Simon Sobisch 2021-12-27