[Cobolforgcc-devel] Re: Cobol for GCC


Rama,

Sorry for the delay in replying. I am getting my house ready to sell at
auction at the moment. In the discussion below, all file names are assumed to
start with "cobr_", so temp.c, for example, is really cobr_temp.c.

The file cobr_sort_overview.txt describes the overall structure. So far only
routines 4 (basic in-core sort) and 6 (compare) have been completed. Routine 7
(sort IO) has been written but not tested. The sort IO routines are meant to
handle IO to the sort work files when the sort is too large to fit in memory.
They are not for doing the IO to the actual input and output files specified
by the programmer.

The way large sorts work is that they take the input a chunk at a time and sort
each chunk in memory. If all the input fits in memory, then there is only one
chunk and we are done. If it does not fit in memory, then the chunks need to be
written out to disk using routine 7 (IO), and then merged. The merge works by
reading in several chunks (maybe up to 10) a record at a time and merging them
into one output chunk which is written out as the merge proceeds. So each merge
pass reduces the number of chunks by a factor of, say, 10. Eventually there is
only one chunk and you are done.
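To make the merge step concrete, here is a minimal sketch in C. It is an
illustration under simplifying assumptions, not the planned merge routine: the
chunks are in-memory arrays of int keys rather than work files read through
routine 7, and the names struct chunk, merge_chunks and merge_passes are
hypothetical. merge_passes just counts how many passes are needed when each
pass merges up to fan_in chunks.

```c
#include <stddef.h>

/* A sorted run ("chunk"). In the real sort these would live in work
   files written by the sort IO routines; here they are in-memory arrays
   (hypothetical simplification) so the merge logic can be shown alone. */
struct chunk {
    const int *recs;  /* records, already in ascending order */
    size_t     len;   /* number of records in the chunk */
    size_t     pos;   /* index of the next unread record */
};

/* Merge nchunks sorted chunks into out, one record at a time.
   A linear scan picks the smallest head record; with a fan-in of about
   10 this is cheap, although a heap would scale better for larger
   fan-ins. Ties go to the earlier chunk, so the merge is stable.
   Returns the number of records written to out. */
size_t merge_chunks(struct chunk *chunks, size_t nchunks, int *out)
{
    size_t n = 0;
    for (;;) {
        struct chunk *best = NULL;
        for (size_t i = 0; i < nchunks; i++) {
            struct chunk *c = &chunks[i];
            if (c->pos < c->len &&
                (best == NULL || c->recs[c->pos] < best->recs[best->pos]))
                best = c;
        }
        if (best == NULL)          /* every chunk is exhausted */
            return n;
        out[n++] = best->recs[best->pos++];
    }
}

/* Number of merge passes needed to reduce nchunks sorted chunks to one
   when each pass merges up to fan_in chunks. fan_in must be at least 2. */
unsigned merge_passes(size_t nchunks, size_t fan_in)
{
    unsigned passes = 0;
    while (nchunks > 1) {
        nchunks = (nchunks + fan_in - 1) / fan_in;  /* round up */
        passes++;
    }
    return passes;
}
```

For example, with 1,000 chunks and a fan-in of 10, merge_passes returns 3
(1,000 chunks, then 100, then 10, then 1).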

In COBOL the input to a sort can be a file or files, or an input procedure.
Either way, the compiler will generate the code to read the files (or run the
input procedure) and will pass the records one at a time to the sort/merge
executive (routine 3) using a routine called something like 'sort_put_record'.
Once the sort is done, the sort/merge executive would hand the records back to
the compiler-generated code one at a time, presumably through a routine called
something like 'sort_get_record'. The generated code would then either call
the output procedure or write the records to a file, depending on what the
programmer asked for in the code.
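A rough sketch of what that record-at-a-time interface could look like,
assuming fixed-length records held entirely in memory and compared as raw
bytes. sort_put_record and sort_get_record follow the "something like" names
above; sort_begin and sort_end_input are hypothetical additions. A real
executive would spill a sorted chunk to a work file when memory runs out,
instead of returning an error as this sketch does.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory executive state: fixed-length records stored
   back to back and compared byte by byte with memcmp. */
static struct {
    size_t reclen;   /* length of every record in bytes */
    char  *buf;      /* records stored back to back */
    size_t count;    /* records currently held */
    size_t cap;      /* records that fit in buf */
    size_t next;     /* index of the next record to hand back */
} srt;

static size_t cmp_len;  /* qsort's comparator has no context argument */

static int cmp_records(const void *a, const void *b)
{
    return memcmp(a, b, cmp_len);
}

/* Start a new sort of records reclen bytes long. */
void sort_begin(size_t reclen)
{
    free(srt.buf);
    memset(&srt, 0, sizeof srt);
    srt.reclen = reclen;
}

/* Called once per input record; the record is copied.
   Returns 0 on success, -1 if memory runs out. */
int sort_put_record(const void *rec)
{
    if (srt.count == srt.cap) {
        size_t ncap = srt.cap ? srt.cap * 2 : 16;
        char *nbuf = realloc(srt.buf, ncap * srt.reclen);
        if (nbuf == NULL)
            return -1;  /* real code: write a sorted chunk to a work file */
        srt.buf = nbuf;
        srt.cap = ncap;
    }
    memcpy(srt.buf + srt.count * srt.reclen, rec, srt.reclen);
    srt.count++;
    return 0;
}

/* Called after the last sort_put_record: sorts the records held. */
void sort_end_input(void)
{
    cmp_len = srt.reclen;
    if (srt.count > 0)
        qsort(srt.buf, srt.count, srt.reclen, cmp_records);
    srt.next = 0;
}

/* Copies the next record in sorted order into rec.
   Returns 1 if a record was returned, 0 when the output is exhausted. */
int sort_get_record(void *rec)
{
    if (srt.next >= srt.count)
        return 0;
    memcpy(rec, srt.buf + srt.next * srt.reclen, srt.reclen);
    srt.next++;
    return 1;
}
```

The generated code would call sort_put_record in its read loop (or from the
RELEASE statements of an input procedure), then loop on sort_get_record until
it returns 0.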

For a merge, the input files are assumed to be in order.

The compiler would have to pass information to the sort/merge executive to
specify things like maximum memory to be used, where to put the work files,
maximum size of the work files, and the details of the sort fields and the
collating sequence (see below). This interface would be similar to the
interface to sort.c.
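As an illustration only, the parameter block might look something like the
following. Every name here (struct sort_key, struct sort_params,
sort_params_valid) is hypothetical, records are assumed to be fixed length,
and the real interface has not been specified.

```c
#include <stddef.h>

/* One sort key: where it sits in the record and which way it sorts. */
struct sort_key {
    size_t offset;      /* byte offset of the key within the record */
    size_t length;      /* key length in bytes */
    int    descending;  /* 0 = ascending key, nonzero = descending key */
};

/* Hypothetical parameter block passed from compiler-generated code to
   the sort/merge executive (routine 3). */
struct sort_params {
    size_t                 max_memory;     /* bytes usable for in-core sorting */
    const char            *work_dir;       /* directory for sort work files */
    size_t                 max_work_file;  /* maximum size of one work file */
    size_t                 record_length;  /* fixed record length in bytes */
    const struct sort_key *keys;           /* keys, most significant first */
    size_t                 nkeys;          /* number of entries in keys */
    const unsigned char   *collating_seq;  /* 256-entry table, NULL = binary */
};

/* Sanity checks the executive might make before starting.
   Returns 1 if the parameters look usable, 0 otherwise. */
int sort_params_valid(const struct sort_params *p)
{
    if (p->record_length == 0 || p->nkeys == 0 || p->keys == NULL)
        return 0;
    if (p->max_memory < p->record_length)  /* must hold at least one record */
        return 0;
    for (size_t i = 0; i < p->nkeys; i++) {
        const struct sort_key *k = &p->keys[i];
        if (k->length == 0 || k->offset > p->record_length ||
            k->length > p->record_length - k->offset)
            return 0;  /* key must lie entirely inside the record */
    }
    return 1;
}
```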

I hope this clarifies things; if not, please ask some more questions.

See also below...

I have cc'd this to cobolforgcc-devel, to keep a record of this. I hope you
don't mind.

Regards,
Tim Josling

"Linga, Rama Krishna (Rama)" wrote:

> Hi Tim.
>
>         I could not understand a few things regarding this sort/merge.

Quite understandable.

>

>
>
>         1.      What is the prime objective of this? Is it to write
> equivalent C code for the SORT and MERGE usages of COBOL? Then what are
> all these collating sequences, how many of them are relevant to this
> code, and in what way?

The main aim is, as you said, to support the sort/merge verbs of COBOL.

The collating sequences are used in the compare routine. In COBOL you can
specify a collating sequence, which means characters are compared using the
collating sequence rather than using the binary values of the characters. See
cobr_compare.[ch]. Effectively, the characters are converted using the lookup
table (collating sequence) before being compared.
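A minimal sketch of such a comparison in C, in the spirit of
cobr_compare.[ch] but not taken from it: collate_compare and
make_case_blind_table are hypothetical names, and the example table (which
folds lower case onto upper case) assumes ASCII.

```c
#include <stddef.h>

/* Compare two fields of len bytes using a collating sequence: each byte
   is mapped through the 256-entry table before comparison, so the table,
   not the raw character code, decides the order.
   Returns <0, 0 or >0, like memcmp. */
int collate_compare(const unsigned char *a, const unsigned char *b,
                    size_t len, const unsigned char table[256])
{
    for (size_t i = 0; i < len; i++) {
        int wa = table[a[i]], wb = table[b[i]];
        if (wa != wb)
            return wa - wb;
    }
    return 0;
}

/* Example table: binary order, except that lower-case letters collate
   with their upper-case equivalents. Assumes ASCII. */
void make_case_blind_table(unsigned char table[256])
{
    for (int c = 0; c < 256; c++)
        table[c] = (unsigned char)c;
    for (int c = 'a'; c <= 'z'; c++)
        table[c] = (unsigned char)(c - 'a' + 'A');
}
```

With an identity table (plain binary order), "apple" sorts after "BANANA" in
ASCII because lower-case letters have higher codes; with the case-blind table
it sorts before.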

>
>
>         2.      And what are we sorting? Data files or text files? And what
> is the format of these files?
>

The intention is to support both text files (delimited by \n) and non-text
files. Non-text files can be either fixed length or variable length, with a
record control word at the start giving the length. However, at the moment
none of the code to support the various file formats has been written; only
some of the core sort routines have been written.
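To show the difference between the three layouts, here is a sketch of a
routine that finds the next record in a buffer. None of this code exists yet,
so the layouts are assumptions: in particular, the record control word is
taken to be a 2-byte big-endian length that does not count itself.
next_record and enum rec_format are hypothetical names.

```c
#include <stddef.h>

enum rec_format { REC_TEXT, REC_FIXED, REC_VARIABLE };

/* Locate the next record in buf[pos..size). On success, sets *start and
   *len to the record's data and returns the position just after it
   (always > pos); returns 0 at end of data or if the record is truncated.
   Assumed layouts (hypothetical):
     REC_TEXT     - data up to a '\n', which is not part of the record
     REC_FIXED    - exactly fixed_len bytes (fixed_len must be > 0)
     REC_VARIABLE - 2-byte big-endian length word, then that many bytes */
size_t next_record(const unsigned char *buf, size_t size, size_t pos,
                   enum rec_format fmt, size_t fixed_len,
                   size_t *start, size_t *len)
{
    if (pos >= size)
        return 0;
    switch (fmt) {
    case REC_TEXT: {
        size_t end = pos;
        while (end < size && buf[end] != '\n')
            end++;
        *start = pos;
        *len = end - pos;
        return end < size ? end + 1 : end;  /* skip the '\n' if present */
    }
    case REC_FIXED:
        if (size - pos < fixed_len)
            return 0;
        *start = pos;
        *len = fixed_len;
        return pos + fixed_len;
    case REC_VARIABLE: {
        if (size - pos < 2)
            return 0;
        size_t n = ((size_t)buf[pos] << 8) | buf[pos + 1];
        if (size - pos - 2 < n)
            return 0;
        *start = pos + 2;
        *len = n;
        return pos + 2 + n;
    }
    }
    return 0;
}
```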

>
>         3.      And how do we use these formats for sorting? For example,
> how do we know which field we are going to use for sorting?
>

The overview.txt file gives the suggested module structure. Ted has written
routines 4 (basic in-core sort, sort.c) and 6 (compare function, compare.c),
and had started 7 (sort IO, sort_io.c), but I don't think 7 was complete.

The sort.c routine is passed the structure of the fields in the sort_init call
in the parameter sort_fields. I assume that the compiler generated code would
pass similar information to the sort-merge executive.
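For the key details, a sketch of a multi-key comparison, assuming each key is
described by an offset, a length and an ascending/descending flag. This is
not the actual sort_fields layout in sort.c: struct sort_key and
compare_by_keys are hypothetical, and keys are compared as raw bytes (a
collating sequence would be applied as in the compare routine).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical key descriptor, loosely modelled on what a sort_fields
   parameter needs to carry. */
struct sort_key {
    size_t offset;      /* byte offset of the key in the record */
    size_t length;      /* key length in bytes */
    int    descending;  /* nonzero for a descending key */
};

/* Compare two records key by key, most significant key first. The first
   key that differs decides the order; a descending key inverts it.
   Returns -1, 0 or 1. */
int compare_by_keys(const unsigned char *a, const unsigned char *b,
                    const struct sort_key *keys, size_t nkeys)
{
    for (size_t i = 0; i < nkeys; i++) {
        int r = memcmp(a + keys[i].offset, b + keys[i].offset,
                       keys[i].length);
        if (r != 0) {
            int sign = r < 0 ? -1 : 1;  /* normalise memcmp's result */
            return keys[i].descending ? -sign : sign;
        }
    }
    return 0;
}
```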

The compiler will implement routine (1). This would pass the details of the
fields to the sort/merge executive (not yet written) which would then call the
sort/merge and IO routines.

>
>         4.      When will the compiler-generated code use the run-time
> interfaces of sort and merge?

The compiler generated code will call routine 3 (sort/merge executive). The
interface for this has not been specified.

>
>
>         5.      When are the command levels used?

The command level (routine 2) would be a stand-alone utility, to be written
later on, using the sort/merge code.

>
>
>         6.      What exactly is the status of this sort/merge? I looked into
> cobr_sort_readme.txt, but it is so vague that I could not get much out of it.

If you look at overview.txt, you will see that routines 4 and 6 have been
done, along with part of 7 (as described above). See also below.

I tend to think that sort.c (4) and compare.c (6) can be kept, but maybe
sort_io.c (7) could be redone.

If I were doing this, I would probably do the merge routine next, then the work
file IO routines and buffer management routines (routine 7).

However, it is up to you whether you want to use all or part of Ted's code.
You may find it easier to start again than to try to dissect his code.

>
>         7.      What about merge and sort-merge routines? The current code
> appears to be only sort-related.

No merge code has been written yet.

>
>
>         Before I start writing the code, I would like to know these things.
>
>
> Regards.
> rama