Lengthy discussion about datafile.c...

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 454-5900

  Hope this doesn't sound like a lecture, but I want to discuss how to 
keep datafile.c clean and prevent the evolution of convoluted code that 
is starting to occur with that file.

In having done quite of bit of modular and object oriented code for a 
few years, I'd like to discourage the use of global variables across two 
conceptually different pieces of code.  In particular, in this instance 
is the global variable "df_datum".  Let's begin there.

I see that before the histogram strings code was added, df_datum was 
used outside of datafile.c in one--and only one--case, plot2d.c.  I 
would discourage making df_datum available to the outside world in 
datafile.h.  Here is its use:

    case 0:        /* not blank line, but df_readline couldn't parse it */
        {
        df_close();
        int_error(current_plot->token,
              "Bad data on line %d", df_line_number);
        }

    case 1:
        {            /* only one number */
        /* x is index, assign number to y */
        v[1] = v[0];
        v[0] = df_datum;
        /* nobreak */
        }

    case 2:
        /* x, y */
        /* ylow and yhigh are same as y */

OK.  The idea is simple; if only one variable is read from the file then 
the index into the file should be used as the x value and the read 
variable should be used as y.

But I'd like to point out a few things.  First, consider what is being 
done here, the use of df_readline() starting the loop, i.e.,

     while ((j = df_readline(v, max_cols)) != DF_EOF) {

retrieves from df_readline the variables (v[]) and the number of 
variables read (j) in a modular fashion.  However, plot2d.c retrieves a 
short time later something also generated by df_readline (df_datum) as a 
global variable.  That is not good.  If you were evaluating that 
practise for, say, a software exam, ask yourself what grade you would 
give that.

Alright then, so how to avoid using df_datum like that?  Well, I would 
say that that case 1 situation above really could be easily moved into 
df_readline itself  (** referenced later).  Notice that if at the end of 
df_readline() we knew that this was for 2D data  (in the current case, 
that is a given fact because reading 3D data is its own special routine) 
would could have this exact same code.  For example, at the end of 
df_readline()

  if (df_plot_mode == MODE_PLOT && output == 1) {
        v[1] = v[0];
        v[0] = df_datum;
        output++;
  }

The df_plot_mode variable is something I invented for the image stuff 
that is recorded upon openning the file.  (It isn't the main focus of 
this point.)  You might argue, well, then df_readlin() needs to know 
something about how the data is going to be used.  Yes, but all it needs 
to know is that it is intended for 2D data or 3D data.  In other words, 
we are making available to df_readline some knowledge of what the 
*minimum* allowable number of variables is.  In that scenario then, our 
case statement would conclude that *both* 0 variable returned and 1 
variable returned is invalid, i.e.,

    case 0:        /* not blank line, but df_readline couldn't parse it */
    case 1:
        {
        df_close();
        int_error(current_plot->token,
              "Bad data on line %d", df_line_number);
        }

    case 2:
        /* x, y */
        /* ylow and yhigh are same as y */

In some sense the test for case 0 and case 1 is merely a sanity check. 
 It is when the number of returned variables is 2 or greater that the 
meaning of the data is open for interpretation based upon the plot 
style, etc.

Alright so let me summarize three alternatives with the various concepts:

1.  Allow the use of df_datum as a global variable leaving the code as 
is.  This encourages people to use df_datum at will, and that is 
starting to happen.  It isn't exactly modular and clean practise.

2.  Pass into df_readline() a variable indicating the minimum number of 
columns that is valid.  E.g., df_readline(v[], mincols, maxcols), where 
in plot2d.c mincols would be 2 and in plot3d.c mincols would be 3.  I 
think you understand the point.  This is probably the most appropriate 
"modular" syntax, but I would say that passing in that mincols has 
little advantage if in fact we know it will always be 2 or 3 depending 
upon the plot mode.

3.  What I described above.  It is conceptually the same as 2) but 
instead the information is passed to datafile.c via

df_open(int max_using, int plot_mode)

where in plot2d.c plot_mode would be MODE_PLOT and in plot3d.c plot_mode 
would be MODE_SPLOT.  Like case 2), df_datum no longer needs to be 
globally available.

I would opine that alternative #3 is a good balance.

Now, having said all that, I'd like to point out some code that Ethan 
added to df_readline() somewhere:

            /* FIXME EAM - Trap special case of only a single 'using' 
column. */
            /* But really we need to handle general case of implicit 
column 0 */
            if (output == 1)
                 xpos = (axcol == 0) ? df_datum : v[axcol-1];
            else
                 xpos = v[axcol];

I think that is exactly the concept I've discribed in 3) above.  (**)  
 So you can see how things are starting to get convoluted.

While on this topic of df_readline, I wonder if introducing too much 
"plot dependent" stuff into df_readline is a good idea.  For example, 
with the histogram tics, this kind of line seems like it shouldn't be in 
a file-reading routine:

             add_tic_user(axis,temp_string,xpos,0);

Is there some way to move this functionality outside of df_readline() 
back into plot2d.c?

I pose this question because I've been trying to make the case that 
df_readascii() and df_readbinary(), or whatever, should be transparent 
to the calling routine.  If functionality like above keeps being added 
to df_readascii (df_readline) then soon the situation arises where 
certain types of plots can't be done simply because the data comes from 
a binary data file.

Ethan, what is the minimal amount of information that you would need 
coming back from df_readline() to implement headers from files?  If 
df_readline() were equipped with a char pointer for which df_readline 
could realloc() memory and assign a string, would that do it?  That is, 
I might propose

df_readline(v[], maxcols, string)

where string is a character pointer.  Then

             add_tic_user(axis,temp_string,xpos,0);

could be moved to plot2d.c.  I see absolutely nothing wrong with that 
addition.  I think that is much cleaner than working with so many global 
variables.

In any case, if my comments have motivated anyone to change something, 
please hold off until after the image patch... or if you want me to make 
an attempt at removing df_datum from outside datafile.c as part of the 
image patch, I can do that.

Dan

PS:  I've concluded that moving df_readbinary() to another file would 
require the sharing of too many "local" variables.  So I'm thinking that 
just putting it to the bottom of datafile.c is best... that's actually 
where it probably should be organized anyway.

Lengthy discussion about datafile.c...

A portable, multi-platform, command-line driven graphing utility

Lengthy discussion about datafile.c...