image/binary patch complete


  OK, so here it is:  the image/binary patch with all the alterations to 
reading binary files that I had envisioned.  I think it works pretty 
well, and it is as far as I wish to take the code at this time.  In this 
last version, it is the data file stuff that has changed.  So only that 
is addressed here.

The patch has been uploaded to the SourceForge site. 
 ("with_image_26aug2004.patch" and "with_image_data_24aug2004.patch" are 
required.)  If you want to see just the results, I've updated the PDF 
output of the demo at

http://acer-access.com/~ds...@ac.../gnuplot/image.pdf

There are a few more plots, dealing mainly with matrix (gnuplot) binary 
and matrix ASCII files and how they are integrated with the new 
"df_readbinary()" variant of "df_readline()".

I'll be out of town the rest of this week, and I won't have much time to 
work on this anymore once September rolls around.  But as I've said, 
I've taken this now to the point of what I imagined.  Minor details 
about the syntax might not be agreeable with everyone, but I think in 
terms of coding, it would be fairly straightforward to alter bits of 
code to achieve any conceivable alterations...  As for incremental 
features, like data compression in PostScript files, maybe a project for 
next year.

Dan

PS:  If you want to read on, here are the main points:

** I've put map3d_xy() back to the way it was so that creating a 
PostScript file for `all.dem` under the CVS version and the patched 
version differs only in the dates.  This means that map3d_xy_double() is 
pretty much a replication of map3d_xy().  I've left in, and #defined 
out, the method where map3d_xy() is derived from map3d_xy_double(), 
along with a descriptive note in case that can be simplified in the future.

**  I've moved all binary (and ASCII matrix) file reads into the new 
routine df_readbinary().  With some simple alterations to this routine, 
it can now read in binary matrix (gnuplot binary) files in a "one at a 
time" fashion.  This achieves several things:

1)  It makes for only one way to read data from a file, i.e., 
df_readline().  This is very nice because it makes plot2d.c and plot3d.c 
very similar in program flow.  Good for future development.  Also, it 
allows matrix binary to be used in 2d plots.  Useful for images, or 
perhaps arrow style.  [Of course, it is limited in that there is only 
one "information" column beyond the first two coordinate columns.]

2)  There are now just two spots for user spec interpretation code, in 
df_readascii() and df_readbinary().  Hence both matrix binary and 
general binary use the same using code (very nice!) and furthermore the 
features for general binary may also be used for matrix binary.  [Note 
however, that I've not combined the using code for df_readascii() with 
that of df_readbinary()... maybe a short subroutine to fill in the v[] 
vector would be a good idea after all.]  I've added a few additional 
plots in the `image.dem` script to illustrate and test that capability. 
 [One that I added includes `binary3` four times, translated to 
different locations... gnuplot really crunches away on that one, 
computing the hidden lines.  But I'm impressed it works; it looks 
funky...  Slightly different color scheme between X11 and PDF for the 
last plot, dark blue in X11, peach in PDF.]

3)  Another nice thing about this is that `binary` documentation becomes 
simplified.  Now all the additional syntax for general binary applies to 
matrix binary as well.  So no more of this clumsy documentation for 
`binary` where there is one set of rules for `splot` and another set of 
rules for all other cases.  Hence, I've reorganized the documentation 
slightly so that, for example, 'help binary' gives
__________

 The `binary` keyword allows a data file to be binary as opposed to ASCII.
 There are two formats for binary--matrix binary and general binary.  Matrix
 binary is a fixed format in which data appears in a 2D array with an extra
 row and column for coordinate values.  General binary is a flexible format
 for which details about the file must be given at the command line.

 See `binary matrix` or `binary general` for more details.

Subtopics available for binary:
    general           matrix
__________

and 'help matrix' gives
__________

 The `matrix` flag indicates that the file data (ASCII or binary) are stored
 in matrix format.  The formats are slightly different amongst these two.
 For details, see `matrix ascii` or `matrix binary`.

Subtopics available for matrix:
    ascii             binary
___________

The descriptions under these sub-categories are then pretty much what 
existed before, but I've weeded out anything tying the file types to 
`plot` or `splot`.

4)  df_3dmatrix() is no longer required and is commented out of the 
code.  Similarly, the code inside binary.c is no longer required and is 
conditionally compiled out as well.  But it is a bit tricky because the 
code inside binary.c is still required for the program bf_test.  The 
contents of `binary.c` can't be compiled two different ways at once, 
and I can't alter the Makefile.am file because that would not work 
when BINARY_DATA_FILE is not defined.  So, I've created a new file 
called `bin_hook.c` as part of the gnuplot_SOURCES, and inside of it 
are only the lines

#ifndef BINARY_DATA_FILE
#include "binary.c"
#endif

** I'll say a bit here about how the integration of matrix binary and 
matrix ASCII into df_readbinary() works.  In the case of matrix binary, 
there is only one data set that can be in a binary matrix file, such 
that after opening the file, a few uses of fread(), ftell() and fseek() 
can figure out the dimensions of the data and locations of the grid 
corners.  The information is then put in the "binary_record" array that 
df_readbinary() uses. There is no need to read through all the data in 
the file.  If df_readline() knows the dimensions of the array, it can 
handle reading in the first row, first column, etc. on an "as you go" basis.
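
A minimal sketch of that size-sniffing step, not the code from the patch.  It assumes the usual matrix binary layout (one float giving the column count, that many floats of column coordinates, then rows of one coordinate float followed by the data floats).  The name sniff_matrix_dims() is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not from the patch): find the dimensions of a
 * matrix binary file without reading the data values.
 * Assumed layout: 1 float = column count nc, then nc floats of column
 * coordinates, then nr rows of (1 + nc) floats each. */
static int sniff_matrix_dims(FILE *fp, long *nc, long *nr)
{
    float fcols;
    long fsize, row_bytes, body;

    assert(fp != NULL && nc != NULL && nr != NULL);

    /* The very first float holds the number of columns. */
    if (fseek(fp, 0L, SEEK_SET) != 0
        || fread(&fcols, sizeof fcols, 1, fp) != 1)
        return -1;
    if (!(fcols >= 1.0f && fcols < 1.0e9f))   /* also rejects NaN */
        return -1;
    *nc = (long) fcols;

    /* The file size gives the row count; no data is read. */
    if (fseek(fp, 0L, SEEK_END) != 0 || (fsize = ftell(fp)) < 0)
        return -1;
    row_bytes = (*nc + 1) * (long) sizeof(float);
    body = fsize - row_bytes;                 /* bytes after the header row */
    if (body < 0 || body % row_bytes != 0)
        return -1;
    *nr = body / row_bytes;

    /* Leave the stream positioned at the first data row. */
    return fseek(fp, row_bytes, SEEK_SET) == 0 ? 0 : -1;
}
```

The caller would then copy nc and nr into whatever record structure the reader uses and read rows one at a time from the current position.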

The story is slightly different for ASCII matrix data.  There one can't 
figure out the size of the matrix until _all_ the data has been read in. 
There was a routine for doing that, df_read_matrix(), and I wanted to 
re-use it.  So the strategy is to read in all the data and store it as 
floats in memory; then inside df_readline(), rather than pulling data 
from a FILE *, it comes from memory as a byte stream as though it were 
from a file.  (There are assurances in the code that the endianness is 
properly observed.)  Pretty simple.  I slightly altered df_read_matrix() 
so that it returns after getting a blank line indicating the end of a 
record.  That is, here is the core of the ASCII matrix routine:

    /* Keep reading matrices until file is empty. */
    while (1) {
        if ((matrix = df_read_matrix(&nr, &nc)) != NULL) {
            int index = df_num_bin_records;
            /* *** Careful!  Could error out in next step.  "matrix" should
             * be static and test next time. ***
             */
            df_add_binary_records(1, DF_CURRENT_RECORDS);
            df_bin_record[index].memory_data = (char *) matrix;
            matrix = NULL;
            df_bin_record[index].scan_dim[0] = nc;
            df_bin_record[index].scan_dim[1] = nr;
            df_bin_record[index].scan_dim[2] = 0;
            df_bin_file_endianess = THIS_COMPILER_ENDIAN;
        } else
            break;
    }

This sets up df_readline() to do its thing.  Note that with this setup, 
`index` works for ASCII matrix (see demos), which I'm not sure it did 
before, judging from some comments in datafile.c.

So ASCII matrix reads all the data into memory, while binary matrix 
doesn't.  (I know, the point[] array in the plot is much larger than 
the original data file anyway.  But good programming is good practice. 
Also, if decimation were used, then the contents that end up in memory 
would be a fraction of the original file size.)  But, generally, ASCII 
matrix files won't be that big, so reading all the contents of an ASCII 
data file into memory isn't the worst of methods.  (The alternative 
would be to read the ASCII file twice... don't like that one!)
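
To illustrate the "memory as a byte stream" idea, here is a small sketch of a reader with the same contract as fread() that pulls from a buffer instead of a FILE *.  The struct and function names are hypothetical and are not the ones used in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical memory-backed source; none of these names are from the
 * patch. */
struct mem_source {
    const char *data;   /* start of the in-memory "file" */
    size_t size;        /* total number of bytes */
    size_t pos;         /* read cursor, in bytes */
};

/* Same contract as fread(): copy up to `count` items of `item_size`
 * bytes and return the number of complete items copied. */
static size_t mem_read(void *dst, size_t item_size, size_t count,
                       struct mem_source *src)
{
    size_t avail, n;

    assert(dst != NULL && src != NULL && src->pos <= src->size);
    if (item_size == 0)
        return 0;

    avail = (src->size - src->pos) / item_size;
    n = count < avail ? count : avail;
    memcpy(dst, src->data + src->pos, n * item_size);
    src->pos += n * item_size;
    return n;
}
```

With something like this, the reading loop needs only one branch to choose between fread() and the memory reader; everything after that point is the same for both sources.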

** Internal to datafile.c, I've altered the variable df_matrix slightly. 
Near the end of df_open(), df_matrix is set to TRUE if the data came 
from a matrix format (either binary or ASCII) _or_ if it came from 
binary data when the `array` keyword is used (i.e., two dimensions 
with consistent coordinates).  This keeps df_matrix accessible to 
the rest of the gnuplot code, but it ensures that no outside code can 
alter the value and inadvertently change how the datafile.c code 
behaves for successive calls to df_readline().

A similar idea is used for the df_binary variable.

So inside of datafile.c now there are a variety of similar variables, 
that at first seem like name overload:

/* Logical variables indicating information about data file. */
TBOOLEAN df_binary_file;
TBOOLEAN df_matrix_file;

/* Binary *read* variables used by df_readbinary().  The difference
 * between matrix binary and general binary is that matrix binary
 * requires an extra first column and extra first row giving the sample
 * coordinates.  Furthermore, note that if ASCII matrix data is converted
 * to floats (i.e., binary) then it really falls in the general binary
 * class, not the matrix binary class.
 */
TBOOLEAN df_read_binary;
TBOOLEAN df_matrix_binary;

As the note says, the fact that ASCII matrix and binary matrix are 
slightly different complicates matters.  I think it is easier to go with 
a few more variables than to assign so much meaning to df_matrix 
conditioned on what other settings might be.

** I propose that eventually "df_binary" be removed from datafile.h and 
datafile.c.  There is only one instance where df_binary is used outside 
of datafile.c:

        if (this_plot->plot_type == DATA3D && df_binary==TRUE
            && end_token==start_token+1)
            /* let default title for  splot 'a.dat' binary  is 'a.dat'
             * while for  'a.dat' binary using 2:1:3  will be all 4 words */
            m_capture(&(this_plot->title), start_token, start_token);
        else

But, as I've argued before, there is no reason for the rest of gnuplot 
code to know whether or not data came from a binary file.  Can the above 
bit of code be changed so that it keys off the presence of a using string?

I've left a note in plot3d.c near the above code as a reminder to 
address this eventually.  However, I didn't alter the code because that 
would probably cause the output PostScript files for 'all.dem' for old 
and patched versions of gnuplot to be different.

** You may be wondering about `df_matrix`, why should that variable be 
made available to code outside datafile.c?  Well, that is useful 
information to some parts of gnuplot, specifically the scan line, grid 
code.  Also, the image code can make use of such a variable, but one 
that indicates _uniform_ grid, which df_matrix doesn't necessarily do. 
 (However, in the case of image code, it is just a shortcut to save some 
computations on the "grid check" code that the image routine uses 
otherwise.)
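
For what "uniform" means here, a rough sketch of such a check on one axis follows.  It is only an illustration under my own assumptions (a relative tolerance on the step size, and the name is_uniform()); it is not the grid check that the image routine actually performs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical check (not the image routine's code): return 1 if the
 * n coordinates in x[] are evenly spaced to within rel_tol of the mean
 * step, 0 otherwise. */
static int is_uniform(const double *x, size_t n, double rel_tol)
{
    double step, abs_step, diff;
    size_t i;

    assert(x != NULL && rel_tol >= 0.0);
    if (n < 3)
        return 1;               /* fewer than two steps: nothing to compare */

    step = (x[n - 1] - x[0]) / (double) (n - 1);
    if (step == 0.0)
        return 0;               /* degenerate axis */
    abs_step = step < 0.0 ? -step : step;

    for (i = 1; i < n; i++) {
        diff = (x[i] - x[i - 1]) - step;
        if (diff < 0.0)
            diff = -diff;
        if (diff > rel_tol * abs_step)
            return 0;
    }
    return 1;
}
```

A flag set once when the file is opened would let the image code skip a pass like this over both axes.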

** Why is ASCII matrix data not consistent with binary matrix data in 
the sense that the x and y coordinates may be derived from the first row 
and column of the matrix?  Matrix data with row indices used for 
coordinates has some use, but having something similar to binary matrix 
is more useful.  I'm not sure which came first, but if ASCII matrix is 
fairly recent, would it be possible to change the format to the same as 
gnuplot binary?  I know this sort of thing is not a nice thing to do to 
users, but if use of ASCII matrix isn't widespread, the repercussions 
might not be too bad.  I inquire because I notice that there is no use 
of `matrix` in any of the demo scripts.

I'm not too concerned about this, however.  I think there is some 
flexibility there with being able to pass ASCII matrix coordinates 
(i.e., the indices) through a function.  E.g., "using (2*$1):(0.5*$2):3".

** The last item here has to do with not being able to pass coordinates 
generated for general binary through a function, just as the 0, -1, -2 
fields indicating datum, line, block can't go through a function.  I 
don't think this is of immediate concern because I can't think of many 
occasions where one would want to pass uniformly spaced samples 
through some kind of function.  If you want the gory details about the 
issue, here is a copy of the note I sent to Petr:

>I don't know whether I've got the point, but I think that in the (new)
>binary image matrix you cannot address x- and y- indices, i.e.  splot 'a.edf' using -2:-1:3 with image
>to index the axes via column number and row number, or can you?
>

Yes, one can select just a particular column,

plot 'blutux.rgb' binary array=128x128 flip=y format='%uchar' using 3 with image

But you are onto the point I was trying to make.

One point is that I don't want the notation to be such that the example 
you give would become

splot 'a.edf' using 1:2:5

where 1 and 2 are the generated, or implicit, columns... like in the 
case of `matrix`...  I think that is downright confusing.  The user sets 
up their columns in a file, then they have to remember to add two to 
every column number.  Just confusing, I think.

> What was
>the point of your discussion?  Actually I would like these -1 and -2. Just few days earlier I wanted to
>  find the measurement number at a given angle (x-axis of the image), and
>  I had to recalculate the image (from experimental data).
>Petr
>  
>

Plus, you know you should be able to override any in-file settings at 
the command line, if that helps your situation any.

Here the 0, -1 and -2 are, as usual, the datum, line count and index.  
They are not the _generated_ values associated with array=50x50, for 
example.  That was another of my points.

And my last point is that the 0, -1 and -2 can't be passed through an 
expression the way columns can, i.e., $1.

You see, in order for something to be used in an expression, I believe 
that quantity needs to exist in the df_column[] array.  But the 0, -1, 
and -2 information are not put into a column, they are just drawn from a 
variable, i.e.,

        } else if (column == -2) {
            v[output] = df_current_index;
        } else if (column == -1) {
            v[output] = line_count;
        } else if (column == 0) {
            v[output] = df_datum;    /* using 0 */

And similarly, the generated coordinates that I added also don't go into 
df_column[]:

        /* Perhaps try using a switch statement to avoid so many tests. */
        } else if (column == -5) {
            v[output] = o_value*delta[2];
        } else if (column == -4) {
            v[output] = n_value*delta[1];
        } else if (column == -3) {
            v[output] = m_value*delta[0];

Now, I can easily change that by stuffing those values into 
df_column[] instead.  Say, under the hood I extend the df_column[] 
array by three and put those values in there.  Then they could be 
passed through an expression, if only the thing that parses the 
expressions knows that it should treat a -1, say, as (df_no_cols + 1).  
See my point?  In the case of `matrix`, the indices are immediately 
stuffed into df_column[0] and df_column[1]:

        /* Fill backward so that current read value is not overwritten. */
        for (j=df_no_bin_cols-1; j >= 0; j--) {
            if (j == 0)
                df_column[j].datum = df_matrix_binary
                    ? scanned_matrix_row[df_M_count] : df_M_count;
            else if (j == 1)
                df_column[j].datum = df_matrix_binary
                    ? first_matrix_column : df_N_count;
            else
                df_column[j].datum = df_column[i].datum;
            df_column[j].good = DF_GOOD;
            df_column[j].position = NULL;
        }

and from that point forward can be used in expressions as $1 and $2.
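
A sketch of the remapping described above, under my own assumptions about how it might look: the column array is extended by n_extra slots past the file's columns, and pseudo-column -k is treated as 1-based column (no_cols + k).  The function name and the n_extra parameter are hypothetical, and column 0 is deliberately left out.

```c
#include <assert.h>

/* Hypothetical mapping (not in the patch): translate a `using` column
 * number into a 0-based slot of a column array that has n_extra slots
 * appended after the no_cols file columns.
 *   positive column c   -> slot c - 1
 *   pseudo-column  -k   -> slot no_cols + k - 1
 * Returns -1 for anything else, including column 0. */
static int column_slot(int column, int no_cols, int n_extra)
{
    assert(no_cols >= 0 && n_extra >= 0);
    if (column >= 1 && column <= no_cols)
        return column - 1;
    if (column <= -1 && column >= -n_extra)
        return no_cols - column - 1;
    return -1;
}
```

The expression parser would call something like this wherever it resolves $N, so that the extra slots behave like ordinary columns once they have been filled.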

Dan
