From: Daniel J S. <dan...@ie...> - 2004-08-26 08:42:09
OK, so here it is: the image/binary patch with all the alterations to reading binary files that I had envisioned. I think it works pretty well, and it is as far as I wish to take the code at this time. In this last version, it is the data file stuff that has changed. So only that is addressed here.

The patch has been uploaded to the SourceForge site. ("with_image_26aug2004.patch" and "with_image_data_24aug2004.patch" are required.) If you want to see just the results, I've updated the PDF output of the demo at

    http://acer-access.com/~ds...@ac.../gnuplot/image.pdf

There are a few more plots, dealing mainly with matrix (gnuplot) binary and matrix ASCII files and how they are integrated with the new "df_readbinary()" variant of "df_readline()".

I'll be out of town the rest of this week, and I won't have much time to work on this anymore once September rolls around. But as I've said, I've taken this now to the point of what I imagined. Minor details about the syntax might not be agreeable with everyone, but I think in terms of coding, it would be fairly straightforward to alter bits of code to achieve any conceivable alterations... As for incremental features, like data compression in PostScript files, maybe a project for next year.

Dan

PS: If you want to read on, here are the main points:

** I've put map3d_xy() back to the way it was so that creating a PostScript file for `all.dem` under the CVS version and the patched version differs only in the dates. This means that map3d_xy_double() is pretty much a replication of map3d_xy(). I've left in, and #defined out, the method where map3d_xy() is derived from map3d_xy_double(), along with a descriptive note in case that can be simplified in the future.

** I've moved all binary (and ASCII matrix) file reads into the new routine df_readbinary(). With some simple alterations to this routine, it can now read in binary matrix (gnuplot binary) files in a "one at a time" fashion.
This achieves several things:

1) It makes for only one way to read data from a file, i.e., df_readline(). This is very nice because it makes plot2d.c and plot3d.c very similar in program flow. Good for future development. Also, it allows matrix binary to be used in 2d plots. Useful for images, or perhaps arrow style. [Of course, it is limited in that there is only one "information" column beyond the first two coordinate columns.]

2) There are now just two spots for user spec interpretation code, in df_readascii() and df_readbinary(). Hence both matrix binary and general binary use the same using code (very nice!) and furthermore the features for general binary may also be used for matrix binary. [Note however, that I've not combined the using code for df_readascii() with that of df_readbinary()... maybe a short subroutine to fill in the v[] vector would be a good idea after all.] I've added a few additional plots in the `image.dem` script to illustrate and test that capability. [One that I added includes `binary3` four times, translated to different locations... gnuplot really crunches away on that one, computing the hidden lines. But I'm impressed it works; it looks funky... Slightly different color scheme between X11 and PDF for the last plot, dark blue in X11, peach in PDF.]

3) Another nice thing about this is that `binary` documentation becomes simplified. Now all the additional syntax for general binary applies to matrix binary as well. So no more of this clumsy documentation for `binary` where there is one set of rules for `splot` and another set of rules for all other cases. Hence, I've reorganized the documentation slightly so that, for example, 'help binary' gives

__________
The `binary` keyword allows a data file to be binary as opposed to ASCII.
There are two formats for binary--matrix binary and general binary. Matrix
binary is a fixed format in which data appears in a 2D array with an extra
row and column for coordinate values.
General binary is a flexible format for which details about the file must be
given at the command line. See `binary matrix` or `binary general` for more
details.

Subtopics available for binary:
    general  matrix
__________

and 'help matrix' gives

__________
The `matrix` flag indicates that the file data (ASCII or binary) are stored in
matrix format. The formats are slightly different amongst these two. For
details, see `matrix ascii` or `matrix binary`.

Subtopics available for matrix:
    ascii  binary
__________

The descriptions under these subcategories are then pretty much what existed before, but I've weeded out anything tying the file types to `plot` or `splot`.

4) df_3dmatrix() is no longer required and is commented out of the code. Similarly, the code inside binary.c is no longer required. That is conditionally commented out as well. But it is a bit tricky because the code inside binary.c is still required for the program bf_test. The contents of `binary.c` can't be compiled two different ways at once, and I can't alter the Makefile.am file because that would not work when BINARY_DATA_FILE is not defined. So, I've created a new file called `bin_hook.c` as part of the gnuplot_SOURCES, and inside it are only the lines

    #ifndef BINARY_DATA_FILE
    #include "binary.c"
    #endif

** I'll say a bit here about how the integration of matrix binary and matrix ASCII into df_readbinary() works. In the case of matrix binary, there is only one data set that can be in a binary matrix file, such that after opening the file, a few uses of fread(), ftell() and fseek() can figure out the dimensions of the data and locations of the grid corners. The information is then put in the "binary_record" array that df_readbinary() uses. There is no need to read through all the data in the file. If df_readline() knows the dimensions of the array, it can handle reading in the first row, first column, etc. on an "as you go" basis.

The story is slightly different for ASCII matrix data.
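To make the matrix binary case concrete, here is a minimal sketch of how the dimensions could be recovered with one fread() plus fseek()/ftell(), without reading the data itself. It is not the code from the patch. The function name matrix_binary_dims() is hypothetical, and the layout is an assumption: a stream of native-endian floats in which the first float holds the column count nc, followed by nc column coordinates, followed by rows that each hold one row coordinate and nc values. The upper bound on nc is an arbitrary sanity limit.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not part of the patch.
 * ASSUMED layout (native-endian floats):
 *   nc  c0 c1 ... c(nc-1)        <- header row: count plus column coordinates
 *   r0  z  z  ... z              <- each data row: row coordinate plus nc values
 * Returns 0 on success and fills *nrows / *ncols, -1 on any inconsistency. */
int
matrix_binary_dims(FILE *fp, long *nrows, long *ncols)
{
    float fcols;
    long file_bytes, row_bytes;

    assert(fp != NULL);

    /* The first float in the file is taken to be the column count. */
    if (fseek(fp, 0L, SEEK_SET) != 0
        || fread(&fcols, sizeof(float), 1, fp) != 1)
        return -1;

    /* Reject NaN, non-positive, and absurdly large counts (arbitrary limit). */
    if (!(fcols >= 1.0f && fcols <= 1.0e6f))
        return -1;
    *ncols = (long) fcols;

    /* Every row, including the header row, occupies nc + 1 floats. */
    row_bytes = (*ncols + 1) * (long) sizeof(float);

    /* The file size gives the row count without touching the data. */
    if (fseek(fp, 0L, SEEK_END) != 0 || (file_bytes = ftell(fp)) < 0)
        return -1;
    if (file_bytes % row_bytes != 0 || file_bytes < 2 * row_bytes)
        return -1;
    *nrows = file_bytes / row_bytes - 1;    /* minus the header row */

    /* Leave the stream positioned at the first column coordinate. */
    return fseek(fp, (long) sizeof(float), SEEK_SET) == 0 ? 0 : -1;
}
```

With the two dimensions known up front, a reader can step through the first row, first column and data values one at a time. No such shortcut exists for ASCII matrix data, where the text has to be parsed before the dimensions are known.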
There one can't figure out the size of the matrix until _all_ the data has been read in. There was a routine for doing that, df_read_matrix(). I wanted to re-use that. So the strategy is to read in all the data, store it as floats in memory, then inside df_readline(), rather than pulling data from a FILE *, it comes from memory as a byte stream as though it were from a file. (There are assurances in the code that the endianness is properly observed.) Pretty simple. I slightly altered df_read_matrix() so that it returns after getting a blank line indicating the end of a record. That is, here is the core of the ASCII matrix routine:

    /* Keep reading matrices until file is empty. */
    while (1) {
        if ((matrix = df_read_matrix(&nr, &nc)) != NULL) {
            int index = df_num_bin_records;
            /* *** Careful!  Could error out in next step.  "matrix" should
             * be static and test next time. *** */
            df_add_binary_records(1, DF_CURRENT_RECORDS);
            df_bin_record[index].memory_data = (char *) matrix;
            matrix = NULL;
            df_bin_record[index].scan_dim[0] = nc;
            df_bin_record[index].scan_dim[1] = nr;
            df_bin_record[index].scan_dim[2] = 0;
            df_bin_file_endianess = THIS_COMPILER_ENDIAN;
        } else
            break;
    }

This sets up df_readline() to do its thing. Note that with this setup, `index` works for ASCII matrix (see demos), which I'm not sure it did before, judging from some comments in datafile.c.

So ASCII matrix reads in all the data to memory, binary matrix doesn't. (I know, but the point[] array in the plot is much more than the original data file. But good programming is good practice. Also, if decimation were used, then the contents that end up in memory will be a fraction of the original file size.) But, generally, ASCII matrix won't be such big files, so reading all the contents of an ASCII data file into memory isn't the worst of methods. (The alternative would be to read the ASCII file twice... don't like that one!)

** Internal to datafile.c, I've altered the variable df_matrix slightly.
Near the end of df_open(), df_matrix is set to TRUE if the data came from a matrix format (either binary or ASCII) _or_ if it came from binary data when the `array` keyword was used (i.e., two dimensions with consistent coordinates). This makes df_matrix still accessible to the rest of gnuplot code, but it ensures that no outside code can alter the value and inadvertently change how the datafile.c code might behave for successive calls to df_readline(). A similar idea is used for the df_binary variable. So inside of datafile.c now there are a variety of similar variables, that at first seem like name overload:

    /* Logical variables indicating information about data file. */
    TBOOLEAN df_binary_file;
    TBOOLEAN df_matrix_file;

    /* Binary *read* variables used by df_readbinary().  The difference between matrix
     * binary and general binary is that matrix binary requires an extra first column
     * and extra first row giving the sample coordinates.  Furthermore, note that if
     * ASCII matrix data is converted to floats (i.e., binary) then it really falls in
     * the general binary class, not the matrix binary class. */
    TBOOLEAN df_read_binary;
    TBOOLEAN df_matrix_binary;

As the note says, the fact that ASCII matrix and binary matrix are slightly different complicates matters. I think it is easier going with a few more variables than trying to assign so much meaning to df_matrix conditioned on what other settings might be.

** I propose that eventually "df_binary" be removed from datafile.h and datafile.c. There is only one instance where df_binary is used outside of datafile.c:

    if (this_plot->plot_type == DATA3D && df_binary==TRUE && end_token==start_token+1)
        /* let default title for  splot 'a.dat' binary  is 'a.dat'
         * while for  'a.dat' binary using 2:1:3  will be all 4 words */
        m_capture(&(this_plot->title), start_token, start_token);
    else

But, as I've argued before, there is no reason for the rest of gnuplot code to know whether or not data came from a binary file.
Can the above bit of code be changed so that it keys off the presence of a using string? I've left a note in plot3d.c near the above code as a reminder to address this eventually. However, I didn't alter the code because that would probably cause the output PostScript files for 'all.dem' for old and patched versions of gnuplot to be different.

** You may be wondering about `df_matrix`: why should that variable be made available to code outside datafile.c? Well, that is useful information to some parts of gnuplot, specifically the scan line, grid code. Also, the image code can make use of such a variable, but one that indicates a _uniform_ grid, which df_matrix doesn't necessarily do. (However, in the case of image code, it is just a shortcut to save some computations on the "grid check" code that the image routine uses otherwise.)

** Why is ASCII matrix data not consistent with binary matrix data in the sense that the x and y coordinates may be derived from the first row and column of the matrix? Matrix data with row indices used for coordinates has some use, but having something similar to binary matrix is more useful. I'm not sure which came first, but if ASCII matrix is fairly recent, would it be possible to change the format to the same as gnuplot binary? I know this sort of thing is not a nice thing to do to users, but if use of ASCII matrix isn't widespread, the repercussions might not be too bad. I inquire because I notice that there is no use of `matrix` in any of the demo scripts. I'm not too concerned about this, however. I think there is some flexibility there with being able to pass ASCII matrix coordinates (i.e., the indices) through a function. E.g., "using (2*$1):(0.5*$2):3".

** The last item here has to do with not being able to pass coordinates generated for general binary through a function, just as the 0, -1, -2 fields indicating datum, line, block can't go through a function.
I don't think this is of immediate concern because I can't think of too many ways that the occasion would arise to pass uniformly spaced samples through some kind of function. If you want the gory details about the issue, here is a copy of a note I sent to Petr:

>I don't know whether I've got the point, but I think that in the (new)
>binary image matrix you cannot address x- and y- indices, i.e. splot 'a.edf' using -2:-1:3 with image
>to index the axes via column number and row number, or can you?
>

Yes, one can select just a particular column,

    plot 'blutux.rgb' binary array=128x128 flip=y format='%uchar' using 3 with image

But you are onto the point I was trying to make. One point is that I don't want to have the notation so that the example you give would be

    splot 'a.edf' using 1:2:5

where 1 and 2 are the generated, or implicit, columns... like in the case of `matrix`... I think that is downright confusing. The user sets up their columns in a file, then they have to remember to add two to the front of it. Just confusing I think.

> What was
>the point of your discussion? Actually I would like these -1 and -2. Just few days earlier I wanted to
> find the measurement number at a given angle (x-axis of the image), and
> I had to recalculate the image (from experimental data).
>Petr
>
>

Plus, you know you should be able to override any in-file settings at the command line, if that helps your situation any.

Here the 0, -1 and -2 are like in the datum, line count and index. They are not the _generated_ values associated with array=50x50, for example. That was another of my points.

And my last point is that the 0, -1 and -2 can't be passed through an expression the way columns can, i.e., $1. You see, in order for something to be used in an expression, I believe that quantity needs to exist in the df_column[] array.
But the 0, -1, and -2 information is not put into a column; it is just drawn from a variable, i.e.,

    } else if (column == -2) {
        v[output] = df_current_index;
    } else if (column == -1) {
        v[output] = line_count;
    } else if (column == 0) {
        v[output] = df_datum;       /* using 0 */

And similarly, the generated coordinates that I added also don't go into df_column[]:

    } else if (column == -5) {
        /* Perhaps try using a switch statement to avoid so many tests. */
        v[output] = o_value*delta[2];
    } else if (column == -4) {
        v[output] = n_value*delta[1];
    } else if (column == -3) {
        v[output] = m_value*delta[0];

Now, I can easily change that by stuffing those values instead into df_column[]. Say, under the hood I extend the df_column[] array by three and put those values in there. Then they could be passed through an expression, if only the thing that parses the expressions knows that it should treat a -1, say, as (df_no_cols + 1). See my point?

In the case of `matrix`, the indices are immediately stuffed into df_column[0] and df_column[1]:

    /* Fill backward so that current read value is not overwritten. */
    for (j=df_no_bin_cols-1; j >= 0; j--) {
        if (j == 0)
            df_column[j].datum = df_matrix_binary ? scanned_matrix_row[df_M_count] : df_M_count;
        else if (j == 1)
            df_column[j].datum = df_matrix_binary ? first_matrix_column : df_N_count;
        else
            df_column[j].datum = df_column[i].datum;
        df_column[j].good = DF_GOOD;
        df_column[j].position = NULL;
    }

and from that point forward can be used in expressions as $1 and $2.

Dan
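For reference, the "extend df_column[] by three" idea above could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the patch: column_slot(), store_generated() and N_GENERATED are hypothetical names, a plain array of doubles stands in for df_column[], and the particular numbering (generated column -3 goes to the first extra slot, -4 to the second, -5 to the third) is just one possible choice.

```c
#include <assert.h>

#define N_GENERATED 3   /* hypothetical: extra slots for columns -3, -4, -5 */

/* Hypothetical helper: translate a using-spec column number into an index
 * into a value array that has been extended by N_GENERATED entries.
 *   file columns 1..no_cols      -> slots 0..no_cols-1
 *   generated columns -3,-4,-5   -> slots no_cols, no_cols+1, no_cols+2
 * Anything else (including 0, -1, -2) returns -1, meaning "not stored". */
int
column_slot(int column, int no_cols)
{
    assert(no_cols >= 0);
    if (column >= 1 && column <= no_cols)
        return column - 1;
    if (column <= -3 && column >= -2 - N_GENERATED)
        return no_cols + (-3 - column);
    return -1;
}

/* Hypothetical helper: store the generated coordinates (sample index times
 * sample spacing, per dimension) in the extra slots so that an expression
 * evaluator could read them like any other column. */
void
store_generated(double *values, int no_cols,
                const double index[N_GENERATED],
                const double delta[N_GENERATED])
{
    int k;

    for (k = 0; k < N_GENERATED; k++)
        values[no_cols + k] = index[k] * delta[k];
}
```

The point of the sketch is that once the generated values live in the same array as the file columns, the expression parser only needs the column-number translation to reach them. The datum, line and index fields (0, -1, -2) are deliberately left out here and would need the same treatment.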