Re: Lengthy discussion about datafile.c...


Ethan Merritt wrote:

>On Saturday 14 August 2004 03:50 pm, Daniel J Sebald wrote:
>
>>I know that for ASCII files the number of columns can be determined by
>>the file itself and gnuplot readjusts accordingly.
>
>Which brings up another issue.  The description of your binary read
>"format" commands looks *really* fragile.  I mean the stuff being parsed
>in plot_option_binary_format(). I am seriously worried that
>it won't transfer well across 32/64 bit machines, that it won't handle
>string data, and worst of all that it requires too much user-knowledge
>of file and data types.  Basically I don't like it.
>

I don't have a 64-bit machine to try this on.  But how well it will
transfer is a matter of how the data is stored in the file.  Is
there a 64-bit IEEE floating point format?  There probably is.  32-bit
floats in files are certainly still readable.  64-bit should work so
long as the file's byte order matches the CPU/compiler byte order.
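
To illustrate the byte-order point, here is a minimal Python sketch (not gnuplot code) using the standard struct module. It packs one 64-bit IEEE 754 double in both byte orders and decodes it again. The helper name read_double and its 'little'/'big' argument are made up for this example.

```python
import struct

value = 1.5
little = struct.pack("<d", value)  # 8-byte IEEE 754 double, little-endian
big = struct.pack(">d", value)     # the same value, big-endian

def read_double(raw, endian):
    """Decode one 64-bit double; endian is 'little' or 'big' (hypothetical helper)."""
    prefix = "<" if endian == "little" else ">"
    return struct.unpack(prefix + "d", raw)[0]

# Decoding with the byte order the data was written in recovers the value.
# Decoding with the other order silently yields a different number.
right = read_double(little, "little")
wrong = read_double(little, "big")
```

The word size of the reading machine does not enter into it: the field width and byte order are properties of the file, so they have to be stated or agreed on somewhere.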

>[EAM puts on geezer hat] In the old days of Fortran programming and 
>VMS file systems, binary files had actual "records".   In those days
>there was an obvious parallel between "columns" in an ascii file
>and "records" in a binary file.  But that approach has been drowned
>by the unix notion that "everything is a stream of bytes".  
>

I know, that's the crux.

>It's *really hard* to figure out what data is in a binary stream, and I
>am dubious that it is worth spending thousands of lines of code in
>gnuplot trying to do so.  The unix way in such a case would be to
>run the input binary data through a tailored filter on its way into
>gnuplot.  That way gnuplot only has to know about ascii input, and
>you can debug a suitable filter for your application without having
>to recode gnuplot.  
>

The problem is that an image of say 500 x 500 pixels gets very big in ASCII.
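
A rough way to check the size concern is to build the image both ways and compare. The sketch below is Python, not part of gnuplot. It assumes each pixel is stored as a little-endian float32 (x, y, value) triple, and the names make_image and binary_to_ascii are hypothetical. binary_to_ascii is also the kind of filter described in the quoted text above.

```python
import struct

def make_image(n):
    """Pack an n x n grid as little-endian float32 (x, y, value) triples."""
    chunks = []
    for y in range(n):
        for x in range(n):
            chunks.append(struct.pack("<3f", float(x), float(y), float((x + y) % 256)))
    return b"".join(chunks)

def binary_to_ascii(data, columns=3):
    """Convert packed float32 records to whitespace-separated text lines."""
    layout = "<" + "f" * columns
    size = struct.calcsize(layout)
    lines = []
    for offset in range(0, len(data) - size + 1, size):
        fields = struct.unpack_from(layout, data, offset)
        lines.append(" ".join(repr(f) for f in fields))
    return "\n".join(lines) + "\n"
```

Comparing len(make_image(500)) with len(binary_to_ascii(make_image(500))) gives the ratio for this particular layout. The exact figure depends on how the numbers are formatted in the text version.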

>Your docs say  
>	+ Gnuplot will retrieve a number of binary
>	+ variables equal to the largest column specified in the `<using list>`.
>	+ For example, `using 1:3` would cause three columns to be read, of which
>	+ the second will be ignored. 
>So how do you handle the case of 10 logical columns of data in the file,
>of which you only want to read the 2nd and 4th?  How do you skip "columns"
>5 to 10 of each "line"?
>

"format" is supposed to be analogous to the "using" format string, so 
something like the following should work

plot "datafile.dat" binary format ="%10float" using 2:4

(But actually, I see there is a bug because 10 is greater than MAX_COLS, 
which is a silly restriction in the code... I'll fix that.)

Or, if there were a mix of variable types

plot "datafile.dat" binary format = "%*int16%float32%*float32%int16%3*int%3*float"

I agree that unless one uses this a lot it is a bit arcane.  But recall
that one of the primary uses is automation.  Passing an image from
Octave to Gnuplot in binary is one example.  Once the image() script
in Octave is written with the proper format string, there is no need
to deal with it again in Octave.
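
To make the format strings above concrete, here is a minimal Python sketch of how such a string could be turned into a fixed record layout. It is not the parser in plot_option_binary_format(). The grammar ("%", an optional repeat count, an optional "*" meaning discard, then a type name), the type sizes in TYPE_CODES, and the little-endian byte order are all assumptions made for the illustration.

```python
import re
import struct

# Hypothetical type table: the names come from the message above,
# the struct codes and sizes are assumptions.
TYPE_CODES = {"int16": "h", "int": "i", "float": "f", "float32": "f"}

# "%", optional repeat count, optional "*" (discard), type name.
FIELD = re.compile(r"%(\d*)(\*?)([a-z]+\d*)")

def parse_format(fmt):
    """Return (struct_layout, keep_flags) for one record, little-endian."""
    codes, keep = [], []
    for count, star, name in FIELD.findall(fmt):
        n = int(count) if count else 1
        codes.append(TYPE_CODES[name] * n)
        keep.extend([star == ""] * n)
    return "<" + "".join(codes), keep

def read_records(data, fmt):
    """Yield the fields not marked with "*" for each fixed-size record in data."""
    layout, keep = parse_format(fmt)
    size = struct.calcsize(layout)
    for offset in range(0, len(data) - size + 1, size):
        values = struct.unpack_from(layout, data, offset)
        yield [v for v, k in zip(values, keep) if k]
```

With "%10float", every record yields ten values, from which a `using 2:4` would pick the second and fourth. With the mixed-type string, only the one float32 and the one int16 not marked with "*" survive.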

>What constitutes the logical equivalent of a "blank line" in your binary
>files? Or is there no equivalent to the auto-determination of scan lines?
>

A blank line occurs when the scan line reaches its end.  For example, 
here is the scatter2 example from the image demo

splot 'scatter2.bin' binary record=30,30,29,26 endian=little using 1:2:3

which means blank lines occur at the 30th line, 60th line, etc.
Whatever application is sending the data to gnuplot must know the
quantity being sent.  If that information is stored within the datafile
and must be interpreted, then that requires additional routines, an
example of which Petr has supplied.  Such routines are easy to link in,
but I'm not enthusiastic about writing all sorts of binary file routines
for the bazillion different formats in the computer world.  My original
goal with all this was to quickly pump raw image data across to Octave.
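
The record lengths play the role that blank lines play in an ASCII file. A minimal Python sketch of that idea follows. It assumes the four numbers in record=30,30,29,26 are the point counts of successive scan lines, and split_scan_lines is a hypothetical helper, not a gnuplot routine.

```python
def split_scan_lines(points, record_lengths):
    """Group a flat sequence of points into scan lines of the given lengths."""
    lines, start = [], 0
    for n in record_lengths:
        lines.append(points[start:start + n])
        start += n
    return lines

# 30 + 30 + 29 + 26 = 115 points, split where an ASCII file would have blank lines.
scan_lines = split_scan_lines(list(range(115)), [30, 30, 29, 26])
```

The reader has no way to discover these boundaries from the byte stream itself, which is why the sender has to supply them.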

>Do you plan to handle strings?  How? Would you require a full "binary format"
>description in this case?  Is there such a thing as a matrix of strings?
>

Well, I'd not thought of strings in the binary file, but perhaps
something like "%s" in the format string?  I would probably make the
restriction that the strings within the file need to be NULL terminated.
That's not an unrealistic expectation, is it?  Or wait, maybe "%s"
could be of arbitrary length but NULL terminated; "%[#]s" could be a
fixed length of # characters.
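
The two string conventions floated here can be sketched in Python as follows. Neither "%s" nor "%[#]s" is an existing format specifier. The function names, the ASCII decoding, and the stripping of padding bytes in the fixed-length case are all assumptions.

```python
def read_cstring(buf, offset):
    """Read a zero-terminated string; return (text, next_offset)."""
    end = buf.index(b"\0", offset)  # raises ValueError if no terminator is found
    return buf[offset:end].decode("ascii"), end + 1

def read_fixed_string(buf, offset, length):
    """Read exactly `length` bytes, dropping zero padding; return (text, next_offset)."""
    raw = buf[offset:offset + length]
    return raw.rstrip(b"\0").decode("ascii"), offset + length
```

The terminated form makes record sizes variable, so a reader can no longer seek to record k by multiplication. The fixed-length form keeps every record the same size at the cost of padding.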

I'm not sure what you mean by a matrix of strings.

>The matrix variant is far more straight-forward.  I would think this will be 
>by far the most common use anyhow, and it would cover the pixel images
>that you obviously have fondness for. Could we maybe have a first cut
>version of this patch that only deals with matrix format binary data?
>

What is the matrix variant?  Gnuplot binary?  That was available all 
along.  However, gnuplot binary doesn't work for color images.  (Need 
three channels for that.)  The switch BINARY_DATA_FILE can be undefined 
to remove binary datafiles from the code.  Gnuplot binary would still 
work with the switch off.

Dan
