Re: Lengthy discussion about datafile.c...

On Saturday 14 August 2004 08:54 pm, Daniel Sebald wrote:
>
> >Your docs say
> >	+ Gnuplot will retrieve a number of binary
> >	+ variables equal to the largest column specified in the `<using list>`.
> >	+ For example, `using 1:3` would cause three columns to be read, of which
> >	+ the second will be ignored.
> >So how do you handle the case of 10 logical columns of data in the file,
> >of which you only want to read the 2nd and 4th?  How do you skip "columns"
> >5 to 10 of each "line"?
>
> "format" is supposed to be analogous to the "using" format string, so
> something like the following should work
>
> plot "datafile.dat" binary format="%10float" using 2:4

But according to the documentation I quoted above, that would only
read in 4 logical columns, leaving 6 more unread values in the file
before you get to the next set of input values.  How do you tell it
to skip the next 6 columns?
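The point about record width can be made concrete in C. This is a minimal sketch, not code from datafile.c. It assumes each record holds exactly 10 native-endian 4-byte floats, and the helper name `extract_cols_2_4` and the constant `NCOLS` are hypothetical. Whatever the format string looks like, the reader has to advance by the full record width (10 values), not by the highest column named in the using spec (4).

```c
#include <stddef.h>
#include <string.h>

/* Assumed layout (hypothetical): NCOLS native-endian 4-byte floats per record. */
#define NCOLS 10

/* Copy 1-based columns 2 and 4 of every complete record in buf into
 * col2[] and col4[].  Returns the number of records processed.
 * Columns 1, 3 and 5..10 are skipped purely by the record stride. */
size_t extract_cols_2_4(const unsigned char *buf, size_t nbytes,
                        float *col2, float *col4)
{
    const size_t record_bytes = NCOLS * sizeof(float);
    size_t nrec = nbytes / record_bytes;

    for (size_t r = 0; r < nrec; r++) {
        const unsigned char *rec = buf + r * record_bytes;
        /* Column c (1-based) starts at byte (c - 1) * sizeof(float). */
        memcpy(&col2[r], rec + 1 * sizeof(float), sizeof(float));
        memcpy(&col4[r], rec + 3 * sizeof(float), sizeof(float));
    }
    return nrec;
}
```

If the stride were taken from the using spec (4 values) instead of the record width, the second "record" would start at column 5 of the first one and every value after that would be misaligned.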

> >What constitutes the logical equivalent of a "blank line" in your binary
> >files? Or is there no equivalent to the auto-determination of scan lines?
>
> A blank line occurs when the scan line reaches its end.  For example,
> here is the scatter2 example from the image demo
>
> splot 'scatter2.bin' binary record=30,30,29,26 endian=little using 1:2:3
>
> which means blank lines occur after the 30th, 60th and 89th lines (the four records hold 30, 30, 29 and 26 points).
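Assuming each comma-separated value in `record=30,30,29,26` is the number of points in one scan line, the positions of the implied blank lines are just running sums. A small C sketch (the function name `record_ends` is hypothetical):

```c
#include <stddef.h>

/* Given per-record point counts (e.g. parsed from "record=30,30,29,26"),
 * store in ends[i] the index of the last point of record i, i.e. the
 * point after which an ASCII file would carry a blank line.
 * Returns the total number of points. */
size_t record_ends(const size_t *lengths, size_t nrecords, size_t *ends)
{
    size_t total = 0;
    for (size_t i = 0; i < nrecords; i++) {
        total += lengths[i];
        ends[i] = total;
    }
    return total;
}
```

For 30,30,29,26 this gives 30, 60, 89 and 115; the last is simply the end of the data, so the blank lines fall after points 30, 60 and 89.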

But there you have told it on the command line what the structure is.
The thing about blank lines in an ascii input file is that they define
a structure on the fly; you don't need to specify it on the command line.
I would much rather require a file format that indicates what each
logical line contains.  A blank line is then indicated *in the file* by some
designated code (probably some number of 0s, but whatever).
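One way to realise an in-file blank-line marker, sketched in C under an assumed convention that is not part of any existing gnuplot format: every logical line is NVALS float32 values, and a record whose values are all +0.0 marks the end of a scan line. `NVALS` and `count_scan_lines` are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

#define NVALS 3   /* assumed values per logical line, e.g. x, y, z */

/* Hypothetical convention: a record of NVALS zeros ends a scan line.
 * Stores the number of data records in each scan line in line_len[]
 * (up to max_lines entries) and returns the number of scan lines. */
size_t count_scan_lines(const float *recs, size_t nrecs,
                        size_t *line_len, size_t max_lines)
{
    static const float zero[NVALS];      /* zero-initialized sentinel */
    size_t nlines = 0, current = 0;

    for (size_t r = 0; r < nrecs; r++) {
        if (memcmp(&recs[r * NVALS], zero, sizeof zero) == 0) {
            if (nlines < max_lines)
                line_len[nlines] = current;
            nlines++;
            current = 0;                 /* start a new scan line */
        } else {
            current++;
        }
    }
    if (current > 0) {                   /* last line had no sentinel */
        if (nlines < max_lines)
            line_len[nlines] = current;
        nlines++;
    }
    return nlines;
}
```

The weakness of this particular choice shows up immediately: a genuine data point at (0,0,0) would be read as a separator, so a real format would want a code that cannot occur in the data.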

> Well, I'd not thought of strings in the binary file, but perhaps
> something like "%s" in the format string?  I would probably make the
> restriction that the strings within the file need to be NULL terminated.
> That's not an unrealistic expectation, is it?  Or wait, maybe "%s"
> could be general length but NULL terminated; "%[#]s" could be a fixed
> length of # characters.

Fixed length strings are not interesting.  You could use NULL-termination,
but only if you specify everything on the command line because otherwise
the input routine doesn't know whether it's reading a string at all.

> I'm not sure what you mean by a matrix of strings.

I mean like an input file consisting of 10 lines of 5 strings each.
Only in this case it would be a binary file containing 50 NULL-terminated
strings that you have somehow flagged as being in a 10x5 matrix.
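To illustrate why the shape has to come from outside the data, here is a minimal C sketch that splits 50 NUL-terminated strings into a 10x5 table. The 10x5 shape is hard-coded as an assumption, and the function name `split_string_matrix` is hypothetical; nothing in the bytes themselves marks where a row ends.

```c
#include <stddef.h>
#include <string.h>

#define NROWS 10   /* assumed shape, supplied from outside the file */
#define NCOLS 5

/* Point cell[r][c] at consecutive NUL-terminated strings in buf,
 * filling the table row by row.  Returns 0 on success, -1 if the
 * buffer ends before NROWS*NCOLS strings have been found. */
int split_string_matrix(const char *buf, size_t nbytes,
                        const char *cell[NROWS][NCOLS])
{
    size_t pos = 0;
    for (int r = 0; r < NROWS; r++) {
        for (int c = 0; c < NCOLS; c++) {
            if (pos >= nbytes)
                return -1;                      /* ran out of data */
            /* memchr keeps the scan inside buf */
            const char *nul = memchr(buf + pos, '\0', nbytes - pos);
            if (nul == NULL)
                return -1;                      /* unterminated string */
            cell[r][c] = buf + pos;
            pos = (size_t)(nul - buf) + 1;      /* step past the NUL */
        }
    }
    return 0;
}
```

The same 50 strings could equally be a 5x10 or a 25x2 table; only the caller's choice of NROWS and NCOLS decides.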

> What is the matrix variant?  

Like your example above. (At least I *think* that's what your example
was doing.)  A regular array of values all of the same sort, e.g.
a 100x200x300 grid with x varying fastest, then y, then z.
But since it's regular and all the entries are the same length you
know exactly where to find every element without any funky format
stuff.
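For a regular grid the location of every element follows from the shape and element size alone. A sketch of the offset arithmetic, assuming x varies fastest, then y, then z (the function name `grid_offset` is hypothetical):

```c
#include <stddef.h>

/* Byte offset of element (i, j, k) in an nx x ny x nz grid stored with
 * x varying fastest, then y, then z; each element is elem_size bytes.
 * nz is not needed: it only bounds k. */
size_t grid_offset(size_t i, size_t j, size_t k,
                   size_t nx, size_t ny, size_t elem_size)
{
    return ((k * ny + j) * nx + i) * elem_size;
}
```

For a 100x200x300 grid of 4-byte floats, element (0,0,1) sits at byte 80000 and the last element (99,199,299) at byte 23999996, just short of the 24,000,000-byte total, so a reader can seek straight to any element.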

> The problem is that an image of say 500 x 500 pixels gets very big in ASCII.

I think that is a non-issue.  You don't have to store this anywhere; you're
just piping it in.  But this is the very straightforward case that I called
a matrix.  You know in advance it's a 500x500 array, and you know how big
each element is.  No need for format statements, using specs, or any of that.
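Reading such a matrix needs no format machinery at all. A minimal C sketch, assuming the file holds nx*ny native-endian floats with no header; `read_matrix` is a hypothetical helper. For a 500x500 matrix of 4-byte floats that is a single fread of 1,000,000 bytes.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read an nx x ny matrix of native-endian floats from fp in one call.
 * The layout is fully known in advance, so there is nothing to parse.
 * Returns a malloc'd array (caller frees) or NULL on error. */
float *read_matrix(FILE *fp, size_t nx, size_t ny)
{
    size_t n = nx * ny;
    float *m = malloc(n * sizeof *m);
    if (m == NULL)
        return NULL;
    if (fread(m, sizeof *m, n, fp) != n) {   /* short file or read error */
        free(m);
        return NULL;
    }
    return m;
}
```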

[EAM puts on geezer hat again]  Back in the old days of limited disk space
it was a big win to store numeric data in binary files.  This caused 
man-centuries of time to be wasted in dealing with cross-platform conversions
and uncertainty about the exact format of the binary files.  When disks got
cheap we all heaved a huge sigh of relief and for the most part stopped 
using binary output files.  It's just not worth it.  So what if the ascii 
equivalent is big?  Just compress it and it goes back to being about the
same size as the original binary (OK, that depends a bit on what sort of
data it is).  

Bottom line is I really don't like this general binary input format.
If you know enough about your binary format to write a cryptic 
description like
    plot "datafile.dat" binary format="%*int16%float32%*float32" \
    record=30,30,29,26 endian=little 
then by gum, you know enough to write a jiffy filter routine and
pipe normal ascii input into gnuplot.  Many users may not be
up to this, but those same users won't be able to figure out the
endian business anyhow.  Where exactly is the big gain?
Simplicity is worth *a lot*.  Far more than saving a little bandwidth
in the input pipe.
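As an example of the kind of jiffy filter meant here, a minimal C sketch matching the format string above. Each record is assumed to be a little-endian int16 followed by two float32 values (10 bytes), of which only the middle float is kept. The byte order is decoded explicitly so the result does not depend on the host's endianness, but the code still assumes the host `float` is IEEE-754 binary32. The function names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Decode a little-endian float32 regardless of host byte order.
 * Assumes the host float is a 32-bit IEEE-754 value. */
static float le_float32(const unsigned char *p)
{
    uint32_t bits = (uint32_t)p[0] | (uint32_t)p[1] << 8 |
                    (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Hypothetical filter: records are int16, float32, float32 (10 bytes,
 * little-endian).  Prints the middle float32 as ASCII, one per line,
 * and writes a blank line after each scan line of lengths[l] points. */
int binary_to_ascii(FILE *in, FILE *out,
                    const size_t *lengths, size_t nlines)
{
    unsigned char rec[10];
    for (size_t l = 0; l < nlines; l++) {
        for (size_t n = 0; n < lengths[l]; n++) {
            if (fread(rec, 1, sizeof rec, in) != sizeof rec)
                return -1;                         /* truncated input */
            /* skip the 2-byte int16, ignore the trailing float32 */
            fprintf(out, "%.9g\n", le_float32(rec + 2));
        }
        fputc('\n', out);                          /* scan line ends */
    }
    return 0;
}
```

Called as `binary_to_ascii(stdin, stdout, lengths, 4)` with `lengths` set to {30, 30, 29, 26}, it would produce ordinary ASCII with blank lines between scan lines, which gnuplot can read through a pipe, e.g. `plot '< ./filter < datafile.dat'` (the program name `filter` is hypothetical).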

Input of binary files containing regular arrays may be worth it
for convenience.  But more complicated input requiring flags for
bit order, word size, floating point format, and pre-announcement
of the file structure? ---- All that strikes me as being more trouble
than it is worth.  Will your code work on an Amiga?  On a 64-bit
VMS machine?  Who is going to explain to users how to set
all the right flags to make it work?
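The portability questions come down to checks like the following, which any raw binary reader silently depends on. A C sketch (function names hypothetical) that tests host byte order and whether `float` has the IEEE-754 binary32 layout; a non-IEEE layout such as VAX F_floating would fail the second check.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the host stores multi-byte integers least-significant
 * byte first, 0 otherwise. */
int host_is_little_endian(void)
{
    const uint16_t probe = 0x0102;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 0x02;
}

/* Returns 1 if float is 4 bytes and 1.0f and -2.0f have the IEEE-754
 * binary32 bit patterns, 0 otherwise. */
int float_is_ieee_binary32(void)
{
    if (sizeof(float) != 4)
        return 0;
    const float samples[2] = {1.0f, -2.0f};
    const uint32_t expected[2] = {0x3F800000u, 0xC0000000u};
    for (int i = 0; i < 2; i++) {
        uint32_t bits;
        memcpy(&bits, &samples[i], sizeof bits);
        if (bits != expected[i])
            return 0;
    }
    return 1;
}
```

A reader that gets 0 from either function would need byte swapping or format conversion before the raw values could be trusted, which is exactly the flag-setting burden described above.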

I believe that Petr had some specific applications in mind, so
maybe he can step in and clarify exactly what pieces of this 
code he wanted, and why.  I myself plot many sorts of data in
gnuplot, but I've never felt a need for direct binary input.

-- 
Ethan A Merritt
Department of Biochemistry & Biomolecular Structure Center
University of Washington, Seattle
