From: Ethan A M. <merritt@u.washington.edu> - 2004-08-15 04:30:52
On Saturday 14 August 2004 08:54 pm, Daniel Sebald wrote:
>
> >Your docs say
> > + Gnuplot will retrieve a number of binary
> > + variables equal to the largest column specified in the `<using list>`.
> > + For example, `using 1:3` would cause three columns to be read, of which
> > + the second will be ignored.
> >So how do you handle the case of 10 logical columns of data in the file,
> >of which you only want to read the 2nd and 4th? How do you skip "columns"
> >5 to 10 of each "line"?
>
> "format" is supposed to be analogous to the "using" format string, so
> something like the following should work
>
> plot "datafile.dat" binary format ="%10float" using 2:4

But according to the documentation I quoted above, that would only read in 4 logical columns, leaving 6 more unread values in the file before you get to the next set of input values. How do you tell it to skip the next 6 columns?

> >What constitutes the logical equivalent of a "blank line" in your binary
> >files? Or is there no equivalent to the auto-determination of scan lines?
>
> A blank line occurs when the scan line reaches its end. For example,
> here is the scatter2 example from the image demo
>
> splot 'scatter2.bin' binary record=30,30,29,26 endian=little using 1:2:3
>
> which means blank lines occur at the 30th line, 60th line, etc.

But there you have told it on the command line what the structure is. The thing about blank lines in an ASCII input file is that they define a structure on the fly; you don't need to specify it on the command line. I would much rather require a file format that indicates what each logical line contains. A blank line is then indicated *in the file* by some designated code (probably some number of 0s, but whatever).

> Well, I'd not thought of strings in the binary file, but perhaps
> something like "%s" in the format string? I would probably make the
> restriction that the strings within the file need to be NULL terminated.
> That's not an unrealistic expectation, is it? Or wait, maybe "%s"
> could be general length but NULL terminated; "%[#]s" could be a fixed
> length of # characters.

Fixed-length strings are not interesting. You could use NULL termination, but only if you specify everything on the command line, because otherwise the input routine doesn't know whether it's reading a string at all.

> I'm not sure what you mean by a matrix of strings.

I mean like an input file consisting of 10 lines of 5 strings each. Only in this case it would be a binary file containing 50 NULL-terminated strings that you have somehow flagged as being in a 10x5 matrix.

> What is the matrix variant?

Like your example above. (At least I *think* that's what your example was doing.) A regular array of values all of the same sort, e.g. a 100x200x300 grid with x varying faster than y faster than z. But since it's regular and all the entries are the same length, you know exactly where to find every element without any funky format stuff.

> The problem is that an image of say 500 x 500 pixels gets very big in ASCII.

I think that is a non-issue. You don't have to store this anywhere; you're just piping it in. But this is the very straightforward case that I called a matrix. You know in advance it's a 500x500 array, and you know how big each element is. No need for format statements, using specs, or any of that.

[EAM puts on geezer hat again]
Back in the old days of limited disk space it was a big win to store numeric data in binary files. This caused man-centuries of time to be wasted in dealing with cross-platform conversions and uncertainty about the exact format of the binary files. When disks got cheap we all heaved a huge sigh of relief and for the most part stopped using binary output files. It's just not worth it. So what if the ASCII equivalent is big? Just compress it and it goes back to being about the same size as the original binary (OK, that depends a bit on what sort of data it is).
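For the regular-array ("matrix") case, the position of every element follows directly from the dimensions and the element size, which is why no format description is needed. A minimal Python sketch, assuming a headerless file of little-endian float32 values with x varying fastest, then y, then z, and no padding; the function names are hypothetical and this is not gnuplot code:

```python
import struct

def element_offset(i, j, k, nx, ny, elem_size):
    """Byte offset of element (i, j, k) in a headerless regular grid
    stored with x varying fastest, then y, then z (no padding assumed)."""
    return ((k * ny + j) * nx + i) * elem_size

def read_element(buf, i, j, k, nx, ny, fmt="<f"):
    """Unpack one element from raw bytes. The default '<f' (little-endian
    float32) is an assumption about the file, not a gnuplot convention."""
    size = struct.calcsize(fmt)
    return struct.unpack_from(fmt, buf, element_offset(i, j, k, nx, ny, size))[0]

# Small demo: a 3x2x2 grid whose values equal their linear index.
nx, ny, nz = 3, 2, 2
values = [float(n) for n in range(nx * ny * nz)]
buf = struct.pack("<%df" % len(values), *values)
print(read_element(buf, 2, 1, 1, nx, ny))  # element 11 -> 11.0
```

For the 100x200x300 float32 grid mentioned above, element (0, 0, 1) would start 100*200*4 = 80000 bytes into the file; only the dimensions, element size and byte order have to be known.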
Bottom line is I really don't like this general binary input format. If you know enough about your binary format to write a cryptic description like

    plot "datafile.dat" binary format="%*int16%float32%*float32%" \
         record=30,30,29,26 endian=little

then by gum, you know enough to write a jiffy filter routine and pipe normal ASCII input into gnuplot. Many users may not be up to this, but those same users won't be able to figure out the endian business anyhow. Where exactly is the big gain? Simplicity is worth *a lot*, far more than saving a little bandwidth in the input pipe.

Input of binary files containing regular arrays may be worth it for convenience. But more complicated input requiring flags for bit order, word size, floating-point format, and pre-announcement of the file structure? All that strikes me as being more trouble than it is worth. Will your code work on an Amiga? On a 64-bit VMS machine? Who is going to explain to users how to set all the right flags to make it work?

I believe that Petr had some specific applications in mind, so maybe he can step in and clarify exactly what pieces of this code he wanted, and why. I myself plot many sorts of data in gnuplot, but I've never felt a need for direct binary input.

-- 
Ethan A Merritt
Department of Biochemistry & Biomolecular Structure Center
University of Washington, Seattle
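To make the "jiffy filter routine" suggestion concrete, here is a minimal Python sketch (not part of gnuplot) that converts binary records to plain ASCII. It assumes a hypothetical little-endian record of one int16 followed by two float32 values with no padding, loosely modelled on the format string quoted above; it keeps only the first float and writes a blank line after every 30 records. The record layout, the kept field, the 30-record scan length and the script name `bin2ascii.py` are all assumptions that would have to match the real file.

```python
import struct
import sys

# Hypothetical record layout, loosely modelled on the format string quoted
# above: little-endian int16, float32, float32, with no padding.
RECORD = struct.Struct("<hff")
KEEP = (1,)        # hypothetical: output only the first float32 field
SCAN_LENGTH = 30   # hypothetical: records per scan line

def binary_to_ascii(data, record=RECORD, keep=KEEP, scan_length=SCAN_LENGTH):
    """Convert packed binary records to whitespace-separated ASCII lines,
    writing a blank line after every `scan_length` records so gnuplot
    sees the usual scan-line breaks. A trailing partial record is ignored."""
    usable = len(data) - len(data) % record.size
    lines = []
    for idx, fields in enumerate(record.iter_unpack(data[:usable])):
        # %.9g is enough digits to round-trip a float32 value.
        lines.append(" ".join("%.9g" % fields[c] for c in keep))
        if scan_length and (idx + 1) % scan_length == 0:
            lines.append("")
    return "".join(line + "\n" for line in lines)

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        sys.stdout.write(binary_to_ascii(f.read()))
```

On systems where gnuplot can read from a pipe, the output could then be plotted with something like `plot "< python3 bin2ascii.py datafile.dat" using 1`, leaving all knowledge of byte order and record layout in the filter rather than in gnuplot's command line.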