From: Daniel J S. <dan...@ie...> - 2004-08-16 17:04:05
mi...@ph... wrote:

>>> Well, I'd not thought of strings in the binary file, but perhaps
>>> something like "%s" in the format string? I would probably make the
>>> restriction that the strings within the file need to be NULL
>>> terminated.
>>> I'm not sure what you mean by a matrix of strings.
>
> Isn't a "binary file with strings" the same as an "ascii file with
> strings", just with \0 instead of \n -- thus, a "tr" filter would do it
> all. Aha, that won't help if binary data and strings are mixed in one
> file. Do you mean this case?

Yes, that case. But I think they are still the same. If you look at a
binary file with a header in an editor, the strings near the top are just
as readable as if it were an ascii file.

>>> The problem is that an image of, say, 500 x 500 pixels gets very big
>>> in ASCII.
>>
>> I think that is a non-issue. You don't have to store this anywhere;
>> you're just piping it in.
>
> Piping is not as fast as a direct read.
>
> For example, I recently benchmarked reading big .gz files from a C
> program with (1) popen("gzip -c -d"), and with (2) linking against zlib.
> Case (2) was 15% faster -- quite an interesting speed-up if you have to
> read 2 GB of data.
>
>> When disks got cheap we all heaved a huge sigh of relief and for the
>> most part stopped using binary output files.
>
> I don't think this is right.
>
> - You can work on a computer where you are not authorized to replace the
>   hard disk (the case at many companies).

Good point. That sort of thing has been in U.S. news lately, i.e.,
misplaced drives that shouldn't be misplaced.

>   Also, notebook hard disks are not cheap, fast, or easily replaceable.
> - Digital detectors are improving, and nowadays I have to deal with
>   2048x2048x16bit image series. That's plenty of data -- it grows
>   quadratically with improvements in detector technology.
> - I guess image processing will never switch to ascii data -- it will
>   always be too big and slow.

Petr has it exactly right with image processing. As computers get faster,
the technology seems to fill the void. That is a point I wanted to make
before. (Call me the "neo-geezer".)

>> Many users may not be up to this, but those same users won't be able to
>> figure out the endian business anyhow.

We've included an option "swap" for which a person doesn't need to know
what big/little endian mean. Just swap the byte order and see how it turns
out.

> In the "with image" patch, the parameters, and thus the command line
> options for reading binary (matrix) files, were designed carefully so
> that you can read any type of data. The command line options cover the
> same range of settings you have to fill in for any binary image reader,
> even a GUI-like one, to read arbitrary image data. The user must always
> know his data, that's it, and it is no bother for him to pass this
> information to gnuplot or to whichever other image drawer.
>
>> Simplicity is worth *a lot*. Far more than saving a little bandwidth in
>> the input pipe.
>
> The patch reading and drawing binary data is a major speedup. Try
> comparing drawing a big (>512x512) traditional gnuplot binary data file
> and a binary image file.
>
>> Input of binary files containing regular arrays may be worth it for
>> convenience. But more complicated input requiring flags for bit order,
>> word size, floating point format, and pre-announcement of the file
>> structure?

There are all kinds of data files out there.
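Just to make the word-size/byte-order question concrete, here is a minimal
sketch of what reading a raw 16-bit matrix with an optional byte swap
amounts to in C. It is illustrative only -- read_raw_u16_matrix() is a
made-up name, not code from the patch:

#include <stdio.h>
#include <stdlib.h>

/* Illustrative sketch only.  Read an nx x ny raw matrix of 16-bit
 * unsigned samples (think of the 2048x2048x16bit detector images
 * mentioned above), optionally swapping byte order, into doubles. */
static double *
read_raw_u16_matrix(FILE *fp, size_t nx, size_t ny, int swap)
{
    size_t n = nx * ny, i;
    double *m = malloc(n * sizeof(double));
    unsigned char b[2];

    if (m == NULL)
        return NULL;
    for (i = 0; i < n; i++) {
        if (fread(b, 1, 2, fp) != 2) {          /* short read: give up */
            free(m);
            return NULL;
        }
        if (swap) {                             /* the "swap" option */
            unsigned char t = b[0];
            b[0] = b[1];
            b[1] = t;
        }
        m[i] = (double) (b[0] | (b[1] << 8));   /* words stored little-endian */
    }
    return m;
}

The reader in the patch is more general, of course (Float32, Float64,
different word sizes, as Petr notes below), but the per-word bookkeeping
is about this small.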
I guess the question is, should the user be obligated to write something to
make their files conform to gnuplot, or should the syntax exist for the
user to finagle gnuplot into reading his or her file? Take the moderately
proficient linux user, like myself. I can toil with a program's syntax to a
certain extent, but writing a linux utility to convert data file formats is
more trouble. Still, I acknowledge Ethan's point; perhaps a bit of "code
bloat". (Let's see if I can help with that -- see below.)

>> All that strikes me as being more trouble than it is worth. Will your
>> code work on an Amiga? On a 64-bit VMS machine?
>
> I think so. You can specify Float32, Float64, etc.
>
> Probably you cannot draw binary floats saved by Turbo Pascal v 5--6,
> because these are 6 B -- I don't know about recent Pascals. But you
> cannot read those in any other image program either, I guess.
>
>> Who is going to explain to users how to set all the right flags to make
>> it work?
>
> Users working with image processing know their format structure. It is
> the user who tells gnuplot what to draw, via the command line options to
> "plot ... with image".
> Otherwise, you or somebody else writes a reader for Octave, and from
> there you draw your matrix via imagegp.m, which is included in the patch.
>
>> I believe that Petr had some specific applications in mind, so maybe he
>> can step in and clarify exactly what pieces of this code he wanted, and
>> why.
>
> Yes, I want to quickly image binary image data with axes x and y in
> physical units (not pixel numbers).
>
>> I myself plot many sorts of data in gnuplot, but I've never felt a need
>> for direct binary input.
>
> I need it whenever I draw an image of >=128x128. Otherwise the drawing
> speed is very low (especially on X11) and memory consumption is high
> (I remember that gnuplot eats about 130 B per point read and drawn from
> a column-wise file).
> The current version of Daniel's patch fully satisfies my needs.

That's correct. We'd thought the current format, with gnuplot's
eight-field-wide point, was rather inefficient for large data files.
However, tacking on a new scheme for more compact storage would be too much
of a paradigm shift.

In that same vein, I'd like to address that df_3dmatrix() routine again.
Inside of it is code that looks very similar to the df_readbinary() I've
created. This df_3dmatrix() started out in some ways similar to
df_readline(), with the use_specs and all. Ethan and I have now discussed
this problem of code re-use, or of similar functionality for binary and
ascii, without intermixing the two in the same routine and creating a mess.
So df_3dmatrix() has in some sense already failed to keep up with
df_readline() in functionality.

I'd like to propose that you let me take a bit of time to move the
important parts of df_3dmatrix() that aren't already in df_readbinary() --
which I think are very few -- into df_readbinary(). I could easily make
that df_readbinary() routine read gnuplot binary files. Then df_3dmatrix()
and its helper routine read_file() could be discarded. That would make the
innards of plot2d.c and plot3d.c use only the "df_readline()" approach to
bringing in data.
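For anyone who hasn't been in that code, here is a self-contained
caricature of the two input paradigms. The names record_at_a_time(),
whole_file_at_once(), and store_point() are invented for the illustration;
they are not gnuplot routines, and the ascii parsing simply stands in for
whatever the real readers do:

#include <stdio.h>
#include <stdlib.h>

/* Stand-in for storing one point into the plot's point structure. */
static void store_point(double x, double y)
{
    printf("%g %g\n", x, y);
}

/* (a) The df_readline() style: pull one record at a time and store it
 *     immediately, so ascii and binary readers can share one caller loop. */
static void record_at_a_time(FILE *fp)
{
    double x, y;

    while (fscanf(fp, "%lf %lf", &x, &y) == 2)
        store_point(x, y);
}

/* (b) The df_3dmatrix() style: read the whole file into a temporary
 *     array first, then run a second loop to copy it into the points. */
static void whole_file_at_once(FILE *fp)
{
    double *buf = NULL;
    size_t n = 0, cap = 0, i;
    double x, y;

    while (fscanf(fp, "%lf %lf", &x, &y) == 2) {
        if (n + 2 > cap) {
            double *tmp;
            cap = cap ? cap * 2 : 64;
            tmp = realloc(buf, cap * sizeof(double));
            if (tmp == NULL) {
                free(buf);
                return;                 /* out of memory: give up */
            }
            buf = tmp;
        }
        buf[n++] = x;
        buf[n++] = y;
    }
    for (i = 0; i < n; i += 2)
        store_point(buf[i], buf[i + 1]);
    free(buf);
}

The proposal above amounts to keeping only style (a), with df_readbinary()
feeding the same caller loop that df_readline() does.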
Perhaps one doesn't like the "df_readline()" approach, but I think there is
an advantage to having just one paradigm, *and* to having df_readascii()
and df_readbinary() in the same file, where they share their many
similarities. It is also a good reminder that if someone adds functionality
to df_readascii(), there is always that df_readbinary() to be aware of.

Basically, df_3dmatrix() and read_matrix() read in the whole data file at
once, then go through a short loop to store the array into the "point
structure". Is that the direction that gnuplot should head? If you think
not, and agree that having just a "df_readline()" form of input is good,
then that right there will free up some code space and assuage concerns
about code bloat.

Dan