From: Daniel J S. <dan...@ie...> - 2004-08-16 17:04:05
mi...@ph... wrote:

>>> Well, I'd not thought of strings in the binary file, but perhaps
>>> something like "%s" in the format string? I would probably make the
>>> restriction that the strings within the file need to be NULL
>>> terminated.
>>> I'm not sure what you mean by a matrix of strings.
>
> Isn't a "binary file with strings" the same as an "ascii file with
> strings", just with \0 instead of \n -- thus, a "tr" filter would do it
> all. Aha, that won't help if binary data and strings are mixed in one
> file. Do you mean this case?

Yes, that case. But I think they are still the same. If you look at a
binary file with a header in an editor, the strings near the top are just
as readable as if it were an ascii file.

>>> The problem is that an image of, say, 500 x 500 pixels gets very big
>>> in ASCII.
>>
>> I think that is a non-issue. You don't have to store this anywhere;
>> you're just piping it in.
>
> Piping is not as fast as a direct read.
>
> For example, I recently benchmarked reading big .gz files from a C
> program with (1) popen("gzip -c -d"), and with (2) linking against zlib.
> Case (2) was 15% faster -- quite an interesting speed-up if you have to
> read 2 GB of data.
>
>> When disks got cheap we all heaved a huge sigh of relief and for the
>> most part stopped using binary output files.
>
> I don't think this is right.
>
> - You can work on a computer where you are not authorized to replace the
>   hard disk (the case at many companies).

Good point. That sort of thing has been in U.S. news lately, i.e.,
misplaced drives that shouldn't be misplaced.

>   Also, notebook hard disks are not cheap, fast, or easily replaceable.
> - Digital detectors are improving, and nowadays I have to deal with
>   2048x2048x16bit image series. That's plenty of data -- it grows
>   quadratically with improvements in detector technology.
> - I guess image processing will never switch to ascii data -- it will
>   always be too big and slow.

Petr has it exactly right with image processing. As computers get faster,
the technology seems to fill the void. That is a point I wanted to make
before. (Call me the "neo-geezer".)

>> Many users may not be up to this, but those same users won't be able to
>> figure out the endian business anyhow.

We've included an option "swap" for which a person doesn't need to know
what big/little endian mean. Just swap the byte order and see how it turns
out.

> In the "with image" patch, the parameters, and thus the command line
> options for reading binary (matrix) files, were designed carefully so
> that you can read any type of data. The command line options cover the
> same range of settings you have to fill in for any binary image reader,
> even a GUI-like one, to read arbitrary image data. The user must always
> know his data, that's it, and it is no bother for him to pass this
> information to gnuplot or to whichever other image drawer.
>
>> Simplicity is worth *a lot*. Far more than saving a little bandwidth in
>> the input pipe.
>
> The patch reading and drawing binary data is a major speedup. Try
> comparing drawing a big (>512x512) traditional gnuplot binary data file
> and a binary image file.
>
>> Input of binary files containing regular arrays may be worth it for
>> convenience. But more complicated input requiring flags for bit order,
>> word size, floating point format, and pre-announcement of the file
>> structure?

There are all kinds of data files out there.
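Just to make the word-size/byte-order question concrete, here is a minimal
sketch of what reading a raw 16-bit matrix with an optional byte swap
amounts to in C. It is illustrative only -- read_raw_u16_matrix() is a
made-up name, not code from the patch:

#include <stdio.h>
#include <stdlib.h>

/* Illustrative sketch only.  Read an nx x ny raw matrix of 16-bit
 * unsigned samples (think of the 2048x2048x16bit detector images
 * mentioned above), optionally swapping byte order, into doubles. */
static double *
read_raw_u16_matrix(FILE *fp, size_t nx, size_t ny, int swap)
{
    size_t n = nx * ny, i;
    double *m = malloc(n * sizeof(double));
    unsigned char b[2];

    if (m == NULL)
        return NULL;
    for (i = 0; i < n; i++) {
        if (fread(b, 1, 2, fp) != 2) {          /* short read: give up */
            free(m);
            return NULL;
        }
        if (swap) {                             /* the "swap" option */
            unsigned char t = b[0];
            b[0] = b[1];
            b[1] = t;
        }
        m[i] = (double) (b[0] | (b[1] << 8));   /* words stored little-endian */
    }
    return m;
}

The reader in the patch is more general, of course (Float32, Float64,
different word sizes, as Petr notes below), but the per-word bookkeeping
is about this small.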
I guess the question is, should the user be obligated to write something to
make their files conform to gnuplot, or should the syntax exist for the
user to finagle gnuplot into reading his or her file? Take the moderately
proficient linux user, like myself. I can toil with a program's syntax to a
certain extent, but writing a linux utility to convert data file formats is
more trouble. Still, I acknowledge Ethan's point; perhaps a bit of "code
bloat". (Let's see if I can help with that -- see below.)

>> All that strikes me as being more trouble than it is worth. Will your
>> code work on an Amiga? On a 64-bit VMS machine?
>
> I think so. You can specify Float32, Float64, etc.
>
> Probably you cannot draw binary floats saved by Turbo Pascal v 5--6,
> because these are 6 B -- I don't know about recent Pascals. But you
> cannot read those in any other image program either, I guess.
>
>> Who is going to explain to users how to set all the right flags to make
>> it work?
>
> Users working with image processing know their format structure. It is
> the user who tells gnuplot what to draw, via the command line options to
> "plot ... with image".
> Otherwise, you or somebody else writes a reader for Octave, and from
> there you draw your matrix via imagegp.m, which is included in the patch.
>
>> I believe that Petr had some specific applications in mind, so maybe he
>> can step in and clarify exactly what pieces of this code he wanted, and
>> why.
>
> Yes, I want to quickly image binary image data with axes x and y in
> physical units (not pixel numbers).
>
>> I myself plot many sorts of data in gnuplot, but I've never felt a need
>> for direct binary input.
>
> I need it whenever I draw an image of >=128x128. Otherwise the drawing
> speed is very low (especially on X11) and memory consumption is high
> (I remember that gnuplot eats about 130 B per point read and drawn from
> a column-wise file).
> The current version of Daniel's patch fully satisfies my needs.

That's correct. We'd thought the current format, with gnuplot's
eight-field-wide point, was rather inefficient for large data files.
However, tacking on a new scheme for more compact storage would be too much
of a paradigm shift.

In that same vein, I'd like to address that df_3dmatrix() routine again.
Inside of it is code that looks very similar to the df_readbinary() I've
created. This df_3dmatrix() started out in some ways similar to
df_readline(), with the use_specs and all. Ethan and I have now discussed
this problem of code re-use, or of similar functionality for binary and
ascii, without intermixing the two in the same routine and creating a mess.
So df_3dmatrix() has in some sense already failed to keep up with
df_readline() in functionality.

I'd like to propose that you let me take a bit of time to move the
important parts of df_3dmatrix() that aren't already in df_readbinary() --
which I think are very few -- into df_readbinary(). I could easily make
that df_readbinary() routine read gnuplot binary files. Then df_3dmatrix()
and its helper routine read_file() could be discarded. That would make the
innards of plot2d.c and plot3d.c use only the "df_readline()" approach to
bringing in data.
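For anyone who hasn't been in that code, here is a self-contained
caricature of the two input paradigms. The names record_at_a_time(),
whole_file_at_once(), and store_point() are invented for the illustration;
they are not gnuplot routines, and the ascii parsing simply stands in for
whatever the real readers do:

#include <stdio.h>
#include <stdlib.h>

/* Stand-in for storing one point into the plot's point structure. */
static void store_point(double x, double y)
{
    printf("%g %g\n", x, y);
}

/* (a) The df_readline() style: pull one record at a time and store it
 *     immediately, so ascii and binary readers can share one caller loop. */
static void record_at_a_time(FILE *fp)
{
    double x, y;

    while (fscanf(fp, "%lf %lf", &x, &y) == 2)
        store_point(x, y);
}

/* (b) The df_3dmatrix() style: read the whole file into a temporary
 *     array first, then run a second loop to copy it into the points. */
static void whole_file_at_once(FILE *fp)
{
    double *buf = NULL;
    size_t n = 0, cap = 0, i;
    double x, y;

    while (fscanf(fp, "%lf %lf", &x, &y) == 2) {
        if (n + 2 > cap) {
            double *tmp;
            cap = cap ? cap * 2 : 64;
            tmp = realloc(buf, cap * sizeof(double));
            if (tmp == NULL) {
                free(buf);
                return;                 /* out of memory: give up */
            }
            buf = tmp;
        }
        buf[n++] = x;
        buf[n++] = y;
    }
    for (i = 0; i < n; i += 2)
        store_point(buf[i], buf[i + 1]);
    free(buf);
}

The proposal above amounts to keeping only style (a), with df_readbinary()
feeding the same caller loop that df_readline() does.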
Perhaps one doesn't like the "df_readline()" approach, but I think there is
an advantage to having just one paradigm, *and* to having df_readascii()
and df_readbinary() in the same file, where they share their many
similarities. It is also a good reminder that if someone adds functionality
to df_readascii(), there is always that df_readbinary() to be aware of.

Basically, df_3dmatrix() and read_matrix() read in the whole data file at
once, then go through a short loop to store the array into the "point
structure". Is that the direction that gnuplot should head? If you think
not, and agree that having just a "df_readline()" form of input is good,
then that right there will free up some code space and assuage concerns
about code bloat.

Dan