From: Daniel J S. <dan...@ie...> - 2004-08-15 20:23:44
Ethan Merritt wrote:

>(wandering a bit off topic)
>
>On Sunday 15 August 2004 01:03 am, Daniel J Sebald wrote:
>
>>My geezerness only goes back to the days of PDP 11-70 and the 8 inch
>>floppy platter. But I can't recall binary files ever having special
>>characters to serve as the end of a record.
>
>PDP 11/xx used the FILES-11 filesystem, in which meta-information
>about record type, disk allocation, ACLs, etc were stored in a
>separate meta-file, not as in-line info.
>
>These filesystems supported very complicated record structures
>for database work (still in use today), but also had 3 main "simple"
>file structures:
>    Fixed-length records:
>        What it sounds like. The record length was specified in
>        meta-data. A read operation returned 1 whole record.
>    Variable-length records:
>        Each record began with an integer specifying how long
>        the record was.
>    CR/LF:
>        Unix-like stream-of-bytes, with end of record signalled by
>        either a CR or a LF.
>
>On top of that, Fortran used carriage-control characters at the beginning
>of a record.

OK, you win the geezer challenge... Anyway, gnuplot binary, then, is
similar to a variable-length record.

>But it's not a file. It never hits the disk, so I/O speed is not an issue.
>And at current memory bandwidths, transferring 10 MB of data should
>take only about 0.01 sec (if I haven't dropped a decimal point somewhere).
>That will be totally dominated by the I/O time to read the original binary
>data from a disk file. So it may be unaesthetic to have an intermediate
>ascii stream, but I doubt it will be noticeable in terms of interactive
>response.

Here is a test. Let's say a 500 x 500 image is processed in Octave and
is to be plotted. I don't think 500 x 500 is unreasonable; x-ray
angiography images and telescope images of space are usually pretty
big. If you have Octave, try the following to simulate the amount of
data that would be transferred through the pipe.
(Granted, we have no idea what kind of bottlenecks might exist in how
Octave is programmed for the pipe, and perhaps that could be improved,
but we'll use this as a rough test.)

    t = [1:500*500]/100;   % 250,000 points, as many as a 500 x 500 image
    s = sin(t);
    plot(t,s);             % the data goes to gnuplot through the pipe

On my machine, a three-year-old Dell with a Pentium 4 and a 900-1000
MHz system bus, that plot takes 8 seconds. After 4 seconds the Octave
command line returns, and 4 seconds after that the gnuplot plot
appears. To me, that time is unacceptable. (Imagine the derision... No,
the solution is not to buy a faster computer.)

There are probably a couple of things going on. First, the pipe may
not transfer data at the rate you suggest, perhaps due to time
sharing. Who knows? Second, this is formatted I/O, meaning that every
value has to go through the scanf function. Does that slow things
down?

Now an example in Octave using the m-file designed to use the image
and binary features added to gnuplot:

    A = 1./hilb(500);      % 500 x 500 test image
    imagegp(A);            % sends A to gnuplot as binary data via a file

This takes 3/4 to 1 second. Tolerable. There is a difference here,
though: the binary data goes through a file. So maybe the file is
faster than the pipe.

Let's try one last test: sending the image data to a file in ASCII
form. I'll put an "if 1" around the instructions to ensure they are
all executed as fast as possible, one after the other.

    % Build one (x, y, value) triplet per pixel
    X = ones(size(A,2),1)* [1:size(A,1)];
    Y = [1:size(A,2)]'*ones(1,size(A,1));
    N = size(A,1)*size(A,2);
    B = [reshape(X,N,1) reshape(Y,N,1) reshape(A,N,1)]';
    if 1
      % Write the triplets as ASCII text, then have gnuplot read the file
      fid = fopen("junk.dat","w");
      fprintf(fid, "%f %f %f\n", B);
      fclose(fid);
      graw("plot \'junk.dat\' using 1:2:3 w image\n");
    end

This takes 6 or 7 seconds. So a file and a pipe are roughly the same
in this crude test. Perhaps the file is even faster, because more data
is being transferred in that case (three columns per point instead of
two). However, there are other differences within gnuplot; for
example, reading from a file and reading from '-' may be handled
differently. Anyway, it's a rough test.
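As a rough size check on the third test, the C sketch below counts the
bytes that fprintf(fid, "%f %f %f\n", B) would write for an n x n image
and compares that with storing the same triplets as raw 4-byte floats.
It relies on C99 snprintf() returning the would-be length when given a
NULL buffer. The pixel values use the fact that the Hilbert matrix has
entries 1/(i+j-1), so 1./hilb(n) is i+j-1 up to rounding. The function
names are made up for this sketch, and it measures only size, not the
cost of parsing the text back in.

```c
#include <stddef.h>
#include <stdio.h>

/* Pixel value of A = 1./hilb(n): the Hilbert matrix has H(r,c) = 1/(r+c-1),
 * so A(r,c) = r+c-1 (exactly here; up to rounding in Octave). */
static double inv_hilbert(int r, int c)
{
    return (double)(r + c - 1);
}

/* Bytes written by "%f %f %f\n" for every (x, y, value) triplet of an
 * n x n image, measured with snprintf(NULL, 0, ...) so nothing is stored. */
size_t ascii_triplet_bytes(int n)
{
    size_t total = 0;
    for (int r = 1; r <= n; r++)
        for (int c = 1; c <= n; c++)
            total += (size_t)snprintf(NULL, 0, "%f %f %f\n",
                                      (double)c, (double)r, inv_hilbert(r, c));
    return total;
}

/* Bytes for the same triplets stored as raw single-precision floats. */
size_t binary_triplet_bytes(int n)
{
    return (size_t)n * (size_t)n * 3 * sizeof(float);
}
```

For n = 500 the text file comes to 8,137,005 bytes, against 3,000,000
bytes of floats (assuming 4-byte floats), and every one of the 750,000
text fields must also be converted back into a number on the gnuplot
side.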
But the conclusion is that it is probably 'fprintf' and 'scanf', i.e.,
formatted I/O, that slow things down, and binary data is a nice
feature to have for images.

>>I would add that I myself am deterred from implementing general binary
>>if the df_readline() is going to continue to grow with functionality
>>from within. Unless, say, the use_spec processing is converted to a
>>function that can be called from multiple places, trying to maintain two
>>"analogous", or "parallel", routines is too much for anyone, whether he
>>or she is the original author or not.
>
>You mean changing use_spec[] from an array into a function?
>If that turns out to be useful then I suppose it would be reasonable.

Yeah, but I'm not advocating that. You are persuading me that perhaps
"binary" should be simpler. The question is, how many people will use
gnuplot from the command line for processing images? Not many. So I
would say that passing data through a function isn't that necessary,
as in this example:

    plot 'blutux.rgb' binary array=128x128 flipy format='%uchar' using (1.5*$1):2:3 with rgbimage

The primary use I have in mind for this "large data set plotting" is
something done by an application in an ephemeral way: just send some
data over, plot it, and discard the data. So perhaps the ability to
skip data within a binary file isn't necessary. That is, no '%*uchar'
kind of stuff, and no skipping a number of bytes at the head of the
file.

How about tossing out the multiple-records-per-file feature? If there
is more than one big data set to plot, just create multiple files.

How about tossing the implicit sampling interval? That would mean that
all data must appear in the file; for example, the (x,y) coordinates
for each pixel of an image must be stored along with the pixel value.
That means a sample image for the 'image.dem' program would increase
in size by a factor of 5/3 (x, y, r, g, b instead of r, g, b). No
problem. Translations: toss those as well, in the case where
coordinates are in the file.
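The 5/3 factor comes from the record width: with implicit sampling an
RGB pixel is three values (r, g, b), while explicit coordinates make it
five (x, y, r, g, b). Here is a minimal C sketch of the expansion an
application would do before writing such a file. It assumes every
value is a float, x is the column index and y the row index, both
starting at 0 with unit spacing; these are illustrative choices, not
gnuplot's convention, and the function name is hypothetical.

```c
#include <stddef.h>

/* Expand an implicitly sampled w x h RGB image (3 floats per pixel, stored
 * row by row) into explicit records (x, y, r, g, b): 5 floats per pixel.
 * Assumed for illustration: x = column index, y = row index, both from 0.
 * `out` must have room for 5*w*h floats. Returns the number of floats written. */
size_t expand_rgb_with_coords(const float *rgb, int w, int h, float *out)
{
    size_t k = 0;
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            const float *px = rgb + 3 * ((size_t)y * (size_t)w + (size_t)x);
            out[k++] = (float)x;   /* coordinates now stored explicitly */
            out[k++] = (float)y;
            out[k++] = px[0];      /* r */
            out[k++] = px[1];      /* g */
            out[k++] = px[2];      /* b */
        }
    }
    return k;
}
```

For a 128 x 128 image like the one in the plot command above, the
49,152 values of the implicit layout become 81,920.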
All of this would eliminate a lot of code, much of which is for
interpreting the keywords. With no "using" there can be no functions.
Also, let's say that with binary there are no strings, no time data,
etc. Again, this kind of data will be small in quantity if it is ever
plotted, in which case ASCII can be used. What I mean is there is no
need to plot 500 strings.

I'd hesitate to toss '%uchar', etc., although I could give on that
one. But let's rule out multiple data types per file; maybe just one
%float, etc., inside the format string. The code that does the
conversion inside the df_readbinary() routine is fairly
straightforward. There is a set of tables to compute data sizes at
compile time. It looks nasty, but once it is compiled it probably
isn't too big.

I'd hesitate to toss the endian information too. That code inside
df_readbinary() also isn't too bad. The thing is, Octave's fopen()
routine accepts an "ieee-le" or "ieee-be" qualifier. Octave pays
attention to endianness, so maybe gnuplot binary should too.

So, in order to keep the functionality, here is a possible reduced
syntax:

    binary {3 | xy | xyz | xyzc} {format="string"} {endian=little}

Now if we want to toss the format and require "all floats, all the
time", fine. But the first part of that syntax is to allow entry for
both images and long linear records such as speech waveforms or
whatever.

    binary      : The current gpbin file.
    binary 3    : Very similar to the current gpbin, what I call gpbin3.
                  That is, it is the matrix format, but each element of
                  the matrix has 3 components. (Could make that an
                  arbitrary number, 1 up to max columns.)

Now that covers images, i.e., a matrix format. But what about sampling
in one dimension? Perhaps that could be done with gpbin if one sets N
(the number of columns and the first number in the file) to one. But
that is tricky from the user's perspective. Hence the following:

    binary xy   : Two "columns" of data. Would be useful for 2D plots.
    binary xyz  : Three "columns" of data. Would be useful for 3D plots.
    binary xyzc : Four "columns" of data. Would be useful for 3D plots
                  with color.

This wouldn't have to be the exact syntax. For example, it would be
nice if one could just specify the number of columns with 2, 3, 4, 5,
..., max_cols, but that would conflict with trying to introduce
multiple components per element of matrix binary.

Dan
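The reduced syntax proposed above, with "all floats, all the time" and
an endian qualifier, boils down to reading fixed-width records of ncols
floats in a declared byte order. The C sketch below shows one way to
do that. read_float_records() and decode_float32() are hypothetical
names, not the df_readbinary() code, and the sketch assumes the host
float is a 32-bit IEEE-754 type. Assembling each value from its bytes
in the declared order (the same idea as Octave's "ieee-le"/"ieee-be"
qualifiers) gives the right answer on either host byte order, so no
separate byte-swap pass is needed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode one single-precision value stored in the given byte order,
 * independent of the host's byte order. Assumes float is 32-bit IEEE-754. */
static float decode_float32(const unsigned char *b, int little_endian)
{
    uint32_t bits;
    if (little_endian)
        bits = (uint32_t)b[0] | (uint32_t)b[1] << 8 |
               (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
    else
        bits = (uint32_t)b[3] | (uint32_t)b[2] << 8 |
               (uint32_t)b[1] << 16 | (uint32_t)b[0] << 24;
    float f;
    memcpy(&f, &bits, sizeof f);   /* reinterpret the bit pattern */
    return f;
}

/* Hypothetical reader for an "all floats" stream of fixed-width records:
 * ncols = 2 for "binary xy", 3 for "binary xyz", 4 for "binary xyzc".
 * Writes ncols floats per complete record to `out` and returns the number
 * of complete records; bytes of a trailing partial record are ignored. */
size_t read_float_records(const unsigned char *buf, size_t len, int ncols,
                          int little_endian, float *out)
{
    if (ncols <= 0)
        return 0;
    size_t rec_bytes = (size_t)ncols * 4;   /* 4 bytes per float */
    size_t nrec = len / rec_bytes;          /* complete records only */
    for (size_t i = 0; i < nrec * (size_t)ncols; i++)
        out[i] = decode_float32(buf + 4 * i, little_endian);
    return nrec;
}
```

With ncols set to 2, 3, or 4 this covers the proposed "binary xy",
"binary xyz", and "binary xyzc" cases.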