Dimitrios Apostolou wrote:
> Hans-Bernhard Broeker wrote:
>> Dimitrios Apostolou wrote:
> I like gnuplot so I tried it. For this kind of data I really like the
> "map" capability of gnuplot. It would be interesting if the
> "convenience" features worked faster.
Interesting, yes. But we're already stretched somewhat thin on
manpower as it is. Spending effort on the efficiency of side aspects
that other tools already handle far better than gnuplot ever could
would be a waste.
Note that I'm not telling you not to use gnuplot at all --- I'm
trying to convey that you should use other tools alongside
gnuplot. gnuplot is good at plotting, but mediocre at mass data
processing. So use better tools for that part of the job, then come
back to gnuplot with the actually plottable data.
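To make "use better tools for that part of the job" concrete, here is a minimal C sketch of one such preprocessing step done outside gnuplot: block-averaging a large matrix down to roughly the output resolution before plotting it. The function name `block_average` and the choice of averaging (rather than, say, keeping the maximum of each block) are assumptions for illustration, not anything gnuplot itself provides.

```c
#include <stddef.h>

/* Hypothetical preprocessing helper (not part of gnuplot): reduce an
 * nrows x ncols matrix, stored row-major in `in`, by averaging each
 * k x k block into one output cell. k must be at least 1. Trailing
 * rows/columns that do not fill a whole block are dropped. `out` must
 * hold (nrows/k) * (ncols/k) doubles. Returns the number of cells written. */
size_t block_average(const double *in, size_t nrows, size_t ncols,
                     size_t k, double *out)
{
    size_t orows = nrows / k, ocols = ncols / k;
    for (size_t r = 0; r < orows; r++) {
        for (size_t c = 0; c < ocols; c++) {
            double sum = 0.0;
            /* Sum the k x k input block whose top-left corner is (r*k, c*k). */
            for (size_t i = 0; i < k; i++)
                for (size_t j = 0; j < k; j++)
                    sum += in[(r * k + i) * ncols + (c * k + j)];
            out[r * ocols + c] = sum / (double)(k * k);
        }
    }
    return orows * ocols;
}
```

With k = 6, a 6000x6000 matrix shrinks to 1000x1000 cells, which is in the range of what a screen can actually show; the reduced matrix can then be written out and handed to gnuplot.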
>> It's not *that* much more, actually. A double-precision variable
>> takes 8 bytes; that's about three times as much as your ASCII data.
>> Add the implied x and y variables missing in your matrix file and
>> you're at
>> 6000*6000*3*8 Bytes = 864 MB of data. gnuplot will use even more than
>
>
> Is *3 necessary for this kind of data (matrix)?
For reasons of program structure and internal efficiency, all the
various kinds of input data have to end up in the *same* data structure,
regardless of whether they were topologically and geometrically very
limited matrix data, generic grid-topology data, or a point cloud
without any kind of structure.
It's not strictly necessary to do it that way, but for most reasonable
plots, this organization works well.
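The 6000*6000*3*8 estimate quoted above can be written out as a small C sketch. It assumes each matrix element ends up as one explicit (x, y, z) triple of doubles; the struct `xyz_point` is a hypothetical stand-in, and gnuplot's real per-point structure carries more fields, which is why actual usage is higher still.

```c
#include <stddef.h>

/* Hypothetical stand-in for a per-point record: one explicit (x, y, z)
 * triple of doubles per matrix element. gnuplot's real structure is
 * larger and is not reproduced here. */
struct xyz_point {
    double x, y, z;
};

/* Bytes needed to hold an nrows x ncols matrix as xyz_point records.
 * On common platforms sizeof(struct xyz_point) is 3 * 8 = 24 bytes. */
size_t matrix_as_points_bytes(size_t nrows, size_t ncols)
{
    return nrows * ncols * sizeof(struct xyz_point);
}
```

For 6000x6000 this gives 864,000,000 bytes, matching the 864 MB figure; the "*3" is the cost of storing x and y explicitly even though the matrix format only implies them.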
>> that, and that's a problem. But the real problem here is that a
>> 6000x6000 points data set is essentially unplottable --- no output
>> device you're likely to be using has enough resolution to display all
>> those points in a readable way.
>
>
> IMHO the more points we have, the better the map or the surface
> mesh we plot looks.
That assumption is fatally flawed --- once you have as many input
points as the output medium has pixels, adding more is guaranteed not
to improve the plot; it will only make it increasingly
unreadable. 6000x6000 is well beyond that point: on a screen, you'll be
trying to display at least 36 data points in every pixel of your plot
--- that's not adding quality, that's adding confusion.
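The "at least 36 data points in every pixel" figure is simple arithmetic, sketched here in C. The 1000x1000-pixel plot area is an assumed, generous size for the drawing area of a screen; a smaller plot area only makes the ratio worse.

```c
/* Average number of data points falling into each device pixel when an
 * npx x npy data grid is drawn into a plot area of wpx x hpx pixels.
 * Any value above 1.0 means points are being drawn on top of each other. */
double points_per_pixel(long npx, long npy, long wpx, long hpx)
{
    return ((double)npx * (double)npy) / ((double)wpx * (double)hpx);
}
```

points_per_pixel(6000, 6000, 1000, 1000) is 36.0; even a 2000x2000 plot area would still hold 9 points per pixel.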
>> That's because gnuplot parses all data points, regardless of whether
> Is it really necessary? Why not parse only the needed points?
Necessity is not the issue --- convenience of maintaining the overall
structure of a very ancient code base is. The code has always been
organized to read all data, then let the "every" filter decide which
points to actually use. Changing that would be difficult, to say the
least. Features such as applying 'using' even to 'matrix' data may
well rely on such details.
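The "read all data, then filter" structure described above can be illustrated with a minimal C sketch. This is a hypothetical illustration of the control flow only, not gnuplot's actual code: every token is converted with strtod() first, and the every-style stride is applied afterwards, so the parsing cost is paid even for points that are then discarded.

```c
#include <stdlib.h>

/* Hypothetical sketch (not gnuplot code): parse every number in `text`,
 * then keep only every `step`-th value (step must be >= 1). All tokens
 * are converted even though most are thrown away. Returns the number of
 * values stored in `out` (at most max_out). */
size_t parse_then_filter(const char *text, size_t step,
                         double *out, size_t max_out)
{
    size_t index = 0, kept = 0;
    const char *p = text;
    char *end;
    for (;;) {
        double v = strtod(p, &end);   /* full conversion of every token */
        if (end == p)                 /* no further number could be parsed */
            break;
        p = end;
        if (index % step == 0 && kept < max_out)  /* the "every" filter */
            out[kept++] = v;
        index++;
    }
    return kept;
}
```

Skipping unwanted tokens before converting them would be cheaper, but as explained above, other features may depend on every value having been parsed.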
>> they'll be used or not --- and, like it or not, scanning ASCII
>> representations of (presumably) floating-point numbers is *slow*.
> I know of the overhead "parsing" implies; however, I know that an 800 MHz
> CPU ought to do it much faster.
Probably. I just ran a little experiment, and found that 36000000
double-precision numbers could be scanf()ed in about 80 seconds process
CPU time, on a 650 MHz PIII.
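An experiment along those lines can be sketched in C as follows. The details (a temporary file from tmpfile(), the "%.6e" output format, clock() for process CPU time) are assumptions about how such a test might be set up, not a reconstruction of the exact experiment; timings on current hardware will differ from the PIII figure.

```c
#include <stdio.h>
#include <time.h>

/* Minimal sketch: write n doubles as ASCII text into a temporary file,
 * read them back with fscanf("%lf"), and measure the process CPU time of
 * the reading phase with clock(). Returns the number of values read, or
 * -1 if no temporary file could be created. */
long time_fscanf_doubles(long n, double *cpu_seconds)
{
    FILE *f = tmpfile();
    if (f == NULL)
        return -1;
    for (long i = 0; i < n; i++)
        fprintf(f, "%.6e\n", i * 0.001);   /* assumed number format */
    rewind(f);

    clock_t start = clock();
    long count = 0;
    double v;
    while (fscanf(f, "%lf", &v) == 1)      /* the scanning being timed */
        count++;
    *cpu_seconds = (double)(clock() - start) / CLOCKS_PER_SEC;

    fclose(f);
    return count;
}
```

Calling it with n = 36000000 corresponds to the 6000x6000 case; a smaller n is enough to estimate the per-number cost and scale up.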
It may be worthwhile to profile your gnuplot on (a smaller version of)
this case, to see where the time is actually spent. Possibly it's pure
memory access time --- at this kind of size, memory bandwidth becomes a
serious bottleneck, too.
> Don't you agree that gnuplot's
> implementation is highly inefficient on this?
ASCII datafiles are inefficient by design, but at the same time, this
inefficiency allows them to be understood by humans, and keeps them 100%
portable across machine architectures. Two sides of the same coin.
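Both sides of that trade-off can be shown in a short C sketch. It assumes IEEE 754 double precision, for which printing 17 significant digits ("%.17g") and parsing the result back with a correctly rounding strtod() reproduces the original value exactly. The text form typically costs well over 8 bytes per number, while a raw binary dump uses exactly sizeof(double) but inherits the host's byte order.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Portable side: write x as ASCII with 17 significant digits and parse it
 * back. Returns the length of the text (the on-disk cost per number,
 * excluding the separator). Assumes IEEE 754 doubles. */
int ascii_round_trip(double x, double *back)
{
    char buf[64];
    int len = snprintf(buf, sizeof buf, "%.17g", x);
    *back = strtod(buf, NULL);
    return len;
}

/* Non-portable side: report whether this host stores the double 1.0 in
 * little-endian byte order. Raw binary dumps (e.g. fwrite of doubles)
 * carry this layout, so they cannot be read as-is on a host with the
 * opposite byte order. */
int host_double_is_little_endian(void)
{
    double one = 1.0;   /* IEEE 754 big-endian bytes: 3F F0 00 ... 00 */
    unsigned char bytes[sizeof one];
    memcpy(bytes, &one, sizeof one);
    return bytes[sizeof one - 1] == 0x3F;
}
```

For 0.1 the text form is "0.10000000000000001", 19 characters against 8 binary bytes, yet it reads back identically on any machine.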