From: sfeam (E. Merritt) <eam...@gm...> - 2012-09-29 18:44:24
On Saturday, 29 September 2012, Dima Kogan wrote:
> > On Fri, 28 Sep 2012 10:37:06 -0700

Hi Dima,

Just some preliminary thoughts... I quickly replicated your benchmarks and
see roughly the same results (faster CPU but limited memory; 32-bit
environment). However, I have a few general concerns.

1) The output of the ascii version is hugely redundant. Most of the
successive V commands are identical because the vectors are shorter than the
resolution of the plot coordinates. If this is a common occurrence we would
do well to add a filtering step in x11.trm so that a new "V" command is only
sent if it differs from the previous command. For example, running the ascii
output through "uniq" reduces the file size by a factor of 7, with a speed
increase of 50x(!!) when run through gnuplot_x11.

2) The binary output is not terminating the "V" records with a '\n'. This is
fine on linux, but there is a long history of problem reports on Windows
arising from very long buffers sent through a pipe. I don't know the details
well enough to say whether this would trigger similar problems, but I do
worry. I think it's worth also trying a variant that writes a trailing '\n'.
If writing a newline for every V command causes a significant slowdown, then
perhaps we could generalize the existing code in X11_filled_polygon() that
breaks long binary buffers into smaller chunks.

3) I worry that the binary code might not work on all supported platforms.
Since it's just a few lines of code, I suggest that rather than replacing the
existing 'V' command we add a parallel command 'B' for the binary version. In
x11.trm the choice between using 'V' or 'B' could be a compile-time option.
We can default to the binary version, but anyone having problems with it
could revert to the old ascii version with a configuration flag.

I'll play around with more serious benchmarking if I get time later this
weekend.

Oh, and I noticed something else about the new aspect-ratio code while
running the benchmarks.
I use

  ./gnuplot_x11 -noevents < foo.x11

to benchmark the outboard driver. Since there is no feedback in place, the
inboard driver doesn't know about the aspect ratio and the plot is not scaled
properly to the plot window. Now this is not the usual path for displaying
x11 output, but I wonder if we can fix it easily. Maybe disabling the
rescaling code if the feedback pipe is not present? Or maybe sending an
initial set of scaling commands that are always correct, which are later
overridden as needed in the interactive case but remain in effect for the
case of -noevents or no feedback pipe?

	Ethan

> I just ran some benchmarks myself. The results are quite interesting.
>
> First off, the test machine description:
>
> gcc (Debian 4.7.0-12) 4.7.0
>
> CPU:
> vendor_id  : GenuineIntel
> cpu family : 6
> model      : 15
> model name : Intel(R) Core(TM)2 CPU T7400 @ 2.16GHz
> stepping   : 6
>
> This is a 2-core machine, but everything in gnuplot is single-threaded, so
> this doesn't matter.
>
> I generated 2 large-ish data files. Both contain an identical sinusoid: one
> stores it in ascii, the other in binary with packed single-precision
> floats. This isn't what we're testing, but I wanted to get this conversion
> step into the data as a reference. Commands to generate the data:
>
> $ perl -e 'for(0..2000000) { print pack "f*", $_, sin($_/100000); }' > dat.bin
> $ perl -e 'for(0..2000000) { print "$_ " . sin($_/100000) . "\n"; }' > dat.ascii
>
> For each gnuplot build I tested, I timed 3 things:
>
> 1. inboard x11 from binary (with 'terminal xlib')
> 2. inboard x11 from ascii (with 'terminal xlib')
> 3. outboard x11 (just the gnuplot_x11 executable)
>
> I tested 4 different gnuplot builds:
>
> 1. before the split-printf patch (uses %04d format)
> 2. after the split-printf patch (uses space-separated %d format)
> 3. after the split-printf patch, but ALSO sending the V command in binary,
>    with 16-bit integers for each argument of V. V commands were the bulk of
>    the xlib-generated data stream, so this is our hotspot.
> 4. same as the previous, but using 32-bit integers
>
> ASCII input tests were run by making a 'tst.gp' file with
>
> ================
> set term xlib
> set output "out.xlib"
> plot "dat.ascii" with lines
> ================
>
> then generating timings by running, multiple times,
>
> $ time ./gnuplot tst.gp
> $ time ./gnuplot_x11 < out.xlib > /dev/null
>
> Binary input tests were run similarly, but with a 'tst.gp' such as
>
> ================
> set term xlib
> set output "out.xlib"
> plot "dat.bin" binary format="%float32%float32" with lines
> ================
>
> Results. Each record is usertime,systemtime in seconds.
>
> |                     | %04d      | %d        | %d, 16-bit binary V | %d, 32-bit binary V |
> |---------------------+-----------+-----------+---------------------+---------------------|
> | inboard from ASCII  | 1.91,0.21 | 1.86,0.21 | 1.53,0.19           | 1.53,0.21           |
> | inboard from binary | 0.95,0.20 | 0.89,0.20 | 0.56,0.16           | 0.56,0.20           |
> | outboard            | 0.82,0.11 | 0.85,0.11 | 0.19,0.10           | 0.21,0.11           |
>
> First off, we see that splitting the fields with whitespace speeds up the
> inboard driver a little bit, while slowing down the outboard one a little
> bit. I'm not completely sure why, but the difference is negligible, so I
> didn't go digging.
>
> However, sending the data over in binary produces HUGE performance gains.
> The inboard driver is 37% faster (when the original input is binary too),
> while the outboard one is a whopping 76% faster. I'm not quite sure why
> this is so uneven; maybe the outboard driver parses the data more times
> than it needs to?
>
> None of this is really surprising, but it reinforces the earlier point that
> seeking performance gains in our ASCII representation is foolish, leading
> to minimal speedups while making the code less manageable. When I started
> this, I wasn't going to advocate that we move to a binary data stream, but
> the speedup is so significant that I think we really should.
>
> I'm attaching a patch that changes the V command to work in binary. Note
> that this patch is not at all good enough to merge yet; I'm attaching it so
> that others can run these tests as well. So, should we move the intensive
> commands to binary? What commands are these, other than 'V'?
>
> dima