
#280 Creating timeseries histogram performs very poorly

None
Status: closed
Owner: nobody
None
3
Updated: 2024-09-26
Created: 2023-01-12 by Werner Lippert
No

Hello there, I have been using a self-designed gnuplot timeseries chart for the past 10 years. I created it myself as I couldn't find an adequate feature in gnuplot. When the number of instances in the plot reaches significant values, like 137 (interesting!!), performance gets very poor. In my case (files attached) it takes more than 4 minutes on a 64 GB MacBook Pro with the new M1 processor, the newest operating system release, and the newest gnuplot from homebrew (gnuplot 5.4 patchlevel 5). On Ubuntu 20 with Linux 5.4.0 and gnuplot 5.2.2 it's actually 5 times faster, which would be just about bearable, but I believe it can be made much faster. I have 2 questions: (1) can you help me improve performance (I spent long hours trying to achieve this myself), and (2) can you add such a timeseries histogram to gnuplot in a generic fashion, because I think it's really useful and the graphs look really nice. To create the png file, just type "gnuplot <gpl-file>". Thanks a lot.

3 Attachments


Discussion

  • Ethan Merritt

    Ethan Merritt - 2023-01-12

    Comments

    • Using set term pngcairo rather than set term png is hugely faster on my linux machine. Your original script took 4 minutes to run. Changing only the terminal command sped that up to 13 seconds. Clearly there's something wrong with the png terminal (or more likely with the gd library it uses) but I will defer investigation to another time.
    • Why multiplot? There is only one plot.
    • Since the data is in a csv file, you should tell gnuplot so: set datafile separator comma
    • Writing title "" rather than notitle may create empty space in the key
    • At this level of resolution, drawing individual boxes is probably wasted effort (effort==time). I think it would be sufficient to plot with impulses. On my machine this further reduces the run time from 13 seconds to 8 seconds.
    • I don't know if you are interested in simplifying the script itself, but if you were to put your color choices in a palette you could reduce that plot command to plot for [col=2:*] timeseries using 1:col:(col) lc palette notitle. Of course then you'd have to deal with the titles separately. Doing that shaves another 2 seconds off the run time (6 seconds total, but my test run has no labels).
    • So long as the y coordinates of your data are strictly increasing across each row (so you don't lose some to overlap with printing with boxes or impulses) I don't see what you would gain by using a stacked histogram. Were you thinking it would be faster? I don't really see why it would be.

    Bottom line 241 seconds -> 8 seconds
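
The suggestions in the list above can be combined into one short script. Because the original .gpl files are generated automatically, the sketch below is written as a small Python generator rather than a hand-edited script. The function name, both file paths, and the `unset colorbox` line are assumptions made for illustration. The axis, time-format, and color settings of the attached script are not known and are left out, and the plot command is the one quoted in the comment with `with impulses` added.

```python
def build_gnuplot_script(csv_path, png_path):
    """Return gnuplot commands that apply the suggestions from the comment above.

    csv_path and png_path are hypothetical placeholders. Axis, time-format and
    palette settings from the original script are unknown and not included.
    """
    commands = [
        # cairo-based PNG terminal instead of the libgd-based "png" terminal
        "set terminal pngcairo",
        f"set output '{png_path}'",
        # the data file is comma-separated
        "set datafile separator comma",
        # assumption: the color box that "lc palette" adds is not wanted
        "unset colorbox",
        f"timeseries = '{csv_path}'",
        # a single plot command iterates over all data columns
        "plot for [col=2:*] timeseries using 1:col:(col) with impulses lc palette notitle",
    ]
    return "\n".join(commands) + "\n"
```

Writing the result of `build_gnuplot_script("timeseries.csv", "timeseries.png")` to a file and running gnuplot on that file would produce the image. Series titles would still have to be handled separately, as noted in the comment.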

     

    Last edit: Ethan Merritt 2023-01-12
    • Werner Lippert

      Werner Lippert - 2023-01-13

      Good morning Ethan,

      Thank you very much for your quick answer, which I was hoping for but didn’t expect.

      I totally agree that the gd library is at the heart of the issue. I think it started when they changed the license, and gnuplot has probably used something else since, but I can’t quite remember anymore.

      Anyway, replacing png with pngcairo brought down runtime to 4 sec (!!). At first I simply couldn’t believe my eyes, but that is the performance gain I thought was possible when I wrote “much faster”.

      And I don’t see any difference in quality when I compare png and pngcairo. Again I think I tried pngcairo before when it was first offered by gnuplot but the result at the time didn’t satisfy my expectations.

      The syntax of my gpl files may look a bit suboptimal here and there, but they are created fully automatically.

      I will certainly try the other recommendations, but pngcairo clearly solves the performance issue.

      I have been creating literally millions of graphs using gnuplot and I really enjoy the quality of the graphs. So I would be happy to donate some amount to support your excellent work and encourage you to continue.

      Have a great day!!

      Cheers,
      Werner



  • Ethan Merritt

    Ethan Merritt - 2023-01-12

    Followup

    It seems the performance killer in the libgd png terminal is lw 2.
    Replacing lw 2 with lw 1 everywhere in your original script reduces the run time from 241 seconds to 8 seconds.

    And with that clue in hand, I suddenly realize that you really should not be using a linewidth here at all. Instead the fillstyle should be fs solid noborder. Replacing lw 2 fs solid with fs solid noborder in your original script brings the run time down to 6 seconds!!
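
For scripts that are generated automatically, the replacement described above can be applied as a text substitution. This is a minimal sketch, assuming the generated script contains the literal option sequence `lw 2 fs solid` exactly as quoted. The function name and the sample plot line are hypothetical and not taken from the attached files.

```python
import re

# Matches the option sequence named in the follow-up ("lw 2 fs solid"),
# but skips occurrences that already end in "noborder".
_BOX_LINEWIDTH = re.compile(r"\blw\s+2\s+fs\s+solid\b(?!\s+noborder)")


def use_noborder_fill(script_text):
    """Replace 'lw 2 fs solid' with 'fs solid noborder' in a gnuplot script."""
    return _BOX_LINEWIDTH.sub("fs solid noborder", script_text)


# Hypothetical plot line, for illustration only.
sample = "plot 'data.csv' using 1:2 with boxes lw 2 fs solid lc rgb 'red' notitle\n"
converted = use_noborder_fill(sample)
```

Other uses of `lw 2`, such as line-style definitions, do not match the pattern and are left untouched.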

     
    • Werner Lippert

      Werner Lippert - 2023-01-13

      Yep, 2.7 sec here (I have to use decimal digits now to make the results distinguishable). I still have to check plots with fewer timestamps to see whether it works in all cases.

      Using "with boxes fill solid noborder" brings it down to 2.3 sec - even while still using png - awesome!

      Thanks again!!
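
Run times like the ones quoted in this thread can be compared with a small timing helper. This is a sketch only: the function names are hypothetical, and it assumes each script variant is run as `gnuplot <gpl-file>`, as described in the original post, with the `gnuplot` executable on the PATH.

```python
import subprocess
import time


def time_command(argv):
    """Run argv once and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    return time.perf_counter() - start


def compare_scripts(script_paths, gnuplot="gnuplot"):
    """Time `gnuplot <gpl-file>` for each script and return {path: seconds}."""
    return {path: time_command([gnuplot, path]) for path in script_paths}
```

A single run per script is enough to see differences on the scale reported here (minutes versus seconds). Repeating the runs would be advisable once the variants are only tenths of a second apart.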



  • Ethan Merritt

    Ethan Merritt - 2024-09-26
    • status: open --> closed
     
