Sampling of Data series

Created: 2008-10-03
Updated: 2013-05-28
  • Nobody/Anonymous

    Hi,
       Like every other LiveGraph user, I'm pretty amazed by the tool; it is just what I needed. But if it had everything I need, I wouldn't be writing here :)
       So here is my point of view: when I select 'Show all data' in the data file settings, I expect that there will be no sampling and that everything in the file will be plotted. My data file contains a large number of items. We have a system processing ~250K events per sec, and I was trying to plot the time taken by each event against the time of day, which I was able to get into LiveGraph's date-time format, so no complaints there. The reason I'm plotting event latency is to identify any time slots when the system was sluggish and then to investigate what happened there.
       Although the sampling algorithm used is very nice and does a very good job of taking samples at regular intervals, when I have an option saying 'Show all data' it should show all data. Sampling should be an option, and the user should be able to specify the maximum number of samples. Please let me know your thoughts on this.

    Nikhil

     
    • Nobody/Anonymous

      Please read '~250K events per sec' as '~250K events per hour'

       
    • Greg Paperin

      Greg Paperin - 2008-10-03

      Hey Nikhil,

      My take on it is the following:

      Most modern screens are up to 1280 pixels wide; some wide screens can do 1600 pixels. Without loss of generality, let us assume that, including side margins and similar things, the effective plot area will be no more than 1500 pixels wide.

      So, if your data has more than 1500 points, it will be sampled anyway: if not by LiveGraph's intelligent cache, then by the drawing routine; only in the latter case the rendering will be much slower and much more memory-intensive.

      Currently the cache has a fixed size of 500 data points, so if your plot is 1500 pixels wide, 2 out of 3 points will be interpolated and 1 out of 3 sampled. Note, however, that pixels on screens that support such a high resolution are fairly small, and even if all 1500 points were true data points, the result would not look significantly different to the eye.

      So, as you see, if you have 250k data points, switching off sampling would not make a significant difference.

      You may, however, rightfully object that if you zoom into your data (by selecting a smaller visible interval through setting MinX and MaxX), the number of data points rendered at a given time may be small while the sampling is still based on the size of the whole data file. In that case LiveGraph's sampling does indeed lead to a noticeable loss of detail.

      The reason that this loss may be noticeable can be seen from the following example: say you have 300,000 data points; since the cache holds 500 points at a time, only every 600th point is sampled. If you now zoom in to view all points between dataset 0 and dataset 10,000, you have only about 17 points of real data in this range, and everything else must be interpolated.
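
      To make that arithmetic concrete, here is a minimal runnable sketch (an illustration of the numbers above only, not code from LiveGraph itself):

          public class SamplingArithmetic {
              public static void main(String[] args) {
                  int totalPoints = 300_000;                 // points in the data file
                  int cacheSize = 500;                       // fixed cache size
                  int stride = totalPoints / cacheSize;      // 600: one real sample per 600 points

                  int zoomFrom = 0, zoomTo = 10_000;         // zoomed-in view window
                  double realPointsInView = (zoomTo - zoomFrom) / (double) stride;

                  System.out.println("Sampling stride: " + stride);                       // 600
                  System.out.printf("Real data points in view: ~%.0f%n", realPointsInView); // ~17
              }
          }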

      It would be nice if LiveGraph realised that and dynamically reloaded the relevant part of the data file to give you better resolution. The problem, however, is that in parallel to reloading the zoomed data, LiveGraph must continue to load any additions to the end of the data file in real time. This can be done; in fact, we have run some prototype tests on it. The conclusion was that it would work well on new computers, but may lead to performance problems on large files and on older machines. Besides, it would significantly complicate the software, which means that it would take a long time to add this feature.

      However, there is a planned feature that will at least partially solve the problem: we want to let the user control the size of the cache. Of course, changing the cache size would require re-parsing the whole data file, but this would need to happen only once, and it can be implemented relatively easily.

      Currently, LiveGraph's task priority list is as follows:

      TOP1) Feature stabilisation and bug fixes with the aim to leave the beta status.

      TOP2) Documentation of new features introduced in version 2.

      MID1) Tutorials/examples for V2 features.

      MID2) Persistence of GUI state between run sessions using settings files. This will allow controlling the initial GUI state when starting from non-Java apps.

      MID3) Allowing the user to change cache size through GUI.

      LOW1) Further develop plug-in capability.

      LOW2) Everything else.

      You see, the feature you want is (at least partially, through changing the cache size) planned. The question is only how much time we can spend working on the project.

      Hope this helps.

       
      • Thomas Meyer

        Thomas Meyer - 2008-10-03

        If I may interject here....

        There are cases where, even with a resolution of 1600 pixels and 16,000,000 data points, it IS important to plot every point, and sampling will give a vastly different view.

        For example, say you are using it as a seismograph, and the thing has flatlined for 15,999,999 points with one non-zero event somewhere in the middle. If you sample, it's 10,000 to one against catching the earthquake. But if you don't sample, you'll effectively be plotting 10,000 zeros at every pixel, yet at the time of the earthquake you'll also plot the one non-zero value, so you will always be able to see it on the graph. I hope this is as clear in writing as it is in my head.

         
        • Greg Paperin

          Greg Paperin - 2008-10-04

          Hey,

          your example makes good sense, but it is very difficult to handle in a generic way:

          There are two ways to render data (assume there is no cache for the moment, we are just talking about rendering):

          1) Map Data to Screen. [This is what you are suggesting.] Go through each data point, map it to the corresponding screen coordinates and render. There are two problems with that. (A) If you have 1 million data points, this will need 1 million steps. As there are typically more data points than screen columns, this approach will be very slow. (B) While in your special case this approach will make sure that the short surge is visible, in more typical cases you will end up with many data points being rendered on top of each other, resulting in an extremely convoluted plot where details are unrecognisable (unless the data is very sparse, as in your example).

          2) Map Screen to Data. [This is what graphics engines typically do once their scene graph is calculated.] Go through each screen point, project a ray from the viewpoint through that screen point into the world, calculate the closest intersection with the scene, and render. This basically corresponds to sampling the world at screen resolution and is most likely to miss your short spike.
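
          To make the contrast concrete, here is a self-contained toy sketch of both strategies (my own illustration with made-up numbers; this is not LiveGraph's rendering code):

              public class RenderStrategies {
                  static final int W = 80;                       // toy screen width in pixels

                  public static void main(String[] args) {
                      double[] data = new double[16_000];        // L >> W
                      data[7_123] = 1.0;                         // a single short spike

                      // Approach 1: map data to screen. L steps, but the spike cannot be
                      // lost, because every data point is drawn into some screen column.
                      boolean[] screen1 = new boolean[W];
                      for (int i = 0; i < data.length; i++) {
                          if (data[i] != 0) screen1[(int) ((long) i * W / data.length)] = true;
                      }

                      // Approach 2: map screen to data. Only W steps, but each column
                      // samples a single data point, so the spike is almost surely missed.
                      boolean[] screen2 = new boolean[W];
                      for (int x = 0; x < W; x++) {
                          if (data[(int) ((long) x * data.length / W)] != 0) screen2[x] = true;
                      }

                      System.out.println("Spike visible with approach 1: " + anySet(screen1)); // true
                      System.out.println("Spike visible with approach 2: " + anySet(screen2)); // false
                  }

                  static boolean anySet(boolean[] screen) {
                      for (boolean pixel : screen) if (pixel) return true;
                      return false;
                  }
              }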

          Actually, LiveGraph follows approach (1), but only because we can make the assumption that the scene is very sparse (i.e. that the plot has more background than non-background pixels). This can work efficiently because we know that there is a sampling cache between the full data and the screen data.

          The fact around which everything must be built is this: if the current view is W pixels wide while showing data that contains L data points, and W < L, then in one way or another, W points will have to be sampled/interpolated from L. This is a fact; the question is just what the best way to do that is.

          The current way to do that in LiveGraph is a compromise between
          - plot quality,
          - generic applicability,
          - program simplicity,
          - update speed in a real-time environment.

          I agree that the cache size of 500 is chosen with consideration of typical plot sizes, but may not meet all needs. This is why we want to give the user the possibility to change the cache size. However, if you can think of another way that will still present a good compromise between the above objectives, we would definitely look into that.

           
    • Thomas Meyer

      Thomas Meyer - 2008-10-06

      Sorry if this is getting annoying, but I would like to bring up a suggestion for dealing with L >> W, which you may already have thought about and rejected.  If that's the case, I apologize.  But here goes. 

      In financial charting, there are things called "open high low close bars", or OHLC bars. The way they work is that they plot a horizontal tick at the first price within a bar, then a vertical line going from the maximum to the minimum during the bar, then a tick at the last price within the bar.

      When L >> W, you would bucket the data into "bars": show the first point in a bucket, draw a VERTICAL line between the max and min of all points in the bucket, then show the last point; move over a pixel, then repeat. Also, when real estate is at a premium and there are lots of points per bar, the open and close points aren't that important, and you can simply plot W vertical lines between the max and min of the data, grouped by L/W points per vertical line. Whenever L is significantly greater than W, this will look about the same as LiveGraph looks now, but with the added bonus of never missing outlier events. Of course, the trade-off is that you have to read all zillion points, which may be a deal killer if it slows things down too much. One of the many great features of LiveGraph is how impressively fast it runs.
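
      In code, the bucketed min/max idea might look roughly like this (an illustrative sketch only, not LiveGraph code; the data and sizes are made up):

          public class MinMaxDecimation {

              // Reduce the data to 'width' (min, max) pairs: one vertical bar per
              // screen column. Assumes data.length >= width, i.e. the L >> W case.
              static double[][] minMaxBars(double[] data, int width) {
                  double[][] bars = new double[width][2];
                  for (int x = 0; x < width; x++) {
                      int from = (int) ((long) x * data.length / width);
                      int to = (int) ((long) (x + 1) * data.length / width);
                      double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
                      for (int i = from; i < to; i++) {      // every point is visited once,
                          min = Math.min(min, data[i]);      // so a lone outlier always lands
                          max = Math.max(max, data[i]);      // in some bucket's min or max
                      }
                      bars[x][0] = min;
                      bars[x][1] = max;
                  }
                  return bars;
              }

              public static void main(String[] args) {
                  double[] data = new double[1_000_000];     // the "flatlined seismograph"...
                  data[654_321] = 9.5;                       // ...with one earthquake
                  double[][] bars = minMaxBars(data, 1500);
                  for (int x = 0; x < bars.length; x++) {
                      if (bars[x][1] > 0) {
                          System.out.println("Spike survives in screen column " + x);
                      }
                  }
              }
          }

      Because every point is visited exactly once per redraw, the lone spike always survives into some bucket's max; the cost is the full pass over all L points mentioned above.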

       
      • Greg Paperin

        Greg Paperin - 2008-10-06

        The key feature behind LiveGraph's smart cache algorithm is that it can perform re-sampling without re-reading the data file. This is why it's fast. I guess something similar to what you describe may be possible while keeping this constraint. We could show, say, the min/max values of the sampled region...
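
        For reference, one generic way to get re-sampling without re-reading the file is stride doubling: keep every stride-th point, and when the cache fills up, discard every second kept sample and double the stride. A minimal sketch of that generic technique (LiveGraph's actual cache algorithm may well differ):

            import java.util.ArrayList;
            import java.util.List;

            public class StrideDoublingCache {
                private final int capacity;
                private final List<Double> samples = new ArrayList<>();
                private int stride = 1;                   // keep every stride-th incoming value
                private long seen = 0;                    // total values seen so far

                StrideDoublingCache(int capacity) { this.capacity = capacity; }

                void add(double value) {
                    if (seen++ % stride == 0) {
                        samples.add(value);
                        if (samples.size() > capacity) {  // cache full: thin it out in memory
                            List<Double> kept = new ArrayList<>(capacity);
                            for (int i = 0; i < samples.size(); i += 2) {
                                kept.add(samples.get(i)); // drop every second kept sample...
                            }
                            samples.clear();
                            samples.addAll(kept);
                            stride *= 2;                  // ...and sample future data half as often
                        }
                    }
                }

                public static void main(String[] args) {
                    StrideDoublingCache cache = new StrideDoublingCache(500);
                    for (int i = 0; i < 300_000; i++) cache.add(i);  // stream the file exactly once
                    System.out.println("Samples kept: " + cache.samples.size()
                            + ", final stride: " + cache.stride);
                }
            }

        After each doubling, the kept samples are exactly the values at multiples of the new stride, so the sampling stays uniform over the whole stream without a single re-read.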

        We would have to run some tests to see how the result looks in the general case; we must make sure that the graph does not get too convoluted. Then we can estimate how much work this is and decide whether we want to include this functionality, and if so, in which version.

        Can you point to some examples of these OHLC graphs, so I can get a feel? I'll then discuss it with the team.

        Cheers.

         
    • Thomas Meyer

      Thomas Meyer - 2008-10-06

      Thank you for the attention you are giving this topic. If you don't think it will make LiveGraph a better product, please don't feel any obligation to pursue this at all. Here are some Yahoo graphs showing the OHLC bars.

      http://finance.yahoo.com/q/bc?s=DIA&t=3m&l=off&z=m&q=b&c=
      http://finance.yahoo.com/q/bc?s=DIA&t=5y&l=off&z=m&q=b&c=

      To my mind, a lot of real estate gets wasted showing the open and close ticks for each bar.
      I think I might prefer to leave those out and graph all the vertical bars on adjacent pixels.

       
      • Greg Paperin

        Greg Paperin - 2008-10-13

        Ok, thanks, we'll be looking into it and I'll post the result here.
        But please do not expect this to find its way into the system too soon - our priorities are as posted before.

        Nevertheless - great thanks for a very interesting suggestion!

         
  • min_hero

    min_hero - 2010-01-07

    You can update the plotted data from the file when the 'dontcache' option is selected. This design choice is completely logical and in keeping with the LiveGraph designers' approach. The source code change is fairly simple; I will contribute it as soon as I find out how to :)

     
