#297 Aggregate time series creates spurious data

open
None
8
2009-08-19
2008-08-12
No

From Deborah:

"I have what I think is a well-documented bug in the attached excel spreadsheet.
It has to do with importing time-series data, aggregating it against itself so that it can be graphed through the "aggregated timelines" list, and finding that it inserts extra rows at inappropriate times, with made-up or blank data values. In the example data provided, all the times in the input data are hourly. In the output, where I exported the aggregated version of the data series after aggregating it against itself as the time series, it has all the right data plus 5 extra rows, in each of which the time is at 36 minutes past the hour. The origin of the data values next to these "extra" points is unknown.
I know that Seiya had a similar problem with other data, and we ascribed it to some post-processing we were doing, but in this case I haven't done that post-processing, so I think Enchilada is the source. He also saw rogue points at 36 minutes past some hours, which seems to be consistent. But he was using this same database (I have a copy from him now), so I guess the 36-minute aspect of it could in some way be database dependent?

Just to add to this one: I was doing a different aggregation of the same data earlier, to make new figures, and when I chopped it off well before the end of the data series, it would give me two rogue points (also at 36 minutes past the hour) at the end, both of which were 7 days past the last date I asked for. So the extra points are not limited to times within the range requested (in this case I wasn't aggregating it against itself, but inputting times myself)."

Deborah's spreadsheet is attached.
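
For anyone trying to confirm the symptom on their own export, the check described in the report (every output timestamp should also occur in the input) can be sketched as below. This is only an illustration: the function name `find_extra_rows` and the sample timestamps and values are hypothetical, and the data is not taken from the attached spreadsheet.

```python
from datetime import datetime

def find_extra_rows(input_times, output_rows):
    """Return the output rows whose timestamp never occurs in the input series."""
    known = set(input_times)
    return [(t, v) for t, v in output_rows if t not in known]

# Hypothetical hourly input series (timestamps only).
input_times = [datetime(2008, 8, 1, h) for h in range(4)]

# Hypothetical exported aggregate with one extra row at 36 minutes past the hour.
output_rows = [
    (datetime(2008, 8, 1, 0), 1.0),
    (datetime(2008, 8, 1, 1), 2.0),
    (datetime(2008, 8, 1, 1, 36), None),  # row that has no counterpart in the input
    (datetime(2008, 8, 1, 2), 3.0),
    (datetime(2008, 8, 1, 3), 4.0),
]

extra = find_extra_rows(input_times, output_rows)
for t, v in extra:
    print(t.isoformat(), v)
```

When the series is aggregated against itself, any row this reports is one that the aggregation step added.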

Discussion

  • Dave Musicant

    Dave Musicant - 2008-08-12

    Sample of bug

     
  • Dave Musicant

    Dave Musicant - 2008-08-12
    • priority: 5 --> 8
     
  • Dave Musicant

    Dave Musicant - 2008-08-13

    More from Deborah:

    Here's some more odd and interesting behavior we have noted in aggregating data, which can be added to the weirdness I reported before:
    1) If the time interval is set to 24:02:00, it outputs the data with the time interval of 00:02:00, presumably just ignoring the 24 hour part.
    2) If the time interval is set to 23:59:00, it outputs the data with the time interval of 11:59:00.

    In both cases, the goal was to get ~24 hours of data aggregated together, but not exactly 24 hours: we're trying to match the filter data, which covers somewhat irregular periods of around 24 hours, always a little more or a little less, and not starting at regular times. We couldn't figure out how to do it. Probably the best thing to do, ugly as it is, will be to output 1-minute data, remove all the random times at 36 minutes (but will they be real??), and then add up the minutes that fall within the filter time intervals, but that's kind of clunky.
    Anyway, please add this to the collection of weird timeline stuff.
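
The clunky workaround described above (export 1-minute data, then add up the minutes that fall inside each filter period) could be done outside Enchilada roughly as follows. This is a minimal sketch under stated assumptions: the helper `sum_within`, the sample series, and the filter start times are all hypothetical, and it does not attempt to decide which 36-minute points are real. Period lengths are held as `timedelta` durations rather than clock times, so a length of 24:02:00 stays 24 hours and 2 minutes.

```python
from datetime import datetime, timedelta

def sum_within(rows, start, length):
    """Sum the values whose timestamp t satisfies start <= t < start + length."""
    end = start + length
    return sum(v for t, v in rows if start <= t < end)

# Hypothetical 1-minute export: one value of 1.0 per minute for three days.
t0 = datetime(2008, 8, 1, 0, 0)
rows = [(t0 + timedelta(minutes=i), 1.0) for i in range(3 * 24 * 60)]

# Hypothetical filter periods: irregular start times, roughly 24 hours long.
filters = [
    (datetime(2008, 8, 1, 0, 5), timedelta(hours=23, minutes=59)),
    (datetime(2008, 8, 2, 0, 10), timedelta(hours=24, minutes=2)),
]

totals = [sum_within(rows, start, length) for start, length in filters]
for (start, length), total in zip(filters, totals):
    print(start.isoformat(), length, total)
```

Because each period is defined by its own start time and duration, the periods need not be the same length or begin on a regular grid.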

     
  • Dave Musicant

    Dave Musicant - 2008-08-25
    • assigned_to: nobody --> robatlas
     
  • Dave Musicant

    Dave Musicant - 2008-10-27
    • assigned_to: robatlas --> nobody
     
  • Dave Musicant

    Dave Musicant - 2009-01-09
    • priority: 8 --> 9
     
  • Dave Musicant

    Dave Musicant - 2009-01-19
    • assigned_to: nobody --> jtbigwoo
     
  • Tom Bigwood

    Tom Bigwood - 2009-02-13

    Email exchange between Tom Bigwood and Deborah:

    Deb, I'm having trouble with 2047856 - Aggregate time series creates
    spurious data (the one that adds the extra data at 36 minutes after the
    hour when aggregating a time series against itself). I got it to happen
    once, but I can't get it to happen again. Can you tell me the specific
    steps I should take and values I should choose? Should I import the
    collection twice to aggregate it against itself?

    Thanks,
    Tom

    I am also having trouble getting this to happen today.
    Here is what I did before, and what I just tried to do again without getting the error:

    I imported a time-series dataset (only once). It had hourly data over a period of many days, and a single value for the data.

    I wanted to try to graph it in Enchilada, so that means it had to be aggregated, so I selected it, clicked aggregate, and then chose itself from the "Time Basis | Match To" pulldown menu.

    It ran and added these extra rows. Today, it is not doing it. I was working mostly with the older version of Enchilada when I had this problem, but the fact that you got it to happen even once means it's still hiding in there somewhere. I am not sure how else I can be of help right now, but if you have other things I should check please let me know.
    Deborah

     
  • Tom Bigwood

    Tom Bigwood - 2009-08-19
    • priority: 9 --> 8
     
