#620 Display axis size and total hypercube size in data selection tab

Babbage
open
nobody
None
3ReallyUrgent
2024-08-28
2024-08-11
Steve Keen
No

Add an “analyse and load” button alongside our current “load” button on the import form. We could then:

(1) identify duplicate columns: things like country code and country will always have the same entries on each row;
(2) identify superfluous columns: things like the code “Q” and the word “Quarterly” would have only one entry in an entire file.
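
As a rough illustration of the two checks above, here is a minimal Python sketch. It is not Ravel's implementation: the function names, the sample header and the sample rows are all invented for the example, and it assumes the file has already been read into a list of rows. Two columns are treated as duplicates when their values are in one-to-one correspondence.

```python
from itertools import combinations

def constant_columns(rows, header):
    """Names of columns that hold a single distinct value in every row."""
    return [name for i, name in enumerate(header)
            if len({row[i] for row in rows}) <= 1]

def duplicate_column_pairs(rows, header):
    """Pairs of non-constant columns whose values map one-to-one."""
    pairs = []
    for i, j in combinations(range(len(header)), 2):
        distinct_i = {row[i] for row in rows}
        distinct_j = {row[j] for row in rows}
        distinct_ij = {(row[i], row[j]) for row in rows}
        # A bijection exists exactly when the number of distinct pairs
        # equals the number of distinct values in each column.
        if len(distinct_i) > 1 and \
                len(distinct_i) == len(distinct_j) == len(distinct_ij):
            pairs.append((header[i], header[j]))
    return pairs

# Hypothetical sample data, for illustration only.
header = ["country_code", "country", "freq_code", "freq", "value"]
rows = [
    ["AU", "Australia", "Q", "Quarterly", "1.0"],
    ["US", "United States", "Q", "Quarterly", "2.0"],
    ["AU", "Australia", "Q", "Quarterly", "3.0"],
]

print(constant_columns(rows, header))
print(duplicate_column_pairs(rows, header))
```

On this sample the first call reports the two frequency columns and the second reports the country code/country pair. The pairwise comparison is quadratic in the number of columns, which should be acceptable for the couple of dozen columns discussed here.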

There are several options for how to proceed after that. Any dimension with only one entry should obviously be set to ignore; maybe we should also either not show it in the import form, or grey it out so it can’t be selected.

One way to handle duplicated columns would be to colour code identical ones, and/or have a pop-up window warn users when they code two identical columns as “axis” or “data”.

The objective is to make sure that as many users as possible can successfully load data into Ravel. We are unlikely to hear from those who try once or twice and fail to load their data.

Discussion

  • High Performance Coder

    "Any dimension with only one entry should obviously be set to ignore;"

    This happens now: any axis with a single slice is removed from the hypercube on loading anyway, regardless of whether the user set it to ignore or axis. Those are not the problem.

     
  • High Performance Coder

    As I mentioned in my email response, I'm not convinced this is such a good idea.
    I'm inclined to think the solution will have more to do with database backing, where one can extract the desired slice or rollup using the Ravel widget, without having to worry about the 64-bit hypercube limit.

     
  • Steve Keen

    Steve Keen - 2024-08-12

    OK, but we do need to warn users when they are likely to breach "curse of dimensionality" limits. Could you include a calculation field showing the data requirements of the current selections? This could be used to advise users to reduce dimensionality on importing by choosing to ignore sufficient columns while still generating a unique key.
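
To make the "unique key" condition concrete, here is a small Python sketch, offered as an assumption about what such a check could look like rather than as anything Ravel does. The name `is_unique_key` and the sample data are invented. A set of axis columns forms a unique key when no combination of their values occurs in more than one row.

```python
from collections import Counter

def is_unique_key(rows, header, axis_columns):
    """True if each combination of values in axis_columns occurs only once."""
    idx = [header.index(c) for c in axis_columns]
    counts = Counter(tuple(row[i] for i in idx) for row in rows)
    return all(n == 1 for n in counts.values())

# Hypothetical sample data, for illustration only.
header = ["country", "quarter", "adjusted", "value"]
rows = [
    ["AU", "2020Q1", "A", "1"],
    ["AU", "2020Q1", "U", "2"],
    ["US", "2020Q1", "A", "3"],
    ["US", "2020Q1", "U", "4"],
]

print(is_unique_key(rows, header, ["country", "adjusted"]))  # quarter ignored
print(is_unique_key(rows, header, ["country"]))              # too few columns
```

In this sample the quarter column can be ignored without losing uniqueness, whereas ignoring the adjustment column as well leaves duplicate keys.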

     
  • High Performance Coder

    That's a bit more doable. We do, of course, calculate the hypercube size and show an error message when the 64-bit limit is breached. Now that we're reading the whole file in during the metadata specification stage, we can keep a display of the hypercube size up to date, and show it in red when it exceeds 2^64.

     
    • Steve Keen

      Steve Keen - 2024-08-12

      Good. Even a minimum size warning would help, since 2^64 is a much bigger
      number than most people appreciate. Something that gave users an indication
      of potential data cube size? Maybe a box like this:

      Dimensions:
        Max number: 21 (this is the initial size of the BIS database)
        Est. minimum for unique key: (user entered; 7 for the BIS)
        Max values per dimension: 350 (that's roughly the number of quarters)
        Min values per dimension: 2 (adjusted or unadjusted for breaks)
      Estimated data cube size (calculated by Ravel):
        Max dimensions:
        Min dimensions:

      The size difference between the max and min should be enough to let users
      know that they should select for the smallest possible number of dimensions.
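
To give a feel for the gap between those two estimates, here is a short Python sketch using the BIS figures quoted above (21 or 7 dimensions, between 2 and 350 values each). The helper `cube_size_bounds` is hypothetical, and the bounds are deliberately crude: they assume every dimension has the minimum, or every dimension has the maximum, number of values.

```python
LIMIT = 2**64  # the 64-bit hypercube limit mentioned in this thread

def cube_size_bounds(n_dims, min_values, max_values):
    """Smallest and largest possible cell counts for n_dims dimensions."""
    return min_values**n_dims, max_values**n_dims

for n in (21, 7):
    lo, hi = cube_size_bounds(n, 2, 350)
    print(f"{n} dimensions: between {lo:.3g} and {hi:.3g} cells; "
          f"upper bound exceeds 2^64: {hi > LIMIT}")
```

With these numbers the upper bound for 7 dimensions (350^7) still fits below 2^64, while 8 dimensions of 350 values each would already exceed it, and the 21-dimension upper bound is far beyond it.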

       
  • High Performance Coder

    The idea I think we've agreed to here is to add the number of unique labels in each axis as another row in the data selection tab, along with the total product of all axis sizes, rendered in red if it exceeds the 64-bit address boundary.
    This should be done as part of the import form refactor started by Niels.
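
The calculation described in this comment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `axis_sizes`, `hypercube_summary` and the sample data are invented names, not part of the import form, and the "red" state is represented simply as a boolean.

```python
import math

HYPERCUBE_LIMIT = 2**64  # limit quoted earlier in this thread

def axis_sizes(rows, header, axis_columns):
    """Number of unique labels in each column selected as an axis."""
    return {c: len({row[header.index(c)] for row in rows})
            for c in axis_columns}

def hypercube_summary(sizes):
    """Total hypercube size and whether it exceeds the limit."""
    total = math.prod(sizes.values())
    return total, total > HYPERCUBE_LIMIT

# Hypothetical sample data, for illustration only.
header = ["country", "quarter", "value"]
rows = [
    ["AU", "2020Q1", "1"],
    ["AU", "2020Q2", "2"],
    ["US", "2020Q1", "3"],
]

sizes = axis_sizes(rows, header, ["country", "quarter"])
total, too_big = hypercube_summary(sizes)
print(sizes, total, too_big)
```

Python integers are arbitrary precision, so the product here cannot overflow. In a language with fixed-width integers the running product would need an explicit overflow check before each multiplication.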

     
  • High Performance Coder

    • summary: Analyse and Load Button --> Display axis size and total hypercube size in data selection tab