
#438 Civita hypercubes have a fundamental limit

Pascal
closed
None
6normal
2024-05-24
2023-12-07
No

It used to be that Ravel hypercubes needed to fit in memory, but the sparse tensor implementation allows for some truly huge, high-dimensional datasets. Unfortunately, it too has a limit: the product of the dimension sizes (i.e. the total number of elements in a hypercube) must be less than 2^64, around 1.8 × 10^19. This is because the sparse data format stores, for each data element present, a 64-bit hypercube index together with the actual double-precision value.
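To make the limit concrete, here is a minimal Python sketch of the storage scheme described above, assuming the coordinates are flattened row-major into one integer index. The names (`hypercube_size`, `flat_index`, `fits_64bit_index`) are hypothetical and are not civita's API; civita's own implementation may order or encode dimensions differently.

```python
# Hypothetical sketch, not civita's actual API: a sparse hypercube stored as a
# mapping from one flat integer index to a double-precision value.
from math import prod

UINT64_LIMIT = 2**64  # the element count must stay below this (per the ticket)


def hypercube_size(dims):
    """Total number of elements: the product of the dimension sizes."""
    return prod(dims)


def flat_index(coords, dims):
    """Row-major flattening of a coordinate tuple into a single integer."""
    if len(coords) != len(dims):
        raise ValueError("one coordinate per dimension required")
    index = 0
    for c, n in zip(coords, dims):
        if not 0 <= c < n:
            raise IndexError(f"coordinate {c} out of range for size {n}")
        index = index * n + c
    return index


def fits_64bit_index(dims):
    """True if the hypercube is small enough for a 64-bit index."""
    return hypercube_size(dims) < UINT64_LIMIT


# Only the elements actually present are stored: index -> value.
dims = (3, 4, 5)
sparse = {flat_index((1, 2, 3), dims): 42.0}
```

The stored keys are just integers, so nothing in the data structure itself notices when the product of the dimension sizes outgrows the index type.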

I just received a 19-dimensional dataset (sample.csv) with 2.9 × 10^30 elements in its hypercube. Ravel happily ingests it without any warning, but the resulting computations are mostly nonsense.
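One plausible mechanism for the nonsense, assuming the index behaves like an unsigned 64-bit integer (which in C++ wraps modulo 2^64 rather than failing), is that distinct cells silently collide. The sketch below emulates that wraparound by masking an exact Python index to its low 64 bits; the dimension sizes are made up for illustration and are not taken from sample.csv.

```python
# Hypothetical illustration: emulate a 64-bit unsigned index by keeping only
# the low 64 bits of the exact (arbitrary-precision) flat index.
MASK64 = 2**64 - 1


def flat_index(coords, dims):
    """Exact row-major flat index (Python ints never overflow)."""
    index = 0
    for c, n in zip(coords, dims):
        index = index * n + c
    return index


def wrapped_index(coords, dims):
    """What a 64-bit index would hold after silent wraparound."""
    return flat_index(coords, dims) & MASK64


dims = (2**40, 2**40)  # 2**80 elements, well past the 2**64 limit
a = (0, 0)
b = (2**24, 0)         # exact flat index is 2**24 * 2**40 == 2**64

print(flat_index(a, dims), flat_index(b, dims))          # distinct: 0 and 2**64
print(wrapped_index(a, dims) == wrapped_index(b, dims))  # True: they collide
```

Once two cells share an index, one value overwrites or is summed into the other, which would corrupt any downstream computation without raising an error.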

As an interim solution, we could post a warning when an imported hypercube exceeds this limit, and ask the user to ignore some data columns. Ideally, though, we might need to refactor the civita library to support bigger indices, perhaps using BigInts. The open question is what performance cost that would impose.
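The interim warning can be computed exactly before any 64-bit index is built, because Python integers have arbitrary precision. This is a minimal sketch; `size_warning` and its message text are hypothetical. It also reports how many bits the element count actually needs: for the 2.9 × 10^30-element dataset that is 102 bits, so a fixed-width 128-bit index would in principle cover it, though whether that is cheaper than BigInts is exactly the open performance question.

```python
# Hypothetical import-time check: compute the hypercube size exactly with
# Python's arbitrary-precision ints, then warn if it exceeds the index width.
from math import prod

INDEX_BITS = 64


def size_warning(dims, index_bits=INDEX_BITS):
    """Return a warning string if the hypercube is too big, else None."""
    size = prod(dims)
    if size < 2**index_bits:
        return None
    needed = size.bit_length()  # bits needed to represent the element count
    return (f"Hypercube element count needs {needed} bits, but indices are "
            f"limited to {index_bits} bits. Consider ignoring some columns.")


print(size_warning((3, 4, 5)))       # None: small enough
print(size_warning((2**32, 2**32)))  # warns: 2**64 elements needs 65 bits
print(int(2.9e30).bit_length())      # bits for the dataset in this ticket
```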

Discussion

  • High Performance Coder

    Actually, I think the answer to this is the much-awaited Librarian feature. There is no such restriction on data sitting on disk, only on Minsky's internal representations. That would mean throwing a warning message whenever the Ravel state generates too large a hypercube, which the user can clear by appropriate rollups or slicing.
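    The Ravel-state check can be sketched as follows, assuming that an axis which is sliced (fixed to a single value) or rolled up (summed away) no longer contributes to the output hypercube's size. The axis names, sizes and function names are hypothetical illustrations, not Minsky's API.

```python
# Hypothetical sketch of a Ravel-state check: axes that are sliced (fixed to one
# value) or rolled up (summed away) drop out of the output hypercube.
from math import prod

LIMIT = 2**64


def output_size(axes, collapsed):
    """Element count of the output hypercube.

    axes: mapping of axis name -> number of slices along that axis
    collapsed: names of axes that are sliced or rolled up
    """
    return prod(n for name, n in axes.items() if name not in collapsed)


def needs_warning(axes, collapsed):
    """True while the current Ravel state would overflow a 64-bit index."""
    return output_size(axes, collapsed) >= LIMIT


axes = {"country": 200, "product": 5000, "year": 60,
        "firm": 10**12, "sku": 10**9}
print(needs_warning(axes, set()))     # True: about 6e28 elements
print(needs_warning(axes, {"firm"}))  # False: rolling up "firm" leaves 6e16
```

    The warning clears itself as soon as the user collapses enough axes, which matches the idea that the on-disk data can stay arbitrarily large.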

    It also means that Ravels will need to be able to export data directly to a file, rather than going through the variable export functionality, in order to export files with huge hypercubes.
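    A direct export can avoid building the hypercube at all by streaming records to the file one row at a time. This sketch assumes records arrive as tuples of coordinate labels followed by a value, so no flat 64-bit index is ever formed; `export_stream` is a hypothetical name, and the record source (for example, the Librarian's on-disk data) is left abstract.

```python
# Hypothetical sketch of a direct, streaming export: records are written one at
# a time, so the full hypercube is never materialised in memory.
import csv


def export_stream(records, header, fileobj):
    """Write (coordinates..., value) rows from any iterable; return row count."""
    writer = csv.writer(fileobj)
    writer.writerow(header)
    count = 0
    for *coords, value in records:
        writer.writerow([*coords, value])
        count += 1
    return count
```

    Because `records` can be a generator, memory use stays constant regardless of how many elements the exported hypercube nominally has.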

     
  • High Performance Coder

    Ticket moved from /p/minsky/tickets/1679/

    Can't be converted:

    • _priority: 2critical
     
  • High Performance Coder

    • Priority: 2critical --> 1Fatal
    • Milestone: Marx --> Pascal
     
  • High Performance Coder

    • Priority: 1Fatal --> 6normal
     
  • High Performance Coder

    This is done. We now have additional warnings that fire when things are about to run out of memory, and, as per https://sourceforge.net/p/minsky/ravel/564/, when shit really hits the fan, a nice user message is posted and the work is saved.
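    A pre-flight memory warning of this kind can be estimated from the sparse format described in the ticket: each present element costs one 64-bit index plus one 64-bit double, i.e. 16 bytes. This is a minimal sketch that ignores container overhead; the 0.8 threshold and the function names are hypothetical, and Minsky's actual warning logic may work differently.

```python
# Hypothetical pre-flight memory check, assuming each present element costs one
# 64-bit index plus one 64-bit double (16 bytes), ignoring container overhead.
INDEX_BYTES = 8
VALUE_BYTES = 8


def estimated_bytes(num_present):
    """Lower-bound memory estimate for a sparse hypercube."""
    return num_present * (INDEX_BYTES + VALUE_BYTES)


def memory_warning(num_present, available_bytes, threshold=0.8):
    """True if the estimate exceeds a fraction of available memory."""
    return estimated_bytes(num_present) > threshold * available_bytes


print(estimated_bytes(10**9))                # 16 GB for a billion elements
print(memory_warning(10**9, 16 * 10**9))     # True: would exhaust 16 GB
print(memory_warning(10**6, 16 * 10**9))     # False: only 16 MB needed
```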

     
  • High Performance Coder

    • status: open --> closed
    • assigned_to: High Performance Coder
     
