#1853 Shouldn't be necessary to ever reload file on 3rd tab

Milestone: Backlog
Status: open
Owner: nobody
Labels: None
Priority: 4normal
Updated: 2025-07-17
Created: 2025-01-09
Creator: Steve Keen
Private: No

I can't see why Ravel found this file difficult to load, but it did

1 Attachments

Discussion

  • Steve Keen

    Steve Keen - 2025-01-09

    CSV file

     
  • Steve Keen

    Steve Keen - 2025-01-09

    Really weird. It looks like an utterly straightforward file, but Ravel treats most axes as ignore, and the Name field isn't populated except for the first column.

     
  • Steve Keen

    Steve Keen - 2025-01-09
    • Priority: 3ReallyUrgent --> 1Fatal
     
  • Steve Keen

    Steve Keen - 2025-01-09

    Upping this to fatal since it's impossible to get the file into Ravel.

    A related file had the same problem: https://archive.researchdata.leeds.ac.uk/1234/#

     
  • Steve Keen

    Steve Keen - 2025-01-09

    Using the context menu I was able to specify the start of the data row/column, then identify the columns. But the import failed on row 301 where, judging from the error message, the first data column is being treated as an axis.

     
  • Steve Keen

    Steve Keen - 2025-01-09

    I'm guessing that this has something to do with UTF-8 encoding. Whatever the cause, we need to be able to handle a file that appears as straightforward as this--maybe by converting to ASCII from UTF-8 on loading, if Ravel can only handle ASCII.

     
  • Steve Keen

    Steve Keen - 2025-01-09

    Whoops! On closer inspection, it appears that rows repeat--non-unique values. I'll check with the data collator, who's at the same seminar I'm at right now.
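
For anyone wanting to check a file for repeated rows before importing it, the following is a minimal sketch and not part of Ravel. It assumes the axis columns are known by name; the function name `find_duplicate_keys` and the sample column names are hypothetical.

```python
import csv
import io
from collections import Counter

def find_duplicate_keys(csv_text, key_columns):
    """Return {key tuple: count} for axis-label combinations seen more than once."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(tuple(row[c] for c in key_columns) for row in reader)
    return {key: n for key, n in counts.items() if n > 1}

# Hypothetical sample: the (Country, Year) pair AU/2020 appears twice.
sample = """Country,Year,Value
AU,2020,1.5
AU,2021,1.7
AU,2020,1.6
"""
duplicates = find_duplicate_keys(sample, ["Country", "Year"])
print(duplicates)
```

Any key reported here identifies rows that do not map to a unique cell once the key columns are treated as axes.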

     
  • High Performance Coder

    Ravel has no problems with this file. See attached video for the process...

     
  • High Performance Coder

    PS - the presence of N/A in the data columns is probably what caused the auto format guesser to fail. If they had been NaN, instead, it would have been fine.
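
The workaround described above (replacing N/A with NaN before import) can be scripted outside Ravel. This is only a sketch under stated assumptions: the marker spellings in `MISSING_MARKERS`, the function name `normalise_missing`, and the `first_data_col` parameter are all hypothetical, and the data columns are assumed to start at a known column index.

```python
import csv
import io

# Assumed spellings of "missing" in the source file; adjust to suit.
MISSING_MARKERS = {"N/A", "NA"}

def normalise_missing(csv_text, first_data_col, markers=MISSING_MARKERS):
    """Rewrite CSV text so missing-value markers in the data columns read NaN.

    Cells left of first_data_col (the axis columns) are left untouched, so an
    axis label that happens to be spelled "NA" is preserved.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(csv_text)):
        writer.writerow([
            "NaN" if i >= first_data_col and cell.strip() in markers else cell
            for i, cell in enumerate(row)
        ])
    return out.getvalue()

sample = "Country,2020,2021\nAU,1.5,N/A\nNA,N/A,2.0\n"
cleaned = normalise_missing(sample, first_data_col=1)
print(cleaned)
```

The cleaned text can then be saved and loaded in the usual way.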

     
    • Steve Keen

      Steve Keen - 2025-01-10

      Confirmed--editing NA to NaN meant no problems apart from the duplicate
      rows (about which it seems the author of the file is unaware: he's at this
      workshop I'm attending). But this indicates the need for the Missing Value
      field to be on the 3rd tab in the import process rather than the 4th.

       
      • High Performance Coder

        That is not what that field is for. It's for substituting another value
        for missing values (default NaN) - let's say 0. I'm not sure anyone has
        actually found a use for it.

        Being able to specify what pattern corresponds to not-a-value in the
        dataset is not particularly helpful. It is only relevant for
        autoguessing the format, so by the time you get a dialog box, you're
        past that stage. Once the format is defined, Ravel treats all
        non-numerical strings as missing values.

        --


        Dr Russell Standish Phone 0425 253119 (mobile)
        Principal, High Performance Coders hpcoder@hpcoders.com.au
        http://www.hpcoders.com.au


         
        • Steve Keen

          Steve Keen - 2025-01-11

          OK. But is it possible to add a multi-valued field to the import routine’s
          guess for what represents a NaN? The import routine should be as
          wrinkle-free as possible, and the fact that its guess here resulted in no
          fields being recognised initially was not good.

           
  • High Performance Coder

    No - because it uses the presence of numerical data to infer the distinction between data and metadata. NaN and Inf are numerical data, as well as the normal scientific notation +/-X.XXXXE+/-XXX where X are digits (0-9). If the data is all numerical, or contains non-standard strings to represent not-a-number, there's not much we can do.
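
To illustrate the distinction drawn above, here is a rough sketch that uses Python's `float()` as a stand-in for the numerical test. It is not Ravel's implementation: Python's parsing rules are an assumption here and differ in detail (for example, `float()` also accepts surrounding whitespace and spellings such as "infinity"). The name `is_numerical` is hypothetical.

```python
def is_numerical(cell):
    """True if the cell parses as a float: plain or scientific notation, NaN or Inf."""
    try:
        float(cell)
        return True
    except ValueError:
        return False

cells = ["1.5", "-2.0E+03", "NaN", "Inf", "N/A", "Country"]
flags = [is_numerical(c) for c in cells]
print(list(zip(cells, flags)))
```

Under this test "NaN" and "Inf" count as data, while "N/A" is indistinguishable from a text label such as "Country".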

    I do agree it is annoying that it doesn't load all columns with selector boxes, i.e. it shouldn't be necessary to 'reload' unless it fails to recognise the separator or quote characters. In other words, the reload button shouldn't be necessary on the 3rd tab, but even though I've looked at the JavaScript code, I don't really understand why it does that.

     
  • High Performance Coder

    • summary: Problem importing apparently straightforward file --> Shouldn't be necessary to ever reload file on 3rd tab
    • Priority: 1Fatal --> 4Urgent
     
  • High Performance Coder

    • Milestone: Babbage --> Backlog
     
  • High Performance Coder

    Ticket moved from /p/minsky/ravel/665/

    Can't be converted:

    • _priority: 4Urgent
     
  • High Performance Coder

    • priority: 4Urgent --> 4normal
     
