Minsky / Ravel / #317 Support embedded linefeeds

TomScat - 2023-01-31

Trying to attach the database (might be too big).

It's this one:
https://finances.worldbank.org/Loans-and-Credits/IBRD-Statement-Of-Loans-Historical-Data/zucq-nrc3


High Performance Coder - 2023-01-31

Line 8786: does this include the header lines?
Yes.

"dimensioned column: 4.25" I'm not sure where to look.
You have labelled the column 4.25, rather than "Interest Rate", which might be more helpful. There should be some sort of issue on line 8786. I'm exporting that dataset now to see if it is the same for me.

- Steve Keen - 2023-01-31
  
  This is a very useful database for testing Ravel. It emphasises the need
  for an in situ tool to see and edit data, since this file is too big to
  load into Excel (it truncates the file at its record limit).
  
  Maybe adding a viewing window to the import routine that shows the
  offending row and its neighbours and allows the user to edit the
  highlighted error?
  
  Automated cleaning will also be necessary. Yesterday I located a database
  that used a dash "-" for no data. A tool to convert such things (including
  Excel's N/A) into NaNs (or just empty cells) would be great.
  Best, Steve
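The cleaning step suggested above (turning a dash or Excel's N/A into empty cells) could be sketched in Python roughly as follows. This is only an illustration, not anything Ravel provides: the marker set and the function names `blank_missing`, `clean_rows` and `clean_csv_text` are assumptions made up for the example.

```python
import csv
import io

# Assumed set of "no data" markers; extend it to match the dataset at hand.
MISSING_MARKERS = {"-", "N/A", "#N/A", "NA", "n/a"}

def blank_missing(cell, markers=MISSING_MARKERS):
    """Return an empty string for cells that hold only a missing-data marker."""
    return "" if cell.strip() in markers else cell

def clean_rows(rows, markers=MISSING_MARKERS):
    """Yield each row with missing-data markers replaced by empty cells."""
    for row in rows:
        yield [blank_missing(cell, markers) for cell in row]

def clean_csv_text(text):
    """Clean CSV held in a string; a file-based version would stream rows instead."""
    reader = csv.reader(io.StringIO(text, newline=""))
    out = io.StringIO(newline="")
    csv.writer(out, lineterminator="\n").writerows(clean_rows(reader))
    return out.getvalue()
```

Only cells that consist entirely of a marker are blanked, so a negative number such as -5 is left alone.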
  
  On Tue, Jan 31, 2023 at 7:54 AM High Performance Coder hpcoder@users.sourceforge.net wrote:
  
  [ravel:#317] https://sourceforge.net/p/minsky/ravel/317/ Ravel v19 - error message - invalid data
  
  Status: open
  Milestone: Pascal
  Created: Tue Jan 31, 2023 04:48 AM UTC by TomScat
  Last Updated: Tue Jan 31, 2023 05:43 AM UTC
  Owner: nobody
  Attachments:
  
  2023-01-31 error message.png
  https://sourceforge.net/p/minsky/ravel/317/attachment/2023-01-31%20error%20message.png
  (82.0 kB; image/png)
  
  More a question than a bug: I'm trying to open a db and get the error
  message:
  Something went wrong: invalid data: for value dimensioned column: 4.25 at
  line 8786, col 10
  
  How should I interpret this?
  - Line 8786: does this include the header lines?
  - "dimensioned column: 4.25" I'm not sure where to look.
  
  - High Performance Coder - 2023-01-31
    
    The error report option floats the errors to the top of the dataset, so you can edit (i.e. fix) the data. Unfortunately, that is not much use when the dataset is too large for a spreadsheet to import, or even for a text editor (emacs struggled on this dataset).
    
    I'm not sure that we could successfully add an editing tool that handles these large datasets either, BTW. Usually at this stage, it's a case of using Unix CLI tools or Python scripts to get it done.
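As an illustration of the kind of Python script meant here, the sketch below pulls out a reported line and its neighbours without loading the whole file into memory. The function name `show_context` and its parameters are hypothetical, and it assumes line numbers count physical lines ending in a linefeed, which may not match how Ravel counts them.

```python
from itertools import islice

def show_context(path, line_no, before=2, after=2, encoding="utf-8"):
    """Return (number, text) pairs for the lines around line_no (1-based)."""
    first = max(1, line_no - before)
    last = line_no + after
    # newline="\n" makes only linefeeds end a line, so the numbering
    # agrees with tools such as wc -l and tail.
    with open(path, encoding=encoding, errors="replace", newline="\n") as fh:
        window = islice(fh, first - 1, last)
        return [(n, text.rstrip("\r\n")) for n, text in enumerate(window, start=first)]
```

For example, `show_context("data.csv", 8786)` would return lines 8784 to 8790 of a hypothetical `data.csv`, each paired with its line number.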
    

TomScat - 2023-01-31

I have been looking at another database which is smaller, both in number of lines and columns. But its data is arranged consecutively in rows rather than in separate columns, so the date '6/30/2018', for example, appears about 20 times.

https://finances.worldbank.org/Financial-Reporting/Historical-IDA-Income-Statements-Data/5fcd-tqcy

Historical_IDA_Income_Statements_Data(1).csv


Interesting "bad boy" example. There are multiple duplicate records in this dataset, nearly 100 of which are exact duplicates.

zen>sort IBRD_Statement_Of_Loans_-_Historical_Data.csv|uniq|wc -l
1169592
zen>wc -l IBRD_Statement_Of_Loans_-_Historical_Data.csv 
1169685 IBRD_Statement_Of_Loans_-_Historical_Data.csv

I chose to average the values of the duplicate records (choices are typically average, max or min, and if they're exactly duplicate, it doesn't matter).
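The two counts above differ by 93 lines, which is where the "nearly 100" figure comes from. For readers who want to pre-process duplicates outside Ravel, averaging records that share the same dimension labels might look roughly like this in Python. It is a sketch of the general idea only, not Ravel's implementation, and `average_duplicates` and the record layout are assumptions made for the example.

```python
from collections import defaultdict

def average_duplicates(records):
    """Collapse records sharing a key to the mean of their values.

    records: iterable of (key, value) pairs, where key is a tuple of
    dimension labels and value is a float.
    """
    totals = defaultdict(lambda: [0.0, 0])  # key -> [running sum, count]
    for key, value in records:
        entry = totals[key]
        entry[0] += value
        entry[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}
```

Swapping the running sum for `max` or `min` would give the other two choices mentioned; for exact duplicates all three give the same answer.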

I selected ignore for the trailing date columns, and data for the interest rate and the numerical columns from original principal to loans held.

I got a "missing data" error on line 184888. This file is rather too large for spreadsheets, and even my text editor was struggling with it, so I examined the 10 lines following from 184888:

zen>tail -n +184888 IBRD_Statement_Of_Loans_-_Historical_Data.csv|head -10
01/31/2012 12:00:00 AM,IBRD42460,EUROPE AND CENTRAL ASIA,HR,Croatia,Zagrebacka banka d.d.,HR,Croatia,SNGL CRNCY,Disbursed,1.97,,P040139,INVESTMENT RECOVERY,8000000,3734380.61,0,3636810.02,4451700.03,500253.29,0,500253.29,0,0,0,500253.29,03/15/2003 12:00:00 AM,09/15/2012 12:00:00 AM,12/04/1997 12:00:00 AM,11/18/1997 12:00:00 AM,03/17/1998 12:00:00 AM,12/31/2001 12:00:00 AM,09/27/2001 12:00:00 AM
02/29/2012 12:00:00 AM,IBRD42460,EUROPE AND CENTRAL ASIA,HR,Croatia,"Zagreba
ka banka d.d.",HR,Croatia,SNGL CRNCY,Disbursed,1.97,,P040139,INVESTMENT RECOVERY,8000000,3734380.61,0,3636810.02,4451700.03,510294.8,0,510294.8,0,0,0,510294.8,03/15/2003 12:00:00 AM,09/15/2012 12:00:00 AM,12/04/1997 12:00:00 AM,11/18/1997 12:00:00 AM,03/17/1998 12:00:00 AM,12/31/2001 12:00:00 AM,09/27/2001 12:00:00 AM
03/31/2012 12:00:00 AM,IBRD42460,EUROPE AND CENTRAL ASIA,HR,Croatia,"Zagreba
ka banka d.d.",HR,Croatia,SNGL CRNCY,Disbursed,1.41,,P040139,INVESTMENT RECOVERY,8000000,3734380.61,0,3636810.02,4698768.85,253603.82,0,253603.82,0,0,0,253603.82,03/15/2003 12:00:00 AM,09/15/2012 12:00:00 AM,12/04/1997 12:00:00 AM,11/18/1997 12:00:00 AM,03/17/1998 12:00:00 AM,12/31/2001 12:00:00 AM,09/27/2001 12:00:00 AM
04/30/2012 12:00:00 AM,IBRD42460,EUROPE AND CENTRAL ASIA,HR,Croatia,"Zagreba
ka banka d.d.",HR,Croatia,SNGL CRNCY,Disbursed,1.41,,P040139,INVESTMENT RECOVERY,8000000,3734380.61,0,3636810.02,4698768.85,251425.95,0,251425.95,0,0,0,251425.95,03/15/2003 12:00:00 AM,09/15/2012 12:00:00 AM,12/04/1997 12:00:00 AM,11/18/1997 12:00:00 AM,03/17/1998 12:00:00 AM,12/31/2001 12:00:00 AM,09/27/2001 12:00:00 AM
05/31/2012 12:00:00 AM,IBRD42460,EUROPE AND CENTRAL ASIA,HR,Croatia,"Zagreba
ka banka d.d.",HR,Croatia,SNGL CRNCY,Disbursed,1.41,,P040139,INVESTMENT RECOVERY,8000000,3734380.61,0,3636810.02,4698768.85,235924.02,0,235924.02,0,0,0,235924.02,03/15/2003 12:00:00 AM,09/15/2012 12:00:00 AM,12/04/1997 12:00:00 AM,11/18/1997 12:00:00 AM,03/17/1998 12:00:00 AM,12/31/2001 12:00:00 AM,09/27/2001 12:00:00 AM
06/30/2012 12:00:00 AM,IBRD42460,EUROPE AND CENTRAL ASIA,HR,Croatia,"Zagreba

It looks to me like the export program has started inserting spurious line feeds into the data.

Not sure what to do to correct that. Maybe just process the first 184887 lines?
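One way to find where such breaks begin, without opening the file in an editor, is to look for the first physical line with an odd number of double quotes. The sketch below assumes that quotes inside fields are escaped by doubling (which keeps the count even); `first_unbalanced_line` is a hypothetical helper, not part of any tool mentioned in this thread.

```python
def first_unbalanced_line(lines, quotechar='"'):
    """Return the 1-based number of the first line with an odd quote count, or None."""
    for number, line in enumerate(lines, start=1):
        # An odd count means a quoted field opens (or closes) on this
        # physical line without its partner being on the same line.
        if line.count(quotechar) % 2 == 1:
            return number
    return None
```

Passing an open file object as `lines` streams the file line by line, so this should cope with files that are too large for a spreadsheet.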

High Performance Coder - 2023-01-31

Got it imported (first 184887 lines).

IBRD_loans.rvl


High Performance Coder - 2023-02-02

According to RFC 4180, linefeeds are acceptable within a quoted field. Dang - this complicates our CSV parsing dramatically...
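As a workaround on the data side, a parser that already honours quoted line breaks can be used to rewrite the file before import. Python's standard `csv` module is one such parser. The sketch below is not how Ravel handles it: `flatten_embedded_newlines` is a hypothetical helper, and replacing each embedded break with a space is an arbitrary choice.

```python
import csv
import io
import re

def flatten_embedded_newlines(text, replacement=" "):
    """Parse CSV text, honouring quoted fields, and replace line breaks inside cells."""
    # newline="" leaves line endings untouched so the csv module can
    # tell a break inside a quoted field from the end of a record.
    reader = csv.reader(io.StringIO(text, newline=""))
    return [[re.sub(r"\r\n|\r|\n", replacement, cell) for cell in row]
            for row in reader]
```

The returned rows can then be written back out with `csv.writer`, giving one record per physical line.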


High Performance Coder - 2023-02-02

summary: Ravel v19 - error message - invalid data --> Support embedded linefeeds

High Performance Coder - 2023-08-18

labels: --> CSV import

High Performance Coder - 2023-08-21

status: open --> closed

assigned_to: High Performance Coder

High Performance Coder - 2023-08-21

Done

