Support embedded linefeeds
More a question than a bug: I'm trying to open a db and get the error message:
Something went wrong: invalid data: for value dimensioned column: 4.25 at line 8786, col 10
How should I interpret this?
- Line 8786: does this include the header lines?
- "dimensioned column: 4.25" I'm not sure where to look.
Trying to attach the database (might be too big).
It's this one:
https://finances.worldbank.org/Loans-and-Credits/IBRD-Statement-Of-Loans-Historical-Data/zucq-nrc3
Yes.
You have labelled the column 4.25, rather than "Interest Rate", which might be more helpful. There should be some sort of issue on line 8786. I'm exporting that dataset now to see if it is the same for me.
This is a very useful database for testing Ravel. It emphasises the need
for an in situ tool to see and edit data, since this file is too big to
load into Excel (it truncates the file at its record limit).
Maybe adding a viewing window to the import routine that shows the
offending row and its neighbours and allows the user to edit the
highlighted error?
Automated cleaning will also be necessary. Yesterday I located a database
that used a dash "-" for no data. A tool to convert such things (including
Excel's N/A) into NaNs (or just empty cells) would be great.
Best, Steve
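The cleaning step suggested above could be prototyped outside Ravel before import. What follows is only a rough sketch, not anything Ravel provides: the function name `clean_missing` and the marker set are assumptions, and the list of "no data" markers would need adjusting per dataset. Cells are blanked rather than written as NaN, so a legitimate negative number such as "-3" is left alone.

```python
import csv
import io

# Assumed set of "no data" markers; extend to suit the dataset at hand.
MISSING_MARKERS = {"-", "N/A", "#N/A", "n/a"}

def clean_missing(src, dst, markers=MISSING_MARKERS):
    """Copy CSV rows from src to dst, blanking cells that hold only a marker."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:
        # Only whole-cell matches are blanked, so "-3" or "N/A-1" survive.
        writer.writerow(["" if cell.strip() in markers else cell for cell in row])

# Usage on a real file (paths are placeholders):
# with open("in.csv", newline="") as src, open("out.csv", "w", newline="") as dst:
#     clean_missing(src, dst)
```

Because it streams row by row, this should cope with files that are too large for a spreadsheet.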
On Tue, Jan 31, 2023 at 7:54 AM High Performance Coder hpcoder@users.sourceforge.net wrote:
Related
Ravel: #317
The error report option floats the errors to the top of the dataset, so you can edit (i.e. fix) the data. Unfortunately, that is not so useful when the dataset is too large for a spreadsheet to import, or even for a text editor (Emacs struggled on this dataset).
I'm not sure that we could successfully add an editing tool that will handle these large dataset cases either, BTW. Usually at this stage, it's a matter of using Unix CLI tools or Python scripts to get it done.
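As an illustration of the kind of Python script meant here, the sketch below pulls out an offending line and its neighbours without loading the whole file. The helper name `show_context` is made up for this example, and it counts physical lines starting from 1 with the header included, which is an assumption about how the reported line number should be read.

```python
from itertools import islice

def show_context(lines, lineno, before=2, after=10):
    """Return (number, text) pairs around 1-based physical line `lineno`."""
    start = max(1, lineno - before)
    stop = lineno + after
    # islice consumes the iterator lazily, so only `stop` lines are ever read.
    window = islice(enumerate(lines, start=1), start - 1, stop)
    return [(n, text.rstrip("\n")) for n, text in window]

# Usage on a real file (the path is a placeholder):
# with open("export.csv", newline="", encoding="utf-8") as f:
#     for n, text in show_context(f, 184888):
#         print(n, text)
```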
I have been looking at another database which is more limited, both in number of lines and columns. But its data is arranged consecutively in rows rather than in separate columns, so the date '6/30/2018', for example, appears roughly 20 times.
https://finances.worldbank.org/Financial-Reporting/Historical-IDA-Income-Statements-Data/5fcd-tqcy
Interesting "bad boy" example. There are multiple duplicate records in this dataset, nearly 100 of which are exact duplicates.
I chose to average the values of the duplicate records (choices are typically average, max or min, and if they're exactly duplicate, it doesn't matter).
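To make the duplicate-handling choice concrete, here is a minimal sketch of collapsing records that share a key. It is not Ravel's implementation; `collapse_duplicates` and the sample keys are invented for illustration. Swapping `mean` for `max` or `min` gives the other options, and for exact duplicates all three agree.

```python
from collections import defaultdict
from statistics import mean

def collapse_duplicates(records, reduce=mean):
    """records: iterable of (key, value) pairs; returns {key: reduced value}."""
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    return {key: reduce(values) for key, values in groups.items()}

# Made-up sample: two records share the same key and get merged.
sample = [
    (("6/30/2018", "A"), 1.0),
    (("6/30/2018", "A"), 3.0),
    (("6/30/2018", "B"), 5.0),
]
averaged = collapse_duplicates(sample)
largest = collapse_duplicates(sample, reduce=max)
```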
I selected "ignore" for the trailing date columns, and "data" for the interest rate and the numerical columns from original principal to loans held.
I got a "missing data" error on line 184888. This file is rather too large for spreadsheets, and even my text editor was struggling with it, so I examined the 10 lines following from 184888:
It looks to me like the export program has started inserting spurious line feeds into the data.
Not sure what to do to correct that. Maybe just process the first 184887 lines?
Got it imported (first 184887 lines).
According to RFC 4180, linefeeds are acceptable within a quoted field. Dang - this complicates our CSV parsing dramatically...
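For anyone pre-checking a file, Python's standard `csv` module already treats a linefeed inside a quoted field as part of that field. The sketch below uses made-up sample data and a hypothetical helper `read_records`. It shows why a physical line number and a record number drift apart once such fields appear. This says nothing about how Ravel's own parser works.

```python
import csv
import io

# Made-up sample: the second record has a linefeed inside a quoted field.
sample = 'id,note,rate\r\n1,"first line\nsecond line",4.25\r\n2,plain,3.5\r\n'

def read_records(stream):
    """Return (physical line number at end of record, fields) per record."""
    reader = csv.reader(stream)
    out = []
    for row in reader:
        # reader.line_num counts source lines read so far, not records.
        out.append((reader.line_num, row))
    return out

# newline="" leaves line endings untranslated, as the csv docs recommend.
records = read_records(io.StringIO(sample, newline=""))
for line_num, row in records:
    print(line_num, row)
```

A naive line-by-line split would see four lines here and report the third as having missing data, which matches the symptom described above.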
Done