Autoplot is an interactive browser for data on the web

#2703 Slow parsing of csv when there are many empty fields, nominal data

Milestone: nextrelease

Status: open-fixed

Owner: nobody

Labels: None

Priority: 5

Updated: 2025-05-10

Created: 2025-05-10

Creator: Jeremy Faden

Private: No

Ivar reported a CSV file that parsed quite slowly. A week or two ago I noticed that reading CSV files containing nominal fields with many distinct values was slow, because the time was O(M*N), where M is the number of lines and N is the number of nominal values. That was the ships-at-sea file (https://autoplot.org/jnlp/20250507a/), and I fixed it. However, Ivar's case still reads at about half the speed of LibreOffice (loffice) loading the same CSV into a spreadsheet.
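For illustration only: the ticket does not say how the O(M*N) nominal lookup was fixed. One common way to avoid scanning all N known values for each of the M lines is to keep the label-to-ordinal mapping in a hash map, so each lookup takes roughly constant time. The class and method names below (NominalEncoder, encode) are hypothetical and are not taken from the Autoplot source.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Autoplot code: maps nominal labels to integer ordinals.
class NominalEncoder {
    // Hash lookup is O(1) on average, so encoding M lines costs about O(M)
    // rather than O(M*N) with a linear search through N known labels.
    private final Map<String, Integer> ordinals = new HashMap<>();

    /** Return the ordinal for a label, assigning the next unused integer on first sight. */
    int encode(String label) {
        Integer ordinal = ordinals.get(label);
        if (ordinal == null) {
            ordinal = ordinals.size();
            ordinals.put(label, ordinal);
        }
        return ordinal;
    }

    /** Number of distinct labels seen so far. */
    int size() {
        return ordinals.size();
    }
}
```

Repeated labels return the ordinal assigned the first time they were seen, so the cost per line does not grow with the number of distinct values.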

Discussion

Jeremy Faden - 2025-05-10

I think the problem is that for every record, Autoplot attempts to parse each field and relies on a try/catch to handle the fields that fail to parse. This pattern should be avoided when failures are common, because throwing and catching an exception is computationally expensive. I now check the length of each field, and when a field has zero length, a fill value is inserted automatically without attempting the parse.
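A minimal sketch of the length check described above, assuming a simple comma split with no quoted fields. The class name EmptyFieldParser, the method parseRecord, and the FILL value are hypothetical; the ticket does not show the actual Autoplot code or state which fill value it uses.

```java
import java.util.Arrays;

// Hypothetical sketch, not Autoplot code.
class EmptyFieldParser {
    // Assumed fill value for illustration; the real one is not given in the ticket.
    static final double FILL = -1e31;

    /** Parse one CSV record, inserting FILL for empty fields without raising an exception. */
    static double[] parseRecord(String line) {
        // A limit of -1 keeps trailing empty fields, so "1,,3,," yields five fields.
        String[] fields = line.split(",", -1);
        double[] values = new double[fields.length];
        for (int i = 0; i < fields.length; i++) {
            String field = fields[i].trim();
            if (field.length() == 0) {
                // Cheap check: the common empty case never enters the exception path.
                values[i] = FILL;
            } else {
                try {
                    values[i] = Double.parseDouble(field);
                } catch (NumberFormatException ex) {
                    // Genuinely malformed fields still fall back to fill.
                    values[i] = FILL;
                }
            }
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parseRecord("1.5,,3,,")));
    }
}
```

The try/catch remains for fields that are non-empty but unparseable, so behavior is unchanged there; only the empty-field case, which dominates in sparse files, skips the exception.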

Performance with this change is now better than LibreOffice. I'm committing the change, and it will be available in the next development release. If the tests are unaffected, the change will be in the next production version.

Last edit: Jeremy Faden 2025-05-10

Jeremy Faden - 2025-05-10

status: open --> open-fixed