CSV Parser 2.3.0: Race Condition Fix

What's Changed

Race Condition Notes

Background

The CSV Parser tries to perform as few allocations as possible. Instead of naively storing each CSV field as its own std::string in a std::vector, the parser keeps references to the raw input and uses lightweight RawCSVField objects to mark where each field starts and ends in that input (along with a flag indicating whether an escaped quote is present). This has the following benefits:

  1. Avoiding the cost of constructing many std::string instances
  2. Avoiding the cost of constant std::vector reallocations
  3. Preserving locality of reference

The CSV Parser also uses separate threads for parsing CSV and for iterating over the data. As rows are parsed, they are made available to the user, who can consume them without interrupting the parsing of new rows.
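The hand-off between the two threads can be sketched as a classic producer/consumer queue. This is an illustrative assumption, not the library's implementation: RowQueue and parse_concurrently are hypothetical names, each "row" is simply a std::string, and every access to the shared container happens under one mutex.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <optional>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical hand-off queue between a parsing thread (producer) and a
// reading thread (consumer). Not the library's actual class.
class RowQueue {
public:
    void push(std::string row) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(!closed_ && "push after close");
            rows_.push_back(std::move(row));
        }
        ready_.notify_one();  // wake the consumer if it is waiting
    }

    // Called once by the producer when there is no more input.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Blocks until a row is available; returns std::nullopt once the queue
    // has been closed and fully drained.
    std::optional<std::string> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !rows_.empty() || closed_; });
        if (rows_.empty()) return std::nullopt;
        std::string row = std::move(rows_.front());
        rows_.pop_front();
        return row;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<std::string> rows_;
    bool closed_ = false;
};

// "Parses" `lines` on a background thread (here, just copies them) while the
// calling thread consumes each row as soon as it is published.
std::vector<std::string> parse_concurrently(const std::vector<std::string>& lines) {
    RowQueue queue;
    std::thread parser([&] {
        for (const auto& line : lines) queue.push(line);
        queue.close();
    });
    std::vector<std::string> consumed;
    while (auto row = queue.pop()) consumed.push_back(std::move(*row));
    parser.join();
    return consumed;
}
```

The consumer starts working as soon as the first row is pushed rather than waiting for parsing to finish. Because every push and pop takes the same lock, the consumer never observes the container in the middle of an update.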

The Race Condition

The RawCSVField objects mentioned previously were stored in contiguous blocks, and an std::vector of pointers to these blocks was used to keep track of them.

However, as @ludovicdelfau correctly diagnosed, if the reading thread accessed a RawCSVField (e.g. by reading a CSVField) at the same moment that the parsing thread was pushing a new RawCSVField while the std::vector was at capacity, the push would cause the vector's contents to be reallocated, leaving the reading thread to access deallocated memory.
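The underlying hazard can be shown without threads. The sketch below uses a hypothetical helper, push_at_capacity_relocates, that fills a std::vector to exactly its capacity and appends one more element. The vector must move its contents to a new buffer, so any pointer taken beforehand now dangles; the same happens whatever the element type, including pointers to RawCSVField blocks. With two threads, this invalidation simply happens at an unpredictable moment relative to the reader.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns true if appending to a full std::vector moved
// its storage. Addresses are saved as integers while each buffer is live,
// so no dangling pointer is ever dereferenced.
bool push_at_capacity_relocates() {
    std::vector<int> v;
    v.reserve(4);
    v.resize(v.capacity());  // size() == capacity(): the vector is now full
    assert(v.size() == v.capacity());

    // A "reader" holding a pointer into the vector, as the reading thread did.
    const int* reader_view = v.data();
    const auto before = reinterpret_cast<std::uintptr_t>(reader_view);

    v.push_back(42);  // no spare capacity, so this must reallocate

    // reader_view now points into freed memory; dereferencing it would be
    // undefined behavior. Only the saved address is compared.
    const auto after = reinterpret_cast<std::uintptr_t>(v.data());
    return before != after;
}
```

The new buffer is allocated before the old one is released, so the two addresses are always different when a reallocation occurs.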

This issue was first reported in [#217].

The Fix

The fix was simple: an std::deque was dropped in to replace the std::vector holding the RawCSVField pointers. Unlike std::vector, std::deque::push_back never relocates existing elements; it may invalidate iterators, but references and pointers to elements already in the deque remain valid. The change even appears to improve the CSV Parser's performance, since the repeated reallocations are avoided. The loss of memory locality typical of std::deque does not matter here because, again, the deque stores pointers to RawCSVField[] blocks rather than the RawCSVField objects themselves, so the fields within each block stay contiguous.
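The property the fix relies on can be sketched as follows. FieldList, RawField, and the block size of 4 are hypothetical choices for illustration, not the library's actual types: fields live in fixed-size heap blocks, and the owning pointers to those blocks are kept in a std::deque. Because push_back on a deque does not invalidate references to existing elements, the address of an existing block slot survives any number of later pushes, whereas a slot in an std::vector would move on reallocation. The sketch is single-threaded and adds no synchronization; it only demonstrates the reference-stability guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <memory>

// Hypothetical, simplified stand-in for RawCSVField.
struct RawField {
    std::size_t start;
    std::size_t length;
    bool has_double_quote;
};

// Stores fields in fixed-size heap blocks; the owning pointers to the blocks
// are kept in a std::deque, whose push_back never moves existing elements.
class FieldList {
public:
    static constexpr std::size_t kBlockSize = 4;  // tiny, to force many blocks

    void push_back(const RawField& f) {
        if (size_ % kBlockSize == 0)  // current block full (or none yet)
            blocks_.push_back(std::make_unique<RawField[]>(kBlockSize));
        blocks_.back()[size_ % kBlockSize] = f;
        ++size_;
    }

    RawField& operator[](std::size_t i) {
        assert(i < size_);
        return blocks_[i / kBlockSize][i % kBlockSize];
    }

    // Exposes the deque slot holding block `b`, to check its address.
    const std::unique_ptr<RawField[]>& block(std::size_t b) const {
        return blocks_[b];
    }

    std::size_t size() const { return size_; }

private:
    std::deque<std::unique_ptr<RawField[]>> blocks_;
    std::size_t size_ = 0;
};
```

After pushing 1000 fields (250 blocks), `&list.block(0)` still refers to the same deque slot it did after the first push; with an std::vector in place of the deque, that address would have changed several times along the way.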

New Contributors

Full Changelog: https://github.com/vincentlaucsb/csv-parser/compare/2.2.3...2.3.0

Source: README.md, updated 2024-06-15