There appears to be a change in behaviour in OpenCSV v5.11 that has created an issue when reading a file that contains a blank line in the middle of the file.
Given the code
CSVReader reader = new CSVReaderBuilder(new StringReader("Hello\n\nWorld"))
        .withCSVParser(new RFC4180Parser())
        .build();
String[] line = null;
while ((line = reader.readNext()) != null) {
    System.out.println(Arrays.stream(line).collect(Collectors.toList()));
}
On version v5.10, this prints
[Hello]
[]
[World]
On version v5.11, this prints
[Hello]
This is because previously, when the reader encountered a blank line, it would return an array containing one element, which would be an empty string.
It seems now this returns null instead.
This is problematic, as the javadoc on com.opencsv.CSVReader#readNext states:
@return A string array with each comma-separated element as a separate entry, or null if there is no more input.
If we return null for a blank line, we assume that no more content exists in the file, and we stop reading in our application.
The API gives no other way to determine if we have read the rest of the contents.
It's worth noting that this issue only affects the RFC-4180 Parser. If we use the default CSV Parser, this issue does not occur.
It seems like this is due to a change in the implementation of com.opencsv.RFC4180Parser#tokenizeStringIntoArray in this commit: https://sourceforge.net/p/opencsv/source/ci/5efc0d401137fb12f0530126e5616bf88c3d3992/
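To illustrate why the choice of split implementation matters here, this is a minimal JDK-only sketch of what a regex-based split does with an empty line. It does not use OpenCSV; the class and method names are made up for the example. My understanding of the commons-lang javadoc is that StringUtils.splitPreserveAllTokens returns an empty array for an empty string, which would differ from this, but that call is not exercised below.

```java
import java.util.Arrays;

// Hypothetical demo class, not part of OpenCSV.
class EmptyLineSplitDemo {

    // Regex-based split; the -1 limit keeps trailing empty tokens.
    static String[] regexSplit(String line) {
        return line.split(",", -1);
    }

    public static void main(String[] args) {
        // A normal line yields one token per field.
        System.out.println(Arrays.toString(regexSplit("Hello")));
        // An empty line still yields one token: the empty string.
        System.out.println(regexSplit("").length);
        // Empty fields in the middle of a line are preserved.
        System.out.println(regexSplit("a,,b").length);
    }
}
```

With a split that behaves like this, a blank line gives a one-element array, so the caller can still tell it apart from the null that signals end of input.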
Finding this bug just cost me half a day :/
An empty string in the input causes the reader to prematurely stop parsing.
If you want to avoid the overhead of using the regex-based split from 5.10, then rather than use commons-lang's StringUtils.splitPreserveAllTokens, what about a simple character-based split method inside RFC4180Parser?
This broke liquibase for some users: https://github.com/liquibase/liquibase/issues/7020
Last edit: Filipe 2025-07-22
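For reference, the character-based split suggested earlier in the thread could look roughly like the sketch below. This is not the code that was posted, nor what RFC4180Parser actually does: the class and method names are hypothetical, and it deliberately ignores quoted fields, which a real RFC 4180 parser has to handle. It only shows that a hand-written split can avoid regex overhead and still return one empty token for an empty line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OpenCSV.
class CharSplit {

    // Splits on a single separator character, keeping every token,
    // including empty ones. Quoting is NOT handled (assumption: the
    // caller deals with quoted fields separately).
    static String[] splitPreserveAll(String line, char separator) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == separator) {
                tokens.add(line.substring(start, i));
                start = i + 1;
            }
        }
        // Always add the final token, even when it is empty, so an
        // empty input yields a one-element array rather than none.
        tokens.add(line.substring(start));
        return tokens.toArray(new String[0]);
    }
}
```

Calling CharSplit.splitPreserveAll("", ',') gives an array with a single empty string, which matches the 5.10 behaviour described at the top of the ticket.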
Will try and create a unit test to duplicate this issue this weekend.
That's great, thanks. Here's a simple test:
You can't split something and get nothing, unless you're commons-lang I suppose.
Fixed in the 5.12.0 release.
The fix has been merged in version 5.12.0.
Thanks for the quick turnaround @sconway!