
#259 Reader with RFC-4180 Parser returns null for blank row

v1.0 (example)
closed-fixed
None
5
2 days ago
2025-05-14
No

There appears to be a change in behaviour in OpenCSV v5.11 that has created an issue when reading a file that contains a blank line in the middle of the file.

Given the code

        CSVReader reader = new CSVReaderBuilder(new StringReader("Hello\n\nWorld"))
                .withCSVParser(new RFC4180Parser())
                .build();

        String[] line = null;
        while ((line = reader.readNext()) != null) {
            System.out.println(Arrays.stream(line).collect(Collectors.toList()));
        }

On version v5.10, this prints

[Hello]
[]
[World]

On version v5.11, this prints

[Hello]

This is because previously, when the reader encountered a blank line, it would return an array containing one element, which would be an empty string.

It seems now this returns null instead.

This is problematic, as the javadoc on com.opencsv.CSVReader#readNext states:

@return A string array with each comma-separated element as a separate entry, or null if there is no more input.

If we return null for a blank line, we assume that no more content exists in the file, and we stop reading in our application.

The API gives no other way to determine if we have read the rest of the contents.
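To illustrate why a null for a blank row is indistinguishable from end of input, here is a minimal stdlib-only sketch. It does not use OpenCSV: `ReadLoopDemo` and `drain` are hypothetical names, and the iterator simply stands in for successive `readNext()` results under the two behaviours described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class ReadLoopDemo {
    // Mirrors the usual "read until null" loop. The iterator is a hypothetical
    // stand-in for CSVReader#readNext, yielding null once the rows run out.
    static List<String> drain(Iterator<String[]> rows) {
        List<String> seen = new ArrayList<>();
        String[] line;
        while ((line = rows.hasNext() ? rows.next() : null) != null) {
            seen.add(Arrays.toString(line));
        }
        return seen;
    }

    public static void main(String[] args) {
        // 5.10-style results: the blank row is an array holding one empty string.
        List<String[]> v510 = Arrays.asList(
                new String[] {"Hello"}, new String[] {""}, new String[] {"World"});
        // 5.11-style results: the blank row surfaces as null.
        List<String[]> v511 = Arrays.asList(
                new String[] {"Hello"}, null, new String[] {"World"});

        System.out.println(drain(v510.iterator())); // [[Hello], [], [World]]
        System.out.println(drain(v511.iterator())); // [[Hello]]
    }
}
```

In the second case the loop exits after the first row, so "World" is never seen by the caller.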

It's worth noting that this issue only affects the RFC-4180 Parser. If we use the default CSV Parser, this issue does not occur.

It seems like this is due to a change in the implementation of com.opencsv.RFC4180Parser#tokenizeStringIntoArray in this commit: https://sourceforge.net/p/opencsv/source/ci/5efc0d401137fb12f0530126e5616bf88c3d3992/

Discussion

  • James Cooper

    James Cooper - 2025-07-22

    Finding this bug just cost me half a day :/

    An empty string in the input causes the reader to stop parsing prematurely.

    If you want to avoid the overhead of the regex-based split used in 5.10, then rather than using commons-lang's StringUtils.splitPreserveAllTokens, what about a simple character-based split method inside RFC4180Parser, e.g.

    private String[] tokenizeStringIntoArray(String nextLine) {
        List<String> tokens = new ArrayList<>();
        int nextIndex = 0;
        while (true) {
            int lastIndex = nextIndex;
            nextIndex = nextLine.indexOf(separator, nextIndex);
            if (nextIndex < 0) {
                // No further separator: the remainder (possibly empty) is the last token.
                tokens.add(nextLine.substring(lastIndex));
                return tokens.toArray(new String[0]);
            } else {
                // Take the token before the separator, then step past the separator.
                tokens.add(nextLine.substring(lastIndex, nextIndex++));
            }
        }
    }
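For anyone who wants to try the idea outside the parser, the same character-based split can be sketched as a standalone method. This is only an illustration: `CharSplitSketch` and `splitKeepingEmpty` are made-up names, the separator is passed in rather than read from the parser, and quote handling is deliberately out of scope.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CharSplitSketch {
    // Splits on every occurrence of the separator and keeps empty tokens,
    // so an empty input yields one empty token rather than no tokens.
    static String[] splitKeepingEmpty(String line, char separator) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        int hit;
        while ((hit = line.indexOf(separator, start)) >= 0) {
            tokens.add(line.substring(start, hit));
            start = hit + 1;
        }
        // Whatever follows the last separator (possibly "") is the final token.
        tokens.add(line.substring(start));
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(splitKeepingEmpty("", ',').length);                 // 1
        System.out.println(Arrays.asList(splitKeepingEmpty("a,,b,", ',')));    // [a, , b, ]
    }
}
```

The point of interest is the first call: a blank line produces an array of length one, which is what keeps a read loop from mistaking it for end of input.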
    
     
  • Filipe

    Filipe - 2025-07-22

    This broke liquibase for some users: https://github.com/liquibase/liquibase/issues/7020

     

    Last edit: Filipe 2025-07-22
  • Scott Conway

    Scott Conway - 6 days ago

    Will try and create a unit test to duplicate this issue this weekend.

     
  • Scott Conway

    Scott Conway - 6 days ago
    • assigned_to: Scott Conway
     
  • James Cooper

    James Cooper - 6 days ago

    That's great, thanks. Here's a simple test:

    public class Bug259Test {
        @Test
        public void parseUsingRFC4180Parser() throws IOException {
            ICSVParser parser = new RFC4180Parser();
            assertEquals(Collections.singletonList(""), Arrays.asList(parser.parseLine("")));
        }
    }
    

    You can't split something and get nothing, unless you're commons-lang I suppose.
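As a point of comparison, the JDK's own `String.split` agrees with the expectation in that test: splitting an empty string gives one empty token. The sketch below uses only the standard library (`JdkSplitCheck` and `tokenCount` are illustrative names, and it says nothing about what OpenCSV or commons-lang do internally). It also shows that a negative limit is needed to keep trailing empty tokens.

```java
class JdkSplitCheck {
    // Number of tokens String.split produces for a comma separator and the given limit.
    static int tokenCount(String input, int limit) {
        return input.split(",", limit).length;
    }

    public static void main(String[] args) {
        System.out.println(tokenCount("", 0));     // 1: the empty string itself
        System.out.println(tokenCount("", -1));    // 1
        System.out.println(tokenCount("a,,", 0));  // 1: trailing empty tokens are dropped
        System.out.println(tokenCount("a,,", -1)); // 3: trailing empty tokens are kept
    }
}
```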

     
  • Scott Conway

    Scott Conway - 4 days ago
    • status: open --> closed-fixed
     
  • Scott Conway

    Scott Conway - 4 days ago

    Fixed in the 5.12.0 release.

     
  • Scott Conway

    Scott Conway - 4 days ago

    The fix has been merged into version 5.12.0.

     
  • Filipe

    Filipe - 2 days ago

    Thanks for the quick turnaround @sconway!

     
