#266 Parse drops first column data when input is Linux EOL terminated

Group: v1.0 (example)
Status: closed-invalid
Labels: None
Priority: 5
Updated: 2026-01-12
Created: 2026-01-11
Creator: Dennis Cook
Private: No

I have a problem with reading CSV into a POJO when mapping with CsvBindByName. In the project I have two input files that have identical data but different EOL characters. One file uses the DOS EOL (CRLF); the second file uses the Linux EOL (LF only).

The first column of each file contains a string that represents the item number. When the parse step completes, the data for that column is not included in the resulting POJO when the input file lines are Linux EOL terminated.

The attached Maven project was created to use Java 25 and opencsv 5.12.0. Run the main function; the good and bad resulting POJOs are shown in the console.
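
For readers without the attachment, the mapping being described is opencsv's standard name-based bean binding. A minimal sketch of that kind of read (the class, field, and file names here are assumptions, not taken from the attached project):

    import com.opencsv.bean.CsvBindByName;
    import com.opencsv.bean.CsvToBeanBuilder;

    import java.io.FileReader;
    import java.io.IOException;
    import java.util.List;

    public class Listing {
        // Bound by header name; the header cell must match this string exactly.
        @CsvBindByName(column = "Item number")
        private String itemNumber;

        public static void main(String[] args) throws IOException {
            // The same code path reads both files; only the input bytes differ.
            List<Listing> rows = new CsvToBeanBuilder<Listing>(new FileReader("listings.csv"))
                    .withType(Listing.class)
                    .build()
                    .parse();
            rows.forEach(r -> System.out.println(r.itemNumber));
        }
    }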

Attachments:
- BindByNameProject.zip (7.5 kB; application/x-zip-compressed)


Discussion

  • Scott Conway

    Scott Conway - 2026-01-12
    • status: open --> closed-invalid
    • assigned_to: Scott Conway
     
    • Dennis Cook

      Dennis Cook - 2026-01-12

      Hello Scott, many thanks for the insight into this problem. I have been working as a developer since the 70s and this is the first time I have come across this issue. Even an old fart can learn something new.

      Dennis Cook

      • Scott Conway

        Scott Conway - 2026-01-15

        Hello Dennis - my pleasure. And honestly, I did not know either that UTF files sometimes have a BOM character; it would have taken me a long time to find out if the first person with this issue had opened a bug stating opencsv should ignore BOM characters in UTF files. Hence my response about opencsv handling csv, not the vulgarities of different character sets.

        And same here about the new tricks - the more I learn, the more I find out there is much more to learn.

        :)


  • Scott Conway

    Scott Conway - 2026-01-12

    Hello Dennis. I am closing this issue as invalid because it is not an opencsv issue. The issue is not caused by the EOL character at the end. The issue is that the listings2.csv file is a UTF-8 file with a BOM character as its first character. The BOM character, if not removed, becomes part of the name of the first column, causing a mismatch between the name of the column in the csv file and the name of the column configured in the Java code.
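
    To make the mismatch concrete, here is an illustrative comparison (the header name is the one visible in the hexdump below; the snippet itself is just a sketch, not code from the attached project):

    // With the BOM left in, the first header cell decodes to U+FEFF followed by "Item number".
    String headerFromFile = "\uFEFFItem number";
    String boundColumn = "Item number";          // the name configured via @CsvBindByName
    System.out.println(headerFromFile.equals(boundColumn)); // false, so the field stays unset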

    The first clue came when I ran the file command and saw that the file character sets were different.

    file -I listing*.csv
    listings.csv:  text/plain; charset=us-ascii
    listings2.csv: text/csv; charset=utf-8
    

    The second clue came from the ls -l command, which showed a two-character size difference - not the one-character difference you would expect if the only difference really were the EOL character at the end.

    ls -l listing*.csv
    -rw-r--r--@ 1 sconway  staff  382 Jan 10 18:08 listings.csv
    -rw-r--r--@ 1 sconway  staff  384 Jan 10 18:08 listings2.csv
    

    At that point I ran hexdump on the two files and saw that there was a difference starting at the first character:

    hexdump -n10 -C listings2.csv
    00000000  ef bb bf 49 74 65 6d 20  6e 75                    |...Item nu|
    0000000a
    hexdump -n10 -C listings.csv
    00000000  49 74 65 6d 20 6e 75 6d  62 65                    |Item numbe|
    0000000a
    

    That ef bb bf you see at the start of listings2.csv is the BOM character.

    When dealing with UTF files you need to use the Apache Commons IO BOMInputStream. I have an example of this in the unit test code in BomHandlingTest.

    import com.opencsv.bean.CsvToBeanBuilder;
    import org.apache.commons.io.ByteOrderMark;
    import org.apache.commons.io.input.BOMInputStream;
    import org.junit.jupiter.api.DisplayName;
    import org.junit.jupiter.api.Test;

    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Paths;
    import java.util.List;

    import static org.junit.jupiter.api.Assertions.assertEquals;

    public class BomHandlingTest {

        private static final String UTF_FILE_NAME = "src/test/java/integrationTest/FAQ/utfBOMhandling/job_info.csv";

        @Test
        @DisplayName("Show how to handle a utf file with a bom character.")
        public void testBomHandling() throws IOException {
            // Consume the UTF-8 BOM, if present, before opencsv ever sees the header line.
            BOMInputStream b = BOMInputStream.builder()
                    .setPath(Paths.get(UTF_FILE_NAME))
                    .setByteOrderMarks(ByteOrderMark.UTF_8)
                    .setInclude(false) // do not pass the BOM through to the reader
                    .get();

            // Decode explicitly as UTF-8 rather than relying on the platform default charset.
            InputStreamReader ff = new InputStreamReader(b, StandardCharsets.UTF_8);

            List<Job> jobs = new CsvToBeanBuilder<Job>(ff)
                    .withType(Job.class).build().parse();

            assertEquals(40, jobs.size());
        }
    }
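
    If adding the commons-io dependency is not an option, the BOM can also be skipped by hand. A minimal sketch (not from the opencsv test suite; assumes the file is UTF-8): once the stream is decoded as UTF-8, the BOM arrives as the single character U+FEFF, so read the first character and push it back unless it is the BOM.

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.io.PushbackReader;
    import java.io.Reader;
    import java.nio.charset.StandardCharsets;

    public class SkipBom {
        // Returns a reader positioned after the BOM, if one is present.
        static Reader openSkippingBom(String path) throws IOException {
            PushbackReader reader = new PushbackReader(
                    new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8));
            int first = reader.read();
            if (first != '\uFEFF' && first != -1) {
                reader.unread(first); // not a BOM; put the first character back
            }
            return reader;
        }
    }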
    

    Basically I just want opencsv to focus on csv files, not handling the vulgarities of all the different character sets mankind has invented.

     

