#266 Parse drops first column data when input is Linux EOL terminated

Group: v1.0 (example)
Status: closed-invalid
Labels: None
Priority: 5
Updated: 2026-01-12
Created: 2026-01-11
Creator: Dennis Cook
Private: No

I have a problem with reading CSV into a POJO when mapping with CsvBindByName. In the project I have two input files that have identical data but different EOL characters. One file uses the DOS EOL (CRLF); the second file uses the Linux EOL (LF only).

The first column of each file contains a string that represents the item number. When the parse step completes, the data for that column is not included in the resulting POJO when the input file lines are Linux EOL terminated.

The attached Maven project was created to use Java 25 and opencsv 5.12.0. Run the main function; the good and bad resulting POJOs are shown in the console.
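
For readers without the attachment, the mapping being described is opencsv's standard name-based bean binding. A minimal sketch of that kind of read (the class, field, and file names here are assumptions, not taken from the attached project):

    import com.opencsv.bean.CsvBindByName;
    import com.opencsv.bean.CsvToBeanBuilder;

    import java.io.FileReader;
    import java.io.IOException;
    import java.util.List;

    public class Listing {
        // Bound by header name; the header cell must match this string exactly.
        @CsvBindByName(column = "Item number")
        private String itemNumber;

        public static void main(String[] args) throws IOException {
            // The same code path reads both files; only the input bytes differ.
            List<Listing> rows = new CsvToBeanBuilder<Listing>(new FileReader("listings.csv"))
                    .withType(Listing.class)
                    .build()
                    .parse();
            rows.forEach(r -> System.out.println(r.itemNumber));
        }
    }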

Attachments:
- BindByNameProject.zip (7.5 kB; application/x-zip-compressed)


Discussion

  • Scott Conway

    Scott Conway - 2026-01-12
    • status: open --> closed-invalid
    • assigned_to: Scott Conway
     
    • Dennis Cook

      Dennis Cook - 2026-01-12

      Hello Scott, many thanks for the insight into this problem. I have been working as a developer since the 70s and this is the first time I have come across this issue. Even an old fart can learn something new.

      Dennis Cook

      • Scott Conway

        Scott Conway - 2026-01-15

        Hello Dennis - my pleasure. And honestly, I did not know either that UTF files sometimes have a BOM character; it would have taken me a long time to find out if the first person with this issue had opened a bug stating opencsv should ignore BOM characters in UTF files. Hence my response about opencsv handling csv, not the vulgarities of different character sets.

        And same here about the new tricks - the more I learn, the more I find out there is much more to learn.

        :)


  • Scott Conway

    Scott Conway - 2026-01-12

    Hello Dennis. I am closing this issue as invalid because it is not an opencsv issue. The issue is not caused by the EOL character at the end. The issue is that the listings2.csv file is a UTF-8 file with a BOM character as its first character. The BOM character, if not removed, becomes part of the name of the first column, causing a mismatch between the name of the column in the csv file and the name of the column configured in the Java code.
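
    To make the mismatch concrete, here is an illustrative comparison (the header name is the one visible in the hexdump below; the snippet itself is just a sketch, not code from the attached project):

    // With the BOM left in, the first header cell decodes to U+FEFF followed by "Item number".
    String headerFromFile = "\uFEFFItem number";
    String boundColumn = "Item number";          // the name configured via @CsvBindByName
    System.out.println(headerFromFile.equals(boundColumn)); // false, so the field stays unset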

    The first clue came when I ran the file command and saw that the file character sets were different.

    file -I listing*.csv
    listings.csv:  text/plain; charset=us-ascii
    listings2.csv: text/csv; charset=utf-8
    

    The second clue came from the ls -l command, which showed a two-character size difference - not the one-character difference you would expect if the only difference really were the EOL character at the end.

    ls -l listing*.csv
    -rw-r--r--@ 1 sconway  staff  382 Jan 10 18:08 listings.csv
    -rw-r--r--@ 1 sconway  staff  384 Jan 10 18:08 listings2.csv
    

    At that point I ran hexdump on the two files and saw that there was a difference starting at the first character:

    hexdump -n10 -C listings2.csv
    00000000  ef bb bf 49 74 65 6d 20  6e 75                    |...Item nu|
    0000000a
    hexdump -n10 -C listings.csv
    00000000  49 74 65 6d 20 6e 75 6d  62 65                    |Item numbe|
    0000000a
    

    That ef bb bf you see at the start of listings2.csv is the BOM character.

    When dealing with UTF files you need to use the Apache Commons IO BOMInputStream. I have an example of this in the unit test code in BomHandlingTest.

    import com.opencsv.bean.CsvToBeanBuilder;
    import org.apache.commons.io.ByteOrderMark;
    import org.apache.commons.io.input.BOMInputStream;
    import org.junit.jupiter.api.DisplayName;
    import org.junit.jupiter.api.Test;

    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Paths;
    import java.util.List;

    import static org.junit.jupiter.api.Assertions.assertEquals;

    public class BomHandlingTest {

        private static final String UTF_FILE_NAME = "src/test/java/integrationTest/FAQ/utfBOMhandling/job_info.csv";

        @Test
        @DisplayName("Show how to handle a utf file with a bom character.")
        public void testBomHandling() throws IOException {
            // Consume the UTF-8 BOM, if present, before opencsv ever sees the header line.
            BOMInputStream b = BOMInputStream.builder()
                    .setPath(Paths.get(UTF_FILE_NAME))
                    .setByteOrderMarks(ByteOrderMark.UTF_8)
                    .setInclude(false) // do not pass the BOM through to the reader
                    .get();

            // Decode explicitly as UTF-8 rather than relying on the platform default charset.
            InputStreamReader ff = new InputStreamReader(b, StandardCharsets.UTF_8);

            List<Job> jobs = new CsvToBeanBuilder<Job>(ff)
                    .withType(Job.class).build().parse();

            assertEquals(40, jobs.size());
        }
    }
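
    If adding the commons-io dependency is not an option, the BOM can also be skipped by hand. A minimal sketch (not from the opencsv test suite; assumes the file is UTF-8): once the stream is decoded as UTF-8, the BOM arrives as the single character U+FEFF, so read the first character and push it back unless it is the BOM.

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.io.PushbackReader;
    import java.io.Reader;
    import java.nio.charset.StandardCharsets;

    public class SkipBom {
        // Returns a reader positioned after the BOM, if one is present.
        static Reader openSkippingBom(String path) throws IOException {
            PushbackReader reader = new PushbackReader(
                    new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8));
            int first = reader.read();
            if (first != '\uFEFF' && first != -1) {
                reader.unread(first); // not a BOM; put the first character back
            }
            return reader;
        }
    }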
    

    Basically I just want opencsv to focus on csv files, not handling the vulgarities of all the different character sets mankind has invented.

     

