I have a problem reading a CSV into a POJO when mapping with CsvBindByName. The project has two input files with identical data but different EOL characters: one file uses DOS EOLs (CRLF), the second uses Linux EOLs (LF only).
The first column of each file contains a string that represents the item number. When the parse step completes, the data for that column is missing from the resulting POJO when the input file's lines are Linux EOL terminated.
The attached Maven project was created with Java 25 and opencsv 5.12.0. Run the main function; the good and bad resulting POJOs are printed to the console.
Hello Scott,
Many thanks for the insight into this problem. I have been working as a developer since the 70s, and this is the first time I have come across this issue. Even an old fart can learn something new.
Dennis Cook
On Sunday, January 11, 2026 at 09:11:36 PM CST, Scott Conway sconway@users.sourceforge.net wrote:
Hello Dennis. I am closing this issue as invalid because it is not an opencsv issue. The problem is not the EOL character at the end of each line. The problem is that listings2.csv is a UTF-8 file with a BOM as its first character. If the BOM is not removed, it becomes part of the name of the first column, causing a mismatch between the column name in the CSV file and the column name configured in the Java code.
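To see the mismatch concretely, here is a minimal JDK-only sketch (the class name and header value are mine, not from the attached project) showing that an unstripped UTF-8 BOM becomes part of the first header name:

```java
import java.nio.charset.StandardCharsets;

public class BomMismatchDemo {
    public static void main(String[] args) {
        // First bytes of listings2.csv: UTF-8 BOM (EF BB BF) + "Item number"
        byte[] firstHeader = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF,
                'I', 't', 'e', 'm', ' ', 'n', 'u', 'm', 'b', 'e', 'r'};
        String header = new String(firstHeader, StandardCharsets.UTF_8);

        // The BOM decodes to U+FEFF and stays glued to the column name,
        // so it no longer equals the name configured via CsvBindByName.
        System.out.println(header.equals("Item number"));              // false
        System.out.println(header.charAt(0) == '\uFEFF');              // true
        System.out.println(header.substring(1).equals("Item number")); // true
    }
}
```

This is exactly why the first column fails to bind: the header comparison sees a 12-character name starting with an invisible U+FEFF.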
The first clue came when I ran the file command and saw that the two files' character sets were different:
file -I listing*.csv
listings.csv: text/plain; charset=us-ascii
listings2.csv: text/csv; charset=utf-8
The second clue came from the ls -l command, which showed a two-byte size difference - not the one-byte difference you would expect if the only difference really were the EOL characters:
ls -l listing*.csv
-rw-r--r--@ 1 sconway staff 382 Jan 10 18:08 listings.csv
-rw-r--r--@ 1 sconway staff 384 Jan 10 18:08 listings2.csv
At that point I ran hexdump on the two files and saw a difference starting at the very first byte:
hexdump -n10 -C listings2.csv
00000000 ef bb bf 49 74 65 6d 20 6e 75 |...Item nu|
0000000a
hexdump -n10 -C listings.csv
00000000 49 74 65 6d 20 6e 75 6d 62 65 |Item numbe|
0000000a
That ef bb bf you see at the start of listings2.csv is the UTF-8 BOM (byte order mark).
When dealing with UTF files you need to use BOMInputStream from Apache Commons IO (commons-io). I have an example of this in the unit test code in BomHandlingTest.
import com.opencsv.bean.CsvToBeanBuilder;
import org.apache.commons.io.ByteOrderMark;
import org.apache.commons.io.input.BOMInputStream;
import org.junit.jupiter.api.DisplayName;
import org.junit.jupiter.api.Test;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import static org.junit.jupiter.api.Assertions.assertEquals;
public class BomHandlingTest {
    @Test
    @DisplayName("BOMInputStream strips the UTF-8 BOM so the header names match")
    public void parsesBomPrefixedCsv() throws IOException {
        // Listing is the bean from the attached project; the path and expected value are illustrative.
        try (InputStreamReader reader = new InputStreamReader(
                new BOMInputStream(Files.newInputStream(Paths.get("listings2.csv")), ByteOrderMark.UTF_8),
                StandardCharsets.UTF_8)) {
            List<Listing> beans = new CsvToBeanBuilder<Listing>(reader).withType(Listing.class).build().parse();
            assertEquals("AB-100", beans.get(0).getItemNumber()); // first column now binds correctly
        }
    }
}
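For readers who do not have commons-io handy, here is a JDK-only sketch of essentially what BOMInputStream does for a UTF-8 BOM (the class name and sample bytes are illustrative, not from the project):

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

public class BomStripDemo {
    // Skip a leading UTF-8 BOM (0xEF 0xBB 0xBF) if present -- roughly what
    // commons-io's BOMInputStream does with ByteOrderMark.UTF_8.
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pb = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = pb.read(head, 0, 3);
        boolean bom = n == 3 && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB && (head[2] & 0xFF) == 0xBF;
        if (!bom && n > 0) {
            pb.unread(head, 0, n); // not a BOM: push the bytes back
        }
        return pb;
    }

    public static void main(String[] args) throws IOException {
        // Simulate listings2.csv: UTF-8 BOM followed by a header line.
        byte[] bytes = "\uFEFFItem number,Price\n".getBytes(StandardCharsets.UTF_8);
        BufferedReader r = new BufferedReader(new InputStreamReader(
                skipUtf8Bom(new ByteArrayInputStream(bytes)), StandardCharsets.UTF_8));
        System.out.println(r.readLine()); // Item number,Price
    }
}
```

In practice use BOMInputStream rather than hand-rolling this; the sketch only shows why the header matches once the three BOM bytes are consumed before decoding.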
Basically I just want opencsv to focus on CSV files, not on handling the vagaries of all the different character sets mankind has invented.
[bugs:#266] Parse drops first column data when input is linux eol terminated
Status: closed-invalid
Group: v1.0 (example)
Created: Sun Jan 11, 2026 01:03 AM UTC by Dennis Cook
Last Updated: Sun Jan 11, 2026 01:03 AM UTC
Owner: Scott Conway
Attachments:
- BindByNameProject.zip (7.5 kB; application/x-zip-compressed)
Hello Dennis - my pleasure. And honestly, I did not know either that UTF sometimes has a BOM character, and it would have taken me a long time to find out, had the first person with this issue simply opened a bug stating that opencsv should ignore BOM characters in UTF files. Hence my response about opencsv handling CSV, not the vagaries of different character sets.
And same here about the new tricks - the more I learn, the more I find there is to learn.
:)
On Mon, Jan 12, 2026 at 1:30 PM Dennis Cook dj_cook@users.sourceforge.net wrote:
--
Scott Conway
scott.conway@gmail.com
http://www.conwayfamily.name