Support BOMInputStream in CsvToBeanBuilder
**Problem:** OpenCSV fails to read CSV files that are encoded as UTF-8 with a BOM. This causes problems because it is the default format when an Excel file is saved as "CSV UTF-8". In this case, OpenCSV reads the annotated fields as 0s. It is well documented that Java's Reader classes do not strip the BOM when decoding UTF-8.
**Request:** Add a constructor to the CsvToBeanBuilder class that accepts an Apache Commons IO BOMInputStream object. This stream strips the BOM if one is present.
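To make the problem concrete, here is a minimal sketch that uses only the JDK (no OpenCSV or Commons IO). It shows that an `InputStreamReader` decoding UTF-8 keeps the BOM as a leading U+FEFF character, so the first header name no longer equals the expected column name. The class name, helper names, and sample columns are made up for illustration. That this mismatch is what produces the 0s in the annotated fields is an assumption, not something the ticket confirms.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class BomHeaderDemo {
    // Builds a tiny CSV preceded by the UTF-8 BOM bytes (EF BB BF).
    // The column names and values are hypothetical sample data.
    static byte[] sampleCsvWithBom() {
        byte[] body = "id,name\n1,alice\n".getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[body.length + 3];
        out[0] = (byte) 0xEF;
        out[1] = (byte) 0xBB;
        out[2] = (byte) 0xBF;
        System.arraycopy(body, 0, out, 3, body.length);
        return out;
    }

    // Reads the first header name through a plain InputStreamReader,
    // without any BOM handling.
    static String firstHeader(byte[] csv) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(csv), StandardCharsets.UTF_8))) {
            return reader.readLine().split(",")[0];
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String header = firstHeader(sampleCsvWithBom());
        System.out.println(header.equals("id"));          // false
        System.out.println(header.charAt(0) == '\uFEFF'); // true
        System.out.println(header.length());              // 3
    }
}
```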
My local implementation is as follows:
Add to: src/main/java/com/opencsv/bean/CsvToBeanBuilder.java
Constructor:
public CsvToBeanBuilder(BOMInputStream bomInputStream)
{
    if (bomInputStream == null) {
        throw new IllegalArgumentException(ResourceBundle
                .getBundle(ICSVParser.DEFAULT_BUNDLE_NAME) // Must be default locale, because we don't have anything else yet
                .getString("reader.null"));
    }
    // Decode explicitly as UTF-8 (needs java.nio.charset.StandardCharsets) instead of the platform default charset
    this.reader = new InputStreamReader(bomInputStream, StandardCharsets.UTF_8);
    this.csvReader = null;
}
Add dependency:
<!-- https://mvnrepository.com/artifact/commons-io/commons-io -->
<dependency>
    <groupId>commons-io</groupId>
    <artifactId>commons-io</artifactId>
    <version>2.11.0</version>
</dependency>
Attached is an IntelliJ project replicating the issue.
Why not just create a reader from the BOMInputStream?
BufferedReader fileReader = new BufferedReader(
        new InputStreamReader(new BOMInputStream(new FileInputStream(inputFile.toFile())), "UTF-8"));
I will try it myself when I get a chance, but it has been a very busy year so far, which is also my way of apologizing for taking so long to get back to everyone :)
Sorry it took so long, but I finally had time to open your zip file, look at your example, and see that my suggestion WAS your third test.
We don't need to add complexity to opencsv to handle multiple different file types when Java already handles that for us with its different types of input streams, which we handle.
So I am closing this as "won't fix", since we already handle this file type via the input stream.
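For readers who would rather not add Commons IO, the same idea (strip the BOM before the reader reaches opencsv) can be sketched with the JDK alone. The helper below is hypothetical and is not part of opencsv or Commons IO. It assumes the input has already been decoded as UTF-8, so a BOM shows up as a single leading U+FEFF character. The returned reader can then be passed to any API that accepts a `Reader`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class BomSkippingReaders {
    // Hypothetical helper: returns a reader positioned after a leading
    // U+FEFF character, or at the very start if there is none.
    static BufferedReader skipBom(Reader source) {
        BufferedReader reader = new BufferedReader(source);
        try {
            reader.mark(1);
            int first = reader.read();
            if (first != 0xFEFF) {
                reader.reset(); // not a BOM (or empty input): rewind
            }
            return reader;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Convenience wrapper for the demo: first line of the text, BOM removed.
    static String firstLine(String text) {
        try (BufferedReader reader = skipBom(new StringReader(text))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstLine("\uFEFFid,name\n1,alice")); // id,name
        System.out.println(firstLine("id,name\n1,alice"));       // id,name
    }
}
```

Unlike BOMInputStream, this sketch works on characters rather than bytes and only recognizes the BOM after decoding, so it does not detect UTF-16 or UTF-32 byte order marks in a raw stream.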