Can trailing commas be stripped from rows?
Brought to you by:
aruckerjones,
sconway
A CSV with trailing commas like this:
name, phone
joe, 123-456-7890,
bob, 333-555-6666,
processed like this:
CSVReaderHeaderAware r = new CSVReaderHeaderAware(reader);
Map<String, String> values = r.readMap();
will throw this exception:
java.io.IOException: Error on record number 2: The number of data elements is not the same as the number of header elements
For now I'm stripping commas from input files using sed:
find . -type f -exec sed -i 's/,\r$//' {} \;
Is there some easy way to tell OpenCSV to ignore trailing commas?
Tricky, because that's basically an invalid CSV file. If you know you will always have single-line records, you could derive a class from CSVReader, override getNextLine() to call super.getNextLine(), then cut off the trailing comma, and of course, pass your new reader into opencsv to use in parsing.
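A library-independent variant of the same idea is to strip the trailing commas in plain Java before opencsv ever sees the input. The sketch below is only an illustration: the class and method names (TrailingCommaStripper, stripTrailingComma, cleanToString, strip) are hypothetical and not part of opencsv, it assumes single-line records, and it buffers the whole input in memory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of opencsv.
class TrailingCommaStripper {

    // Removes one trailing comma (plus any whitespace after it) from a line.
    // Caution: a row whose last field is legitimately empty loses that field too.
    static String stripTrailingComma(String line) {
        return line.replaceFirst(",\\s*$", "");
    }

    // Reads every line from the source, strips its trailing comma,
    // and returns the cleaned text with '\n' line endings.
    static String cleanToString(Reader source) {
        StringBuilder cleaned = new StringBuilder();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                cleaned.append(stripTrailingComma(line)).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return cleaned.toString();
    }

    // Convenience wrapper that hands the cleaned text back as a Reader.
    static Reader strip(Reader source) {
        return new StringReader(cleanToString(source));
    }
}
```

The Reader returned by strip(...) could then be passed to the CSVReaderHeaderAware constructor in place of the original reader.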
The next option, if you can't guarantee single-line records, is to derive a class from CSVParser and override parseLine() to call super then snip off the last (empty) entry in the array returned before passing that back. Then send your parser in to opencsv.
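The "snip off the last (empty) entry" step can be sketched on its own, independent of opencsv. The helper below is hypothetical (TrailingFieldTrimmer and dropTrailingEmpty are made-up names); a derived parser would call something like it on the array it gets back from super, and that wiring is not shown here.

```java
import java.util.Arrays;

// Hypothetical helper, not part of opencsv.
class TrailingFieldTrimmer {

    // Returns the fields without the last element when that element is
    // null or blank; otherwise returns the array unchanged.
    static String[] dropTrailingEmpty(String[] fields) {
        if (fields == null || fields.length == 0) {
            return fields;
        }
        String last = fields[fields.length - 1];
        if (last == null || last.trim().isEmpty()) {
            return Arrays.copyOf(fields, fields.length - 1);
        }
        return fields;
    }
}
```

Like the comma-stripping approach, this cannot tell a stray trailing comma from a genuinely empty last column, so it only suits files where every data row carries the extra comma.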
All untested, of course, and Scott may have better suggestions.
These are all single-line records. Yes, it's invalid CSV, but unfortunately there's nothing I can do about that.
Ok I think that worked. Thanks.
Do you want a patch submitted for CSVReaderBuilder allowing:
.withTrimTrailingComma(true)
... or is there probably no interest in hacking around broken files like that?
Scott wrote the readers and parsers to make individual modifications for special circumstances like this easy. I would not see this as a feature that would generally benefit our user community. Thanks for the offer, though.
Hello Andrew M.
Sorry but I have not had much time the last month to get online to check the tickets (THANK YOU SO MUCH ANDREW J.!!).
For right now your preprocessor is the way to go. Earlier I added validators and processors to opencsv, which you can read more about at http://opencsv.sourceforge.net/#processors_and_validators
I added a LineValidator because there were several misfiled defects and support requests to handle issues where the number of quotes on a single line did not match. And even though for this particular use case there was a very simple existing solution (multiLineLimit in CSVReader) it made me think that there could be others in a similar situation where they wanted to validate just a single line.
But I did not add a LineProcessor because I thought that was too dangerous, and started with the RowProcessor instead.
For now I think it is still too dangerous, but I will keep an eye on the Feature/Support requests. If enough interest builds up, it may be worth considering adding the LineProcessor, but with a whole lot of warnings (only use it if you are absolutely sure of the contents and structure of your file)!
Just in case anyone circles back to this, you can set up a CSVFormat that removes trailing commas:
Then split each record into a map. Tried with: