A Vote for Row Processing Modifying Field Structure.
Brought to you by:
aruckerjones,
sconway
I use opencsv to handle CSVs delivered by a third party that I cannot influence.
Unfortunately, since a recent release, one field of their files occasionally contains a comma, i.e. the separator character. Handling this flaw after a line has been broken into parts would be trivial: if there are 8 parts (the number of fields my bean expects), just use them; if there are 9 parts, merge fields 5 and 6 and return 1, 2, 3, 4, 5&6, 7, 8 as the new row to the bean handling.
However, I did not find a way to solve this with the existing interceptors. A RowProcessor that is allowed to return a new String[] array as the row would easily allow for this.
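For illustration, the merge step described above could be sketched roughly like this (a minimal, hypothetical helper, not part of opencsv; the class name and field positions are assumptions based on the description above):

```java
public class RowRepair {

    /**
     * Hypothetical repair step: if a parsed row has nine parts instead of
     * the expected eight, re-join fields 5 and 6 (split apart by the stray
     * comma) and return an eight-element row; otherwise pass it through.
     */
    public static String[] repairRow(String[] row) {
        if (row.length != 9) {
            return row; // expected shape, nothing to fix
        }
        String[] fixed = new String[8];
        System.arraycopy(row, 0, fixed, 0, 4);   // fields 1-4 unchanged
        fixed[4] = row[4] + "," + row[5];        // merge fields 5 and 6, restoring the comma
        System.arraycopy(row, 6, fixed, 5, 3);   // fields 7-9 become fields 6-8
        return fixed;
    }
}
```

A hook of exactly this shape, applied between parsing and bean mapping, is what the request asks a RowProcessor to support.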
I'm curious: how are you parsing/using your data? Because if you're just taking the String[] results directly from CSVParser, you could already do this. If you're doing something else, maybe we can find a different solution for you.
Usually I am the low-tech guy who does it all in my own code, but this time I tried to be smart and climb on the shoulders of giants: I wrote a useful bean and applied opencsv. It worked nicely for a while, until that third party let this bug slip into their CSV export. If only they had used opencsv for the export. Now I am here with an opencsv solution that only works on old files. Do you want to tell me I missed the API for simply pushing a String[] myself into the opencsv bean machinery? So far I just let opencsv work on the whole file, and I wonder if I can fix things without losing the bean support. For company reasons I have no access to my code right now; I can post more detail tomorrow.
That will give you an overview: http://opencsv.sourceforge.net/#reading_into_beans
If you actually pull the source code or download it from the Files section, there are several unit tests you can look at for working examples.
Hello Scott, hello Andrew, thank you both for your answers.
The problem is not reading into beans, the problem is reading broken CSVs.
My code is as simple as it gets, since most of the work is done in the annotated bean.
I understand exactly how and where these additional commas slip into these CSVs, so modifying the parsing result as described above would be the solution. Writing filters to fix the CSVs, on the other hand, amounts to doing all the dirty work in my own code, and there would be little to no benefit left in using opencsv as a second step after filtering.
I agree with you. You're doing things the right way. I've given it some thought, trying to come up with a clever bean trick (e.g. performing additional steps in the bean's setters), but I'm not coming up with anything obvious.
The essential problem is that the data are corrupt. There's only so much one can do with corrupt data.
I will then leave this as the initial request: RowProcessors should be able to alter the line structure.
Scott, that puts it back in your court, since the Row Processors are yours. It doesn't look like this request can easily be fulfilled without breaking backward compatibility.
Sorry for taking so long to respond. I was going through all the open tickets and found this. I am closing this as won't fix. The RowProcessor is designed to modify individual row elements, not to take an array of size 8 and turn it into an array of size 7.
We are NOT here to interpret corrupt data! There are just too many things to cover in that case. Users adding commas to their data without encapsulating the data in double quotes is one of them.
Personally, since you seem to know which field carries the corruption, I would say your specific case calls for a pre-parser program: read the input file and spit out a "cleansed" file in which, on rows that have eight elements instead of seven, you combine the two broken fields back into one, and then in the CSVWriter enable strictQuotes to force quotes around all fields. Or use the RFC4180Parser in your writer; I believe the effect will be the same. Then you can run the resulting output file through your current application without worry and let the first program handle all the corrupt possibilities.
A nice separation of concerns.
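The core of such a pre-parser could look roughly like this, using only the standard library (the naive split is adequate only under the assumption, consistent with this thread, that the broken input contains no quoted fields; quoting every output field by hand here stands in for what strictQuotes or the RFC4180Parser would do in CSVWriter):

```java
import java.util.ArrayList;
import java.util.List;

public class CsvCleanser {

    /**
     * Hypothetical cleansing step: repairs a row that has one field too
     * many (eight instead of seven, per the suggestion above) and re-emits
     * every field wrapped in double quotes, with embedded quotes doubled.
     */
    public static String cleanseLine(String line) {
        String[] parts = line.split(",", -1); // naive split; assumes no quoted fields in the input
        if (parts.length == 8) {              // one field too many: re-join fields 5 and 6
            String[] fixed = new String[7];
            System.arraycopy(parts, 0, fixed, 0, 4);
            fixed[4] = parts[4] + "," + parts[5];
            System.arraycopy(parts, 6, fixed, 5, 2);
            parts = fixed;
        }
        List<String> quoted = new ArrayList<>();
        for (String p : parts) {
            quoted.add("\"" + p.replace("\"", "\"\"") + "\"");
        }
        return String.join(",", quoted);
    }
}
```

The cleansed, fully quoted file can then be fed to the existing bean-based application unchanged.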
Well, Scott, thank you for your answer. Unfortunately it just misses the point. This is not about creating JSONs; I know how to create valid JSONs. As explained above, I get JSONs from an external company, and I now know CSVWriter is not and will never be the tool to handle these files. A somewhat bad investment of my time. But maybe you are right; just don't expect me to be convinced.
JSON? Who said anything about JSON? Frank, are you posting feature requests in the wrong projects again???? <bg> Just kidding. No, your request was that you have a client sending you a corrupt CSV file and you are trying to parse it, but in this case the corruption is always consistent... if I am reading your original description correctly. The client data parses out to eight fields when it should be seven, because one of their columns has a comma in it but they did not put quotes around that field. Bad on them!!! </bg>
Sorry, too many things in parallel; in this case it was all about broken CSVs, the broken JSONs were a different issue.