Brought to you by: aruckerjones, sconway

#56 Possible to get Headers of read CSV-File?

Milestone: v1.0 (example)

Status: closed

Owner: Andrew Rucker Jones

Labels: None

Priority: 5

Updated: 2018-06-25

Created: 2018-06-15

Creator: Patrick Siegel

Private: No

Hey there again,
I would like to ask whether it's possible to get the headers of a CSV file so that we can do custom file validation based on them. Depending on how our software is configured, some columns in the CSV file switch between mandatory and optional. We would like to use the annotation-based reading method, but still get the headers and validate them before reading the actual data of the file.

Thanks for your help,
regards
Patrick
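A minimal sketch of the check described above, assuming the headers are already available as a `String[]` and that the software's configuration supplies the set of columns that are mandatory for the current run. `HeaderValidator` and `missingMandatory` are hypothetical names, not part of the library; matching is exact and case-sensitive after trimming whitespace from the file's header names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class HeaderValidator {
    private HeaderValidator() {}

    /**
     * Returns the mandatory column names that do not appear in the file's header,
     * in the iteration order of {@code mandatoryColumns}.
     */
    static List<String> missingMandatory(String[] fileHeader, Set<String> mandatoryColumns) {
        // Collect the header names into a set for fast lookups; trim stray whitespace.
        Set<String> present = new HashSet<>();
        for (String name : fileHeader) {
            present.add(name.trim());
        }
        List<String> missing = new ArrayList<>();
        for (String column : mandatoryColumns) {
            if (!present.contains(column)) {
                missing.add(column);
            }
        }
        return missing;
    }
}
```

Because the mandatory set is just a parameter, each configuration (or each import) can pass its own set without any change to the bean annotations.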

Discussion

Scott Conway - 2018-06-17

I can answer this question <bg>.

It is the "before reading the actual data of the file" part that presents the challenge, because CsvToBean reads the header as part of its parse() method.

The easiest way I can think of is to create a separate CSVReader beforehand and read just the first line. That gives you an array of strings containing the headers.
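Scott's first suggestion boils down to reading one line and stopping. With the library that is a `CSVReader` plus a single `readNext()` call; the sketch below shows the same idea using only the JDK so it runs without extra dependencies. `HeaderReader`, `readHeader` and `splitCsvLine` are hypothetical names, and the small parser handles commas inside quotes and doubled `""` quotes but, unlike a full CSV parser, not line breaks inside a quoted header.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

final class HeaderReader {
    private HeaderReader() {}

    /** Reads only the first line from {@code in} and splits it into header names. */
    static String[] readHeader(Reader in) throws IOException {
        BufferedReader br = new BufferedReader(in);
        String line = br.readLine();      // stop after the first line; data rows are never parsed
        if (line == null) {
            return new String[0];         // empty input: no header at all
        }
        return splitCsvLine(line);
    }

    /** Splits one CSV line on commas, honouring "quoted, fields" and "" escapes. */
    static String[] splitCsvLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"');  // "" inside quotes is a literal quote
                        i++;
                    } else {
                        inQuotes = false;     // closing quote
                    }
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;              // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());       // last field has no trailing comma
        return fields.toArray(new String[0]);
    }
}
```

Only the first line is parsed, so the cost does not grow with the number of data rows. The caller still owns the `Reader` and should close it (for example with try-with-resources) before opening a fresh one for the real, annotation-based parse.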

But since you want to use the annotation-based method, create a HeaderColumnNameMappingStrategy, set its type to your bean class with setType(), and call captureHeader() with the new CSVReader you created. That reads the header and, to boot, checks for any missing required fields. Then I believe you can get what you want by calling generateHeader(), which gives you back the header of the file as an array of strings, and then getFieldMap(), which gives you a map whose key is the field name and whose value is a ComplexFieldMapEntry. You can then loop through the array and call get() on the map to get the information you want for each field.
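The last step described here, walking the header array and looking each column up in a map, looks roughly like this. The sketch uses a plain `Map<String, String>` from column name to bean field name as a hypothetical stand-in for the field map mentioned above; `HeaderMappingReport`, `Report` and `match` are made-up names. Columns without a mapping end up in `unknown`, which is often exactly what a "wrong file" check needs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class HeaderMappingReport {
    private HeaderMappingReport() {}

    /** Outcome of matching one file header against the known columns. */
    static final class Report {
        final Map<String, String> mapped = new LinkedHashMap<>(); // column -> bean field, in file order
        final List<String> unknown = new ArrayList<>();           // columns with no mapping
    }

    /** Looks up each header column (trimmed, exact match) in {@code columnToField}. */
    static Report match(String[] header, Map<String, String> columnToField) {
        Report report = new Report();
        for (String raw : header) {
            String column = raw.trim();
            String field = columnToField.get(column);
            if (field != null) {
                report.mapped.put(column, field);
            } else {
                report.unknown.add(column);
            }
        }
        return report;
    }
}
```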

Hope that helps.

Scott :)

Andrew Rucker Jones - 2018-06-24

status: open --> closed

assigned_to: Andrew Rucker Jones

Andrew Rucker Jones - 2018-06-24

Scott's suggestion will probably work just fine; I haven't thought it through. Unless you're dealing with huge data sets, it might be simpler to declare all the fields you're concerned about optional, read and parse everything, and then check afterward for the presence of the headers that should be mandatory. Or it might be simpler to create different bean types (possibly with a common ancestor) depending on which fields should be mandatory.
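The first alternative (all columns optional, parse everything, check afterward) can be sketched by inspecting the parsed beans through accessor functions. The assumption is that an optional column missing from the file leaves the corresponding field `null` in every bean, so a mandatory field that is never non-null points to a missing column. `PostParseCheck` and `fieldsNeverPresent` are hypothetical names; note that this check cannot tell a missing column from one that is present but empty in every row, and an empty row list reports every field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class PostParseCheck {
    private PostParseCheck() {}

    /**
     * Returns the names of mandatory fields that have no non-null value in any row,
     * in the iteration order of {@code mandatory}.
     */
    static <T> List<String> fieldsNeverPresent(List<T> rows, Map<String, Function<T, ?>> mandatory) {
        List<String> neverPresent = new ArrayList<>();
        for (Map.Entry<String, Function<T, ?>> entry : mandatory.entrySet()) {
            boolean seen = false;
            for (T row : rows) {
                if (entry.getValue().apply(row) != null) {
                    seen = true;   // at least one row has a value for this field
                    break;
                }
            }
            if (!seen) {
                neverPresent.add(entry.getKey());
            }
        }
        return neverPresent;
    }
}
```

For a hypothetical bean `Customer` with a `getEmail()` getter, the configuration would put an entry like `"EMAIL" -> Customer::getEmail` into the map (a `LinkedHashMap` keeps the report order stable).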

Perhaps overriding a mapping strategy and passing it to the CsvToBeanBuilder would also be an option.

I think the bottom line, though, is that the annotations are meant to streamline the entire process as much as possible, so breaking up that process in the middle and doing something the library doesn't expect you to do is going to be unpleasant.

Patrick Siegel - 2018-06-25

Thank you, Scott and Andrew, for your ideas and tips.

I understand the difficulty of reading only the first line of a file. From your answers, though, I get the feeling that getting the headers is a little "overcomplicated". Reading CSV files via annotated beans is blazingly fast to implement and I enjoy it a lot, and I don't want to spoil that experience by adding extra code or workarounds just to get the headers as a list of strings. Couldn't you provide a getHeaders() method on CSVReader that can be called after a CSV file has been parsed? We have many CSV imports in our application, and with an easy way of getting the headers we could simply check whether a user chose the correct file for a specific import, instead of going through 500,000+ lines of data only to find that none of them is what the import expects.

Thinking about it more, checking only the data of the file doesn't fully meet our needs either, because some of our imports have to allow missing or "wrong" data, and the user has to confirm that they want to import it anyway. We don't like that at all, but we have to do it this way...

Again, thanks for your help and suggestions. Let me know if you have more input on this topic.
Cheers
Patrick
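For the imports where questionable rows must be kept until the user confirms them, one option is to validate every parsed row and split the results instead of failing. The sketch assumes the rows are already parsed into beans (any type `T`) and that a hypothetical validator returns `null` for a good row or a human-readable reason otherwise; `ImportTriage`, `Result` and `Flagged` are invented names, not library types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class ImportTriage {
    private ImportTriage() {}

    /** A row that failed validation, kept so the user can review it. */
    static final class Flagged<T> {
        final int rowIndex;   // 0-based position in the list of parsed rows
        final T row;
        final String reason;

        Flagged(int rowIndex, T row, String reason) {
            this.rowIndex = rowIndex;
            this.row = row;
            this.reason = reason;
        }
    }

    /** Accepted rows plus flagged rows; nothing is silently dropped. */
    static final class Result<T> {
        final List<T> accepted = new ArrayList<>();
        final List<Flagged<T>> flagged = new ArrayList<>();
    }

    /** Runs {@code validator} on each row: null means accept, anything else flags the row. */
    static <T> Result<T> triage(List<T> rows, Function<? super T, String> validator) {
        Result<T> result = new Result<>();
        for (int i = 0; i < rows.size(); i++) {
            T row = rows.get(i);
            String reason = validator.apply(row);
            if (reason == null) {
                result.accepted.add(row);
            } else {
                result.flagged.add(new Flagged<>(i, row, reason));
            }
        }
        return result;
    }
}
```

The application can then show `flagged` to the user and import `accepted`, or both once the user confirms.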
