Hello. I need help about how to read a csv file with different row formats using Dozer.
Here is an example of what my csv is:
;;;;;;;;;;;;;
Record;5049;5044;Bari;;;;;Cagliari;;;;;
;;27/09/2012;;;;;;;;;;;
1;;07/01/1939;58;22;47;49;69; - ; - ; - ; - ; - ;
2;;14/01/1939;18;77;33;62;19; - ; - ; - ; - ; - ;
3;;08/07/1939;23;46;2;59;1;52;62;77;49;87;
4;;15/07/1939;27;60;89;80;10;26;62;39;27;32;
I can't manually edit the csv file because it is en export from another software and I have to import exactly like it is.
As you can see, the first row is empty, the second one is something similar to a header row, but have some informations I need to read and store.
About the second row, I need to store the 2 numbers and the strings "Bari" and "Cagliari" (those strings could change in another export).
The third row is nearly empty and I don't need to store any information about it.
From the fourth row ahead, I need to store informations differently from what I did for second row.
I started trying to create 2 mapping array, 2 classes representing the map and 2 processors array, but I noticed I can't change the mapping if I already started to read the file. The second call to beanReader.configureBeanMapping() throws an exception.
I thought to try to read the file 2 times. The first time I read only the second row and close the reader. The second time I go to read the rows from 4th ahead.
The problem is that I don't know how to "pass" 2 rows without have to parse or map them just to get to the fourth row.
However this solution sucks :D
Another problem is caused by the " - " strings.
My processor array uses "new Optional(new ParseInt())" and the class representing the maps has 5 Integer attributes.
When Super CSV have to parse a field with " - ", it returns:
org.dozer.converters.ConversionException: org.apache.commons.beanutils.ConversionException: Error converting from 'String' to 'Integer' For input string: "-".
I even tryed to write my own processor, but nothing changed. It seems that the exception fires before the processor is used. It check the field's types in my class and try to cast the string to it. If i change my class' fields from Integer to String, everything works.
I hope all of that is understandable. If not, I will try to explain again.
Last edit: Criso 2013-02-18
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Thanks for all the info - yes it's understandable :)
Just to keep things simple, here's an example that uses CsvListReader (I hope you get this via email, because I know the format is going to be awful after I submit this!).
Essentially, you can use getHeader() to skip over rows you don't want.
packageqdtmr.util.supercsv;importjava.io.FileReader;importjava.io.IOException;importjava.util.List;importorg.supercsv.cellprocessor.Optional;importorg.supercsv.cellprocessor.ParseDate;importorg.supercsv.cellprocessor.ParseInt;importorg.supercsv.cellprocessor.Token;importorg.supercsv.cellprocessor.ift.CellProcessor;importorg.supercsv.io.CsvListReader;importorg.supercsv.io.ICsvListReader;importorg.supercsv.prefs.CsvPreference;publicclassDifferentRowFormats{publicstaticvoidmain(String[]args)throwsIOException{ICsvListReaderreader=null;try{reader=newCsvListReader(newFileReader("input.csv"),CsvPreference.EXCEL_NORTH_EUROPE_PREFERENCE);reader.getHeader(false);// skip first line (empty)// processors for reading second linefinalCellProcessor[]processors=newCellProcessor[14];// only care about processing these 2 columnsprocessors[1]=newParseInt();processors[2]=newParseInt();List<Object>secondLine=reader.read(processors);System.out.println("Second line: "+secondLine);reader.getHeader(false);// skip third line (not interested!)// a lot of the values appear to be ints, but can also be empty/dashfinalCellProcessoroptionalParseInt=newToken(" - ",null,newOptional(newParseInt()));// processors for reading the data rowsfinalCellProcessor[]dataProcessors=newCellProcessor[]{newParseInt(),null,newParseDate("dd/MM/yyyy"),newParseInt(),newParseInt(),newParseInt(),newParseInt(),newParseInt(),optionalParseInt,optionalParseInt,optionalParseInt,optionalParseInt,optionalParseInt,null};List<Object>row;while((row=reader.read(dataProcessors))!=null){System.out.println("Data row: "+row);}}finally{if(reader!=null){reader.close();}}}}
You can see how I've skipped over the 1st and 3rd rows using getHeader(), and I've defined 2 different processor arrays - one for the 2nd line, and one for the data rows. I've just made guesses for which processors to use - I don't know what the actual data is (something to do with Italy??)!
I'd suggest only using the Dozer reader if you need indexed or deep mapping - if you just want to map your CSV to a bean then go with CsvBeanReader. It's faster and doesn't need any other dependencies.
If you do use the Dozer reader - you'll need to configure the mapping before reading (you'll need to call configureBeanMapping() twice - once for each of your 2 classes). Then when reading the second line pass the first class into read(), and when reading the data rows pass the second class into read() (using the appropriate cell processor array).
I hope this helps.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Yes, that project has something to do with Italy. It's about "Lotto", an Italian state lottery.
Two times a week, 5 numbers (from 1 to 90) are drawn for 10 lottery wheels (that have the names of 10 italian cities) plus 5 numbers on a "national" wheel (added recently).
From the csv I have to generate an instance of the class "Extraction" for each record in the csv (4th row ahead).
If I correctly understood, with deep mapping and indexed mapping, I can directly create that structure from the csv (the reason why I am using Dozer).
I will try the "new Token()" code to solve my problem with casts. I simplified a bit the example I reported to you. I use Byte, not Integer and when the csv do not contain a number, it contains "funny_character - funny_character" (eclipse prints "� - �").
I attached the real CSV if you are interested in it.
I see now, yes - the Dozer reader is probably a good choice then. Glad to hear you got it working. Don't forget to 'accept' the answers on StackOverflow too :)
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Hello. I need help about how to read a csv file with different row formats using Dozer.
Here is an example of what my csv is:
;;;;;;;;;;;;;
Record;5049;5044;Bari;;;;;Cagliari;;;;;
;;27/09/2012;;;;;;;;;;;
1;;07/01/1939;58;22;47;49;69; - ; - ; - ; - ; - ;
2;;14/01/1939;18;77;33;62;19; - ; - ; - ; - ; - ;
3;;08/07/1939;23;46;2;59;1;52;62;77;49;87;
4;;15/07/1939;27;60;89;80;10;26;62;39;27;32;
I can't manually edit the csv file because it is en export from another software and I have to import exactly like it is.
As you can see, the first row is empty, the second one is something similar to a header row, but have some informations I need to read and store.
About the second row, I need to store the 2 numbers and the strings "Bari" and "Cagliari" (those strings could change in another export).
The third row is nearly empty and I don't need to store any information about it.
From the fourth row ahead, I need to store informations differently from what I did for second row.
I started trying to create 2 mapping array, 2 classes representing the map and 2 processors array, but I noticed I can't change the mapping if I already started to read the file. The second call to beanReader.configureBeanMapping() throws an exception.
I thought to try to read the file 2 times. The first time I read only the second row and close the reader. The second time I go to read the rows from 4th ahead.
The problem is that I don't know how to "pass" 2 rows without have to parse or map them just to get to the fourth row.
However this solution sucks :D
Another problem is caused by the " - " strings.
My processor array uses "new Optional(new ParseInt())" and the class representing the maps has 5 Integer attributes.
When Super CSV have to parse a field with " - ", it returns:
org.dozer.converters.ConversionException: org.apache.commons.beanutils.ConversionException: Error converting from 'String' to 'Integer' For input string: "-".
I even tryed to write my own processor, but nothing changed. It seems that the exception fires before the processor is used. It check the field's types in my class and try to cast the string to it. If i change my class' fields from Integer to String, everything works.
I hope all of that is understandable. If not, I will try to explain again.
Last edit: Criso 2013-02-18
Hi Andrea,
Thanks for all the info - yes it's understandable :)
Just to keep things simple, here's an example that uses CsvListReader (I hope you get this via email, because I know the format is going to be awful after I submit this!).
Essentially, you can use
getHeader()
to skip over rows you don't want.You can see how I've skipped over the 1st and 3rd rows using
getHeader()
, and I've defined 2 different processor arrays - one for the 2nd line, and one for the data rows. I've just made guesses for which processors to use - I don't know what the actual data is (something to do with Italy??)!I'd suggest only using the Dozer reader if you need indexed or deep mapping - if you just want to map your CSV to a bean then go with CsvBeanReader. It's faster and doesn't need any other dependencies.
If you do use the Dozer reader - you'll need to configure the mapping before reading (you'll need to call
configureBeanMapping()
twice - once for each of your 2 classes). Then when reading the second line pass the first class intoread()
, and when reading the data rows pass the second class intoread()
(using the appropriate cell processor array).I hope this helps.
Yes, that project has something to do with Italy. It's about "Lotto", an Italian state lottery.
Two times a week, 5 numbers (from 1 to 90) are drawn for 10 lottery wheels (that have the names of 10 italian cities) plus 5 numbers on a "national" wheel (added recently).
From the csv I have to generate an instance of the class "Extraction" for each record in the csv (4th row ahead).
Here an example of the class structure:
If I correctly understood, with deep mapping and indexed mapping, I can directly create that structure from the csv (the reason why I am using Dozer).
I will try the "new Token()" code to solve my problem with casts. I simplified a bit the example I reported to you. I use Byte, not Integer and when the csv do not contain a number, it contains "funny_character - funny_character" (eclipse prints "� - �").
I attached the real CSV if you are interested in it.
Thank you for your reply, you are very kind :)
Last edit: Criso 2013-02-19
Ok, I did everything I needed.
My new processor is:
Thank you so much for your help and patience.
Andrea
I see now, yes - the Dozer reader is probably a good choice then. Glad to hear you got it working. Don't forget to 'accept' the answers on StackOverflow too :)