I have a csv file that I don't have control over, so I can't change the layout and have to deal with it as is. The 1st line is a comment. The 2nd line contains the Ids. So the data is actually vertical instead of horizontal.
Here's the csv:
blah blah - this is the comment line
2012/11/04,01:00,26,Main Feeder,Tennis Court,Golf Barn,Pro Shop
2012/11/04,01:00,39,'','','',''
2012/11/04,01:00,43,1344120600:292048,1344622200:69944,1344638400:27176,1345847400:9724
2012/11/04,01:55,28,0.0,0.0,0.0,0.0
2012/11/04,01:55,29,7.0,1.0,3.0,2.0
2012/11/04,01:55,30,0.0,0.0,0.0,0.0
2012/11/04,01:55,31,2.0,0.0,1.0,0.0
So in the above example, we have 4 devices, main feeder, tennis court, golf barn and pro shop. Each column underneath it belongs to it. So Main Feeder would have '',1344120600:292048, 0.0,7.0, 0.0 and 2.0 for values.
If I wanted a pojo, it would something be like this (not necessarily a hashmap though. It's just there for representation):
I would then repeat that same sequence for tennis court, gold barn and pro shop. I could also have an unknown # of columns. So I may only have main feeder, and the other 3 would be absent, or I could have more than 4.
I hope I made sense and that superCSV can actually do this for me...
Thanks!
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Well, it looks like I spoke too soon. After working with the incoming csv files, it turns out there's an irregularity in them I was unaware of.
There exists the possibility that some rows may not have the same number of columns. This is very frustrating and not something I can change. Here is an example:
2013/01/01,09:15,26,Main Feed,Elevators,Fire Alarm,Roof Top,300 KVA Trans,Heating Panel,MCC 1,MCC 2
2013/01/01,09:15,39,'','','','','','','',''
2013/01/01,09:00,3,20,0,FTP OK
2013/01/01,09:00,3,1,0,Config file not found
2013/01/01,09:01,15,1.000000,0.000000
2013/01/01,09:01,31,1768,15,2,17,943,1,224,547
2013/01/01,09:01,1,1531,7,1,7,858,0,189,429
2013/01/01,09:02,15,0.000000,0.000000
2013/01/01,09:02,31,1813,35,5,16,976,2,224,548
2013/01/01,09:02,1,1581,20,1,8,894,0,190,428
2013/01/01,09:03,15,1.000000,0.000000
You can see that the last line only has 5 records instead of 11 for the others. So it appears that we can get rows that will have less columns than the header. It appears that it's only for certain record types (column 3) where it's 3 or 15 as far as I can tell.
I was thinking maybe I could run the CSV through a pre-processor and pad the lines with the correct number of columns.
Can SuperCSV handle this? Right now it's throwing an exception of course.
Last edit: oggie 2013-01-23
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
In a nutshell: you can dynamically choose what cell processor array to use by reading with CsvListReader at the same time (so you know how many columns there are for that row before calling read() on your CsvMapReader).
Last edit: James Bassett 2013-01-23
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
We've also seen times where there isn't a header.... I know... out of my control.
So we need to dynamically handle it so that in a situation where the header doesn't exist and we grab the first line, thinking it's a header, but it isn't. So the 1st line is actually a real record. Is there anyway to add that row back in?
We can tell it's not a header based on the 1st character being a single quote.
Is there any easy way around this?
We don't actually read files since it's coming in via an http post. So we do this:
try (InputStream inputStream = dataFileInputStream;
InputStreamReader streamReader = new InputStreamReader(inputStream);
BufferedReader bufferedReader = new BufferedReader(streamReader);
ICsvListReader listReader = new CsvListReader(bufferedReader, CsvPreference.STANDARD_PREFERENCE)) {
String[] header = listReader.getHeader(false);
CellProcessor[] processors = new CellProcessor[header.length];
for (int i = 0; i < processors.length; i++) {
processors[i] = new Recorder();
}
while (listReader.read(processors) != null) {
// Do nothing - just let recorders do the work
}
for (int i = 0; i < processors.length; i++) {
logger.info(header[i] + " = " + ((Recorder) processors[i]).getValues());
}
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
I did some thinking on this, and the only way I can see of doing it is to read the file and test for the header. If the header is there, process as normal. If the header is missing, create a second file with a dummy header and process the new file.
Seems like a lot of extra processing though.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
If I understand this correctly, couldn't you just cheat and check at the end whether the first line was a header or not? If it wasn't a header, then just add each column from the header to the start of each Recorder's list.
while(listReader.read(processors)!=null){// Do nothing - just let recorders do the work}booleanfirstLineNotHeader;// TODO based on the first char of the header??for(inti=0;i<processors.length;i++){List<Object>values=((Recorder)processors[i]).getValues();if(firstLineNotHeader){// add the header value to the start of the listvalues.add(0,header[i]);}logger.info(header[i]+" = "+values);}
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
I have a csv file that I don't have control over, so I can't change the layout and have to deal with it as is. The 1st line is a comment. The 2nd line contains the Ids. So the data is actually vertical instead of horizontal.
Here's the csv:
So in the above example, we have 4 devices, main feeder, tennis court, golf barn and pro shop. Each column underneath it belongs to it. So Main Feeder would have '',1344120600:292048, 0.0,7.0, 0.0 and 2.0 for values.
If I wanted a pojo, it would something be like this (not necessarily a hashmap though. It's just there for representation):
I would then repeat that same sequence for tennis court, gold barn and pro shop. I could also have an unknown # of columns. So I may only have main feeder, and the other 3 would be absent, or I could have more than 4.
I hope I made sense and that superCSV can actually do this for me...
Thanks!
Hi,
That sure is an interesting scenario (i.e. not your typical csv!) but I think the following will help.
You can achieve this many ways, but the easiest is to write your own CellProcessor that records the values for the column.
You can then read the CSV file and get the columns values from the appropriate processor.
This will give you the following results:
Thanks for the quick response and full example. I wasn't expecting that much detail in such a quick turnaround.
I'll try it out tomorrow but it sure looks like what I need!
Thanks!
Great :) Good luck!
I was able to implement what you posted and it worked great. I don't think I even had to debug it either.
Thanks so much. This is a great tool and saved me a bunch of time.
That's great to hear oggie. If you have time for a review we'd really appreciate it.
Happy coding :)
Well, it looks like I spoke too soon. After working with the incoming csv files, it turns out there's an irregularity in them I was unaware of.
There exists the possibility that some rows may not have the same number of columns. This is very frustrating and not something I can change. Here is an example:
You can see that the last line only has 5 records instead of 11 for the others. So it appears that we can get rows that will have less columns than the header. It appears that it's only for certain record types (column 3) where it's 3 or 15 as far as I can tell.
I was thinking maybe I could run the CSV through a pre-processor and pad the lines with the correct number of columns.
Can SuperCSV handle this? Right now it's throwing an exception of course.
Last edit: oggie 2013-01-23
You might want to check out this answer on StackOverflow:
http://stackoverflow.com/a/11698684/1068649
In a nutshell: you can dynamically choose what cell processor array to use by reading with CsvListReader at the same time (so you know how many columns there are for that row before calling read() on your CsvMapReader).
Last edit: James Bassett 2013-01-23
So far so good. Everything works well.
We've also seen times where there isn't a header.... I know... out of my control.
So we need to dynamically handle it so that in a situation where the header doesn't exist and we grab the first line, thinking it's a header, but it isn't. So the 1st line is actually a real record. Is there anyway to add that row back in?
We can tell it's not a header based on the 1st character being a single quote.
Is there any easy way around this?
We don't actually read files since it's coming in via an http post. So we do this:
I did some thinking on this, and the only way I can see of doing it is to read the file and test for the header. If the header is there, process as normal. If the header is missing, create a second file with a dummy header and process the new file.
Seems like a lot of extra processing though.
If I understand this correctly, couldn't you just cheat and check at the end whether the first line was a header or not? If it wasn't a header, then just add each column from the header to the start of each Recorder's list.
Thanks for the tip. We only get data for certain columns, but your tip was what I needed. It is a bit of a cheat like you said, but it works:
~~~~~
if (header[0].startsWith("'")) {
useHeader = true;
}
....
// Get the date readings
Recorder dateReadingsRecorder = (Recorder) processors[0];
List<String> dateReadings = (List<String>) (List<?>) dateReadingsRecorder.getValues();
if (useHeader) {
dateReadings.add(0, header[0]);
}
....
while (....) {
....
int columnNumberInt = Integer.parseInt(columnNumber);
// Get the meter readings for that column
Recorder meterReadingsRecorder = (Recorder) processors[columnNumberInt];
List<String> meterReadings = (List<String>) (List<?>) meterReadingsRecorder.getValues();
if (useHeader) {
meterReadings.add(0, header[columnNumberInt]);
}
...
~~~~