unsure if supercsv can ingest this csv file easily

oggie
2013-01-03
2013-02-18
  • oggie
    oggie
    2013-01-03

    I have a csv file that I don't have control over, so I can't change the layout and have to deal with it as is. The 1st line is a comment. The 2nd line contains the Ids. So the data is actually vertical instead of horizontal.

    Here's the csv:

    blah blah - this is the comment line
    2012/11/04,01:00,26,Main Feeder,Tennis Court,Golf Barn,Pro Shop
    2012/11/04,01:00,39,'','','',''
    2012/11/04,01:00,43,1344120600:292048,1344622200:69944,1344638400:27176,1345847400:9724
    2012/11/04,01:55,28,0.0,0.0,0.0,0.0
    2012/11/04,01:55,29,7.0,1.0,3.0,2.0
    2012/11/04,01:55,30,0.0,0.0,0.0,0.0
    2012/11/04,01:55,31,2.0,0.0,1.0,0.0

    So in the above example, we have 4 devices, main feeder, tennis court, golf barn and pro shop. Each column underneath it belongs to it. So Main Feeder would have '',1344120600:292048, 0.0,7.0, 0.0 and 2.0 for values.

    If I wanted a pojo, it would something be like this (not necessarily a hashmap though. It's just there for representation):

    String ID = "Main Feeder";
    HashMap valuesHashMap = new HashMap();
    valuesHashMap.put(39, "");
    valuesHashMap.put(43, "1344120600:292048");
    valuesHashMap.put(28, "0.0");
    valuesHashMap.put(29, "7.0");
    valuesHashMap.put(30, "0.0");
    valuesHashMap.put(31, "2.0");

    I would then repeat that same sequence for tennis court, gold barn and pro shop. I could also have an unknown # of columns. So I may only have main feeder, and the other 3 would be absent, or I could have more than 4.

    I hope I made sense and that superCSV can actually do this for me...

    Thanks!

     
  • James Bassett
    James Bassett
    2013-01-03

    Hi,

    That sure is an interesting scenario (i.e. not your typical csv!) but I think the following will help.

    You can achieve this many ways, but the easiest is to write your own CellProcessor that records the values for the column.

    import java.util.ArrayList;
    import java.util.List;
    
    import org.supercsv.cellprocessor.CellProcessorAdaptor;
    import org.supercsv.cellprocessor.ift.CellProcessor;
    import org.supercsv.util.CsvContext;
    
    public class Recorder extends CellProcessorAdaptor {
    
        private List<Object> values = new ArrayList<Object>();
    
        public Recorder() {
            super();
        }
    
        public Recorder(CellProcessor next) {
            super(next);
        }
    
        @Override
        public Object execute(Object value, CsvContext context) {
            values.add(value);
            return next.execute(value, context);
        }
    
        public List<Object> getValues() {
            return values;
        }
    
    }
    

    You can then read the CSV file and get the columns values from the appropriate processor.

    package example;
    
    import java.io.FileReader;
    import java.io.IOException;
    
    import org.supercsv.cellprocessor.ift.CellProcessor;
    import org.supercsv.io.CsvListReader;
    import org.supercsv.io.ICsvListReader;
    import org.supercsv.prefs.CsvPreference;
    
    public class RecordColumns {
    
        public static void main(String[] args) throws IOException {
            ICsvListReader listReader = null;
            try {
                listReader = new CsvListReader(new FileReader("test.csv"),
                        CsvPreference.STANDARD_PREFERENCE);
                listReader.getHeader(false); // skip comment line
    
                String[] header = listReader.getHeader(false);
    
                // assemble processors dynamically (variable no. of cols)
                CellProcessor[] processors = new CellProcessor[header.length];
                for (int i = 0; i < processors.length; i++) {
                    processors[i] = new Recorder();
                }
    
                while (listReader.read(processors) != null) {
                    // do nothing - just let recorders do the work
                }
    
                for (int i = 0; i < processors.length; i++) {
                    System.out.println(header[i] + " = "
                            + ((Recorder) processors[i]).getValues());
                }
    
            } finally {
                listReader.close();
            }
        }
    
    }
    

    This will give you the following results:

    2012/11/04 = [2012/11/04, 2012/11/04, 2012/11/04, 2012/11/04, 2012/11/04, 2012/11/04]
    01:00 = [01:00, 01:00, 01:55, 01:55, 01:55, 01:55]
    26 = [39, 43, 28, 29, 30, 31]
    Main Feeder = ['', 1344120600:292048, 0.0, 7.0, 0.0, 2.0]
    Tennis Court = ['', 1344622200:69944, 0.0, 1.0, 0.0, 0.0]
    Golf Barn = ['', 1344638400:27176, 0.0, 3.0, 0.0, 1.0]
    Pro Shop = ['', 1345847400:9724, 0.0, 2.0, 0.0, 0.0]
    
     
  • oggie
    oggie
    2013-01-04

    Thanks for the quick response and full example. I wasn't expecting that much detail in such a quick turnaround.

    I'll try it out tomorrow but it sure looks like what I need!

    Thanks!

     
  • James Bassett
    James Bassett
    2013-01-04

    Great :) Good luck!

     
  • oggie
    oggie
    2013-01-11

    I was able to implement what you posted and it worked great. I don't think I even had to debug it either.

    Thanks so much. This is a great tool and saved me a bunch of time.

     
  • James Bassett
    James Bassett
    2013-01-11

    That's great to hear oggie. If you have time for a review we'd really appreciate it.

    Happy coding :)

     
  • oggie
    oggie
    2013-01-23

    Well, it looks like I spoke too soon. After working with the incoming csv files, it turns out there's an irregularity in them I was unaware of.

    There exists the possibility that some rows may not have the same number of columns. This is very frustrating and not something I can change. Here is an example:

    2013/01/01,09:15,26,Main Feed,Elevators,Fire Alarm,Roof Top,300 KVA Trans,Heating Panel,MCC 1,MCC 2
    2013/01/01,09:15,39,'','','','','','','',''
    2013/01/01,09:00,3,20,0,FTP OK
    2013/01/01,09:00,3,1,0,Config file not found
    2013/01/01,09:01,15,1.000000,0.000000
    2013/01/01,09:01,31,1768,15,2,17,943,1,224,547
    2013/01/01,09:01,1,1531,7,1,7,858,0,189,429
    2013/01/01,09:02,15,0.000000,0.000000
    2013/01/01,09:02,31,1813,35,5,16,976,2,224,548
    2013/01/01,09:02,1,1581,20,1,8,894,0,190,428
    2013/01/01,09:03,15,1.000000,0.000000

    You can see that the last line only has 5 records instead of 11 for the others. So it appears that we can get rows that will have less columns than the header. It appears that it's only for certain record types (column 3) where it's 3 or 15 as far as I can tell.

    I was thinking maybe I could run the CSV through a pre-processor and pad the lines with the correct number of columns.

    Can SuperCSV handle this? Right now it's throwing an exception of course.

     
    Last edit: oggie 2013-01-23
  • James Bassett
    James Bassett
    2013-01-23

    You might want to check out this answer on StackOverflow:
    http://stackoverflow.com/a/11698684/1068649

    In a nutshell: you can dynamically choose what cell processor array to use by reading with CsvListReader at the same time (so you know how many columns there are for that row before calling read() on your CsvMapReader).

     
    Last edit: James Bassett 2013-01-23
  • oggie
    oggie
    2013-02-15

    So far so good. Everything works well.

    We've also seen times where there isn't a header.... I know... out of my control.

    So we need to dynamically handle it so that in a situation where the header doesn't exist and we grab the first line, thinking it's a header, but it isn't. So the 1st line is actually a real record. Is there anyway to add that row back in?

    We can tell it's not a header based on the 1st character being a single quote.

    Is there any easy way around this?

    We don't actually read files since it's coming in via an http post. So we do this:

    try (InputStream inputStream = dataFileInputStream;
                    InputStreamReader streamReader = new InputStreamReader(inputStream);
                    BufferedReader bufferedReader = new BufferedReader(streamReader);
                    ICsvListReader listReader = new CsvListReader(bufferedReader, CsvPreference.STANDARD_PREFERENCE)) {
                String[] header = listReader.getHeader(false);
    
                CellProcessor[] processors = new CellProcessor[header.length];
                for (int i = 0; i < processors.length; i++) {
                    processors[i] = new Recorder();
                }
                while (listReader.read(processors) != null) {
                    // Do nothing - just let recorders do the work
                }
                for (int i = 0; i < processors.length; i++) {
                    logger.info(header[i] + " = " + ((Recorder) processors[i]).getValues());
                }
    
     
  • oggie
    oggie
    2013-02-17

    I did some thinking on this, and the only way I can see of doing it is to read the file and test for the header. If the header is there, process as normal. If the header is missing, create a second file with a dummy header and process the new file.

    Seems like a lot of extra processing though.

     
  • James Bassett
    James Bassett
    2013-02-17

    If I understand this correctly, couldn't you just cheat and check at the end whether the first line was a header or not? If it wasn't a header, then just add each column from the header to the start of each Recorder's list.

    while (listReader.read(processors) != null) {
        // Do nothing - just let recorders do the work
    }
    
    boolean firstLineNotHeader; // TODO based on the first char of the header??
    for (int i = 0; i < processors.length; i++) {
        List<Object> values = ((Recorder) processors[i]).getValues();
        if (firstLineNotHeader){
            // add the header value to the start of the list
            values.add(0, header[i]);
        }
        logger.info(header[i] + " = " + values);
    }
    
     
  • oggie
    oggie
    2013-02-18

    Thanks for the tip. We only get data for certain columns, but your tip was what I needed. It is a bit of a cheat like you said, but it works:

    if (header[0].startsWith("'")) {
        useHeader = true;
    }
    ....
    // Get the date readings
    Recorder dateReadingsRecorder = (Recorder) processors[0];
    List<String> dateReadings = (List<String>) (List<?>) dateReadingsRecorder.getValues();
    if (useHeader) {
       dateReadings.add(0, header[0]);
    }
    ....
    while (....) {
    ....
    int columnNumberInt = Integer.parseInt(columnNumber);
    // Get the meter readings for that column
    Recorder meterReadingsRecorder = (Recorder) processors[columnNumberInt];
    List<String> meterReadings = (List<String>) (List<?>) meterReadingsRecorder.getValues();
    if (useHeader) {
      meterReadings.add(0, header[columnNumberInt]);
    }
    ...