How do you use Super CSV?

  • James Bassett

    James Bassett - 2011-11-29

    We'd love to know how you're using Super CSV so we can add useful features/functionality!

    Do you use Super CSV on a Production app server, mobile device or something
    else?

    Or are you pushing the boundaries of what's possible with Super CSV (there's
    been recent discussion about multi-threading and FileLocks for example)?

    Any information you can provide will help us make Super CSV even better. For
    example, we're implementing OSGi support for the upcoming release, due to a
    request from a number of users.

     
  • Stephen Chester

    Stephen Chester - 2012-10-15

    I am hoping to use Super CSV as a helper in a pre-production environment, to import data from a source that does not provide an API. I am planning on multi-threading my solution, and would love a pointer to any discussion that has already happened around this!

     
  • James Bassett

    James Bassett - 2012-10-15

    Hi Stephen,

    I think the original post was in response to this feature request, which isn't really helpful to you.

    I'm keen to discuss the best way to use Super CSV with multiple threads - it's an area nobody has really asked too much about, so if we can put together some ideas it will help the next person who comes along.

    Super CSV is not thread-safe. It was written with performance in mind, so it uses instance variables and re-uses objects to increase efficiency. That said, I don't think it's a good idea to read/write CSV from multiple threads anyway (unless you're working with multiple files, in which case it might make sense - and you'd have to use a separate reader/writer for each file).
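
    To illustrate the multiple-files case, each thread would own its reader outright - something like this rough sketch (the file names are made up, and I've used CsvListReader just to keep the sketch self-contained):

    package example;
    
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    
    import org.supercsv.io.CsvListReader;
    import org.supercsv.io.ICsvListReader;
    import org.supercsv.prefs.CsvPreference;
    
    public class MultiFileExample {
    
      public static void main(String[] args) {
    
        // one thread per file - readers are never shared between threads
        final ExecutorService executor = Executors.newFixedThreadPool(2);
    
        for (final String fileName : new String[] { "first.csv", "second.csv" }) {
          executor.submit(new Runnable() {
            public void run() {
              ICsvListReader reader = null;
              try {
                reader = new CsvListReader(new FileReader(fileName),
                  CsvPreference.STANDARD_PREFERENCE);
                reader.getHeader(true); // skip the header
                List<String> row;
                while ((row = reader.read()) != null) {
                  // process the row...
                }
              } catch (Exception e) {
                e.printStackTrace();
              } finally {
                if (reader != null) {
                  try {
                    reader.close();
                  } catch (IOException e) {
                    // ignore - nothing useful to do on close failure
                  }
                }
              }
            }
          });
        }
    
        executor.shutdown();
      }
    }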

    I'd say it's far more likely (I'm just guessing though!) that in your scenario it's the processing of the data, not the CSV parsing, that you want to be multithreaded. If this is the case, it should be quite easy to integrate a multithreaded solution with Super CSV.

    I've put together an example application below that reads each line of CSV into a bean, then processes those beans in batches of 3 (of course a real application would use a much larger batch size!). I could have read all the beans into memory and then processed them, but that's generally not a good idea, as you can run out of memory with large files. If your files are small, you could ditch the batch processing and simply rely on the executor service to manage your multi-threading (there's a sketch of that variant after the example's output below).

    Code

    package example;
    
    import java.io.StringReader;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;
    
    import org.supercsv.cellprocessor.ParseInt;
    import org.supercsv.cellprocessor.constraint.NotNull;
    import org.supercsv.cellprocessor.ift.CellProcessor;
    import org.supercsv.io.CsvBeanReader;
    import org.supercsv.io.ICsvBeanReader;
    import org.supercsv.prefs.CsvPreference;
    
    public class ThreadingExample {
    
      private static final String CSV = 
        "id,time\none,1\ntwo,2\nthree,3\nfour,4\nfive,5\nsix,6\nseven,7";
    
      private static final CellProcessor[] PROCESSORS = 
        { new NotNull(), new ParseInt() };
    
      // number of tasks to execute each batch
      private static final int BATCH_SIZE = 3;
    
      public static void main(String[] args) throws Exception {
    
        ExecutorService executor = null;
        ICsvBeanReader reader = null;
        try {
          reader = new CsvBeanReader(new StringReader(CSV), 
            CsvPreference.STANDARD_PREFERENCE);
    
          final String[] header = reader.getHeader(true);
    
          // our executor service
          executor = Executors.newCachedThreadPool();
    
          // may run out of memory reading then executing all tasks,
          // so process in batches
          List<ExecuteTimedTask> tasksToExecute = 
            new ArrayList<ExecuteTimedTask>();
    
          // read each line of CSV as a TimedTask bean
          TimedTask timedTask;
          while ((timedTask = 
            reader.read(TimedTask.class, header, PROCESSORS)) != null) {
    
            // adds the task to the batch, and executes if necessary
            tasksToExecute.add(new ExecuteTimedTask(timedTask));
            if (tasksToExecute.size() == BATCH_SIZE) {
    
              // blocks CSV reading until all current tasks are done
              for (Future<TimedTask> finishedTask : 
                executor.invokeAll(tasksToExecute)) {
    
                // print out the calculated message
                System.out.println(finishedTask.get().getMessage());
              }
    
              // reset for next run
              tasksToExecute.clear();
            }
          }
    
          // execute any left over tasks (didn't have a full batch last time)
          if (!tasksToExecute.isEmpty()) {
            for (Future<TimedTask> finishedTask : 
              executor.invokeAll(tasksToExecute)) {
              System.out.println(finishedTask.get().getMessage());
            }
          }
    
        } catch (Exception e) {
          e.printStackTrace();
        } finally {
          // guard against NullPointerException if setup failed early
          if (executor != null) {
            executor.shutdown();
          }
          if (reader != null) {
            reader.close();
          }
        }
    
      }
    
      /**
       * The Callable containing the logic for executing a TimedTask.
       */
      private static class ExecuteTimedTask implements Callable<TimedTask> {
    
        private TimedTask task;
    
        public ExecuteTimedTask(final TimedTask task) {
          this.task = task;
        }
    
        public TimedTask call() throws Exception {
          final long startTime = System.currentTimeMillis();
          System.out.println(String.format("task %s started", task.getId()));
    
          Thread.sleep(task.getTime() * 1000);
    
          System.out.println(String.format("task %s ended", task.getId()));
          final long endTime = System.currentTimeMillis();
    
          // update the message
          task.setMessage(String.format("task %s took %d milliseconds", 
            task.getId(), endTime - startTime));
          return task;
        }
    
      }
    
      /**
       * A task that runs for a specific time period.
       */
      public static class TimedTask {
    
        private String id;
    
        private int time;
    
        private String message;
    
        public String getId() {
          return id;
        }
    
        public void setId(String id) {
          this.id = id;
        }
    
        public int getTime() {
          return time;
        }
    
        public void setTime(int time) {
          this.time = time;
        }
    
        public String getMessage() {
          return message;
        }
    
        public void setMessage(String message) {
          this.message = message;
        }
    
        @Override
        public String toString() {
          return "TimedTask [id=" + id + ", time=" + time + "]";
        }
    
      }
    
    }
    

    Output

    task three started
    task one started
    task two started
    task one ended
    task two ended
    task three ended
    task one took 1000 milliseconds
    task two took 2000 milliseconds
    task three took 3000 milliseconds
    task four started
    task six started
    task five started
    task four ended
    task five ended
    task six ended
    task four took 4000 milliseconds
    task five took 5000 milliseconds
    task six took 6000 milliseconds
    task seven started
    task seven ended
    task seven took 7000 milliseconds
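
    If your file really is small enough to hold in memory, the only change is to the reading loop in main() above - read everything up front, then submit the whole lot in one go. A sketch (reusing the reader, header, PROCESSORS and executor from the example):

    // sketch: no batching - read ALL the beans first, then execute every
    // task at once (only sensible when the whole file fits in memory)
    List<ExecuteTimedTask> tasks = new ArrayList<ExecuteTimedTask>();
    TimedTask timedTask;
    while ((timedTask = reader.read(TimedTask.class, header, PROCESSORS)) != null) {
      tasks.add(new ExecuteTimedTask(timedTask));
    }
    for (Future<TimedTask> finishedTask : executor.invokeAll(tasks)) {
      System.out.println(finishedTask.get().getMessage());
    }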
    
     
    Last edit: James Bassett 2012-10-15
  • ajsmen

    ajsmen - 2013-09-20

    Hi,

    I want to mention that I very much appreciate your library, and I use it every time I work with CSV files.

    I was searching for information about Super CSV thread safety, and that's how I got here.

    Regarding it not being a good idea to read/write CSV from multiple threads: I'll start by noting that you conveniently accept the Reader and Writer interfaces. I honestly don't know of character stream implementations designed to work with multiple threads, but there always could be.

    Regarding shared buffers: I've heard it's not very expensive to allocate objects in Java, and that it's even healthy for the GC to use many short-lived objects instead of long-lived (tenured) ones - though I suppose it can be less performant.

    And now I'll get to the point. In my project I currently do something like concurrent writes to one file from many threads - it's a kind of log for an operation that is processed by multiple pooled threads. With the shared buffers in mind, I created a ThreadLocal holding thread-confined instances of CsvListWriter. But I was a bit surprised to find out that the actual writing happens in CsvEncoder, which is not thread-safe (it has a shared buffer) and whose instance is contained within the CsvPreference I was sharing between my writers. I was encouraged to do so by the convenient public static final field STANDARD_PREFERENCE in CsvPreference - I believed CsvPreference was a read-only settings holder. I resolved this by getting preferences from a factory method:

    public static CsvPreference getPreferences(CsvPreference preferences) {
        // copy the supplied preferences, but give the copy its own encoder,
        // so the encoder's internal state is never shared between threads
        return new CsvPreference.Builder(preferences)
            .useEncoder(new DefaultCsvEncoder()).build();
    }
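
    To illustrate, the thread confinement itself looks roughly like this (a sketch only - here each thread writes to its own file for simplicity; the actual destination will depend on your setup):

    // sketch: each thread lazily creates its own writer, built with its own
    // copy of the preferences (and therefore its own CsvEncoder)
    private static final ThreadLocal<ICsvListWriter> WRITERS =
        new ThreadLocal<ICsvListWriter>() {
            @Override
            protected ICsvListWriter initialValue() {
                try {
                    return new CsvListWriter(
                        new FileWriter("log-" + Thread.currentThread().getName() + ".csv"),
                        getPreferences(CsvPreference.STANDARD_PREFERENCE));
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            }
        };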
    



    I guess it won't hurt if you mention in the Javadoc of CsvPreference that it is not thread-safe.

    regards, mwt.

     
  • James Bassett

    James Bassett - 2013-10-02

    Hi there,

    Yes, I'm regretting putting more than settings (e.g. CsvEncoder) in CsvPreference - it seemed like a convenience at the time, and is fine for single-threaded use but breaks in multi-threaded scenarios. As I've mentioned above, the readers/writers aren't thread-safe, but as you suggest, you should be able to reuse the preferences.

    I'm thinking of moving any non-settings back to the CsvReader/Writers (just like Tokenizer is at the moment). It's going to break backwards-compatibility, but it needs fixing. I've been a bit slack with maintaining this project - I have bursts of productivity... it's just been a while since the last one :P

    James

     
