Menu

CsvListReader returns always same ArrayList reference

njfranck
2012-10-31
2012-11-12
  • njfranck

    njfranck - 2012-10-31

    Hi there,

    It seems that the list reader is always returning the same arraylist reference.
    (except when no results are found). This can be hard to debug.

    Consider this example.

    CSV:

    1,2,3
    4,5,6
    

    Code:

    :::java
    File file = new File("numbers.csv");
    ICsvListReader listReader = 
        new CsvListReader(new FileReader(file),csvPreference);
    ArrayList<ArrayList<String>> rows = 
        new ArrayList<ArrayList<String>>();
    
    ArrayList<String> list = new ArrayList<String>();
    System.out.println("iterating rows: ");
    while((list = (ArrayList<String>) listReader.read()) != null){
        rows.add(list);
        System.out.println(list);
    }
    System.out.println("after iteration");
    System.out.println(rows);
    

    this has the following output:

    iterating rows:
    [[1,2,3]]
    [[3,4,5],[3,4,5]]
    

    after iteration:

    [[],[]]
    

    Of course it can be fixed by copying the received list:

    :::java
    rows.add(new ArrayList<String>(list));
    

    Or is this intended?

     

    Last edit: James Bassett 2012-10-31
  • James Bassett

    James Bassett - 2012-10-31

    Good question! Super CSV reuses the List returned from the read() method (it's cleared before each line is read in - this means it's not creating new Lists all the time and is more memory efficient).

    I'll take a look at this tonight, but I think it'd be better for CsvListReader to return a new List each time.

    Thanks for reporting this :)

     
  • Kasper B. Graversen

    A few years back I did some tests indicating you could save some seconds when reading a million lines from a file if the array was reused. Initially you had to supply the array yourself which would be used, but that API looked odd.

    I think the api would be easier to use if a new copy was returned at each iteration, but the impact should be meassured on a modern machine first so that we dont see a huge speed decrease.

    it is the ever lasting war between making it fast and memory efficient and then the 10-line toy example on the back of a napkin to look good... :)

     
  • James Bassett

    James Bassett - 2012-11-01

    Looking at the source history, I actually 'enhanced' this code - it used to return a new List each time. I'll change it back ASAP!

     
  • njfranck

    njfranck - 2012-11-01

    clearing the internal array of an ArrayList-object also creates a
    new (interal) array. And that's where the biggest part of the data is lying,
    not in the arraylist-object itself.

    But thanks for your quick response!

     
  • James Bassett

    James Bassett - 2012-11-01

    Yes, that's a good point. The idea is that an ArrayList only resizes the backing array when it needs more room. By reusing (and clearing) the same ArrayList, it doesn't allocate new memory for each row that's read, because typically each row of CSV has the same number of columns so the array doesn't have to be resized. As Kasper said, this can improve performance for larger files.

     
  • James Bassett

    James Bassett - 2012-11-12

    Fixed and released in 2.0.0
    [bugs:#35]

     

    Related

    Bugs: #35

Want the latest updates on software, tech news, and AI?
Get latest updates about software, tech news, and AI from SourceForge directly in your inbox once a month.