CsvListReader returns always same ArrayList reference
A fast, programmer-friendly, free CSV library for Java
Brought to you by:
jamesbassett,
kbg
Hi there,
It seems that the list reader is always returning the same arraylist reference.
(except when no results are found). This can be hard to debug.
Consider this example.
CSV:
Code:
this has the following output:
after iteration:
Of course it can be fixed by copying the received list:
Or is this intended?
Last edit: James Bassett 2012-10-31
Good question! Super CSV reuses the List returned from the
read()
method (it's cleared before each line is read in - this means it's not creating new Lists all the time and is more memory efficient).I'll take a look at this tonight, but I think it'd be better for CsvListReader to return a new List each time.
Thanks for reporting this :)
A few years back I did some tests indicating you could save some seconds when reading a million lines from a file if the array was reused. Initially you had to supply the array yourself which would be used, but that API looked odd.
I think the api would be easier to use if a new copy was returned at each iteration, but the impact should be meassured on a modern machine first so that we dont see a huge speed decrease.
it is the ever lasting war between making it fast and memory efficient and then the 10-line toy example on the back of a napkin to look good... :)
Looking at the source history, I actually 'enhanced' this code - it used to return a new List each time. I'll change it back ASAP!
clearing the internal array of an ArrayList-object also creates a
new (interal) array. And that's where the biggest part of the data is lying,
not in the arraylist-object itself.
But thanks for your quick response!
Yes, that's a good point. The idea is that an ArrayList only resizes the backing array when it needs more room. By reusing (and clearing) the same ArrayList, it doesn't allocate new memory for each row that's read, because typically each row of CSV has the same number of columns so the array doesn't have to be resized. As Kasper said, this can improve performance for larger files.
Fixed and released in 2.0.0
[bugs:#35]
Related
Bugs:
#35