
#265 Opencsv may hang when reading from zip file

Milestone: v1.0 (example)
Status: open
Labels: None
Priority: 5
Updated: 11 hours ago
Created: 2025-12-23
Creator: Jo Navy
Private: No

In a very particular case Opencsv hangs when reading from a zip file.
The attached zip contains a simple Maven project that demonstrates the issue.
The test class ZipTest.java contains 6 JUnit tests and just one of them (testKO) hangs.

1 Attachment

Discussion

  • Scott Conway

    Scott Conway - 2025-12-24
    • assigned_to: Scott Conway
     
  • Scott Conway

    Scott Conway - 2026-01-02

    Hello Jo.

    This is a most interesting issue. As your own comment stated, I was able to port the tests directly into opencsv, which I always run in Java 8, by replacing the var in testOKnoStream with List<? extends ZipEntry>, and it runs without issue.

    That tells me this is not an opencsv issue per se, but rather a change in one of the later versions of Java, after Java 8, that is not backwards compatible.

    I will try to play with it some this weekend, open up my debugging settings in IntelliJ so I can step into lambdas and streams, and see if I can pin down where it is freezing and whether it is something I can fix without breaking backwards compatibility. Failing that, I will see if it is an actual noted defect in Java itself, in which case we can either report it or, if it is already reported, wait for a fix.

     
  • Scott Conway

    Scott Conway - 2026-01-02

    For giggles I recompiled and ran the code in both Java 11 and Java 21 (the other LTS versions of Java), and both had the same issue. But it does tell me the issue was introduced in a Java version > 8 and <= 11, rather than just somewhere between 8 and 17.

     
  • Scott Conway

    Scott Conway - 2026-01-02

    Jo - I am going to throw you a curve ball that, for me, lowers the priority of this. I just downloaded Java 25, which is also noted as an LTS version, and it works. So:

    Java 8 - test passes
    Java 11 - test fails
    Java 17 - test fails
    Java 21 - test fails
    Java 25 - test passes

     
  • Jo Navy

    Jo Navy - 6 days ago

    No, it can hang with Java 25 too: just create a bigger CSV. Try to duplicate the lines of my CSV (excluding the first 5 header lines) to get a CSV of 7000+ lines and zip it again.

    This means that tests might pass but the application could hang in production when it gets a CSV bigger than the one used in tests. A very nasty scenario.
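
    For reference, a minimal sketch of how to build such a zip (file names data.csv and bigger.zip are just placeholders; the 5 header lines match my CSV):

        import java.io.IOException;
        import java.nio.charset.StandardCharsets;
        import java.nio.file.Files;
        import java.nio.file.Path;
        import java.util.ArrayList;
        import java.util.List;
        import java.util.zip.ZipEntry;
        import java.util.zip.ZipOutputStream;

        public class EnlargeCsv {
            public static void main(String[] args) throws IOException {
                List<String> lines = Files.readAllLines(Path.of("data.csv"), StandardCharsets.UTF_8);
                List<String> header = lines.subList(0, 5);
                List<String> data = lines.subList(5, lines.size());

                // keep the header once, then repeat the data block until 7000+ data lines
                List<String> enlarged = new ArrayList<>(header);
                while (enlarged.size() < 7000 + header.size()) {
                    enlarged.addAll(data);
                }

                try (ZipOutputStream zip = new ZipOutputStream(Files.newOutputStream(Path.of("bigger.zip")))) {
                    zip.putNextEntry(new ZipEntry("data.csv"));
                    for (String line : enlarged) {
                        zip.write((line + "\n").getBytes(StandardCharsets.UTF_8));
                    }
                    zip.closeEntry();
                }
            }
        }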

    I suspect a deadlock: I have noted that the method ZipFile.stream() is internally synchronized in recent Java versions (it is not in Java 8), and that you are using threads, even a custom thread pool. Is your IntolerantThreadPoolExecutor really necessary/useful? Does it visibly improve performance? I have not examined your code in detail. Does the pool have a fixed size? I wonder what could happen in an enterprise application with many concurrent users (i.e. many other threads, not only yours).

     

    Last edit: Jo Navy 6 days ago
  • Scott Conway

    Scott Conway - 3 days ago

    I am curious - you got it to hang in Java 25 by making the file even larger - were you able to do the same with Java 8? Because to me that has been the question all along, one I have not had time to deep dive into: what changed after Java 8 that makes it prone to locking up? Odds are you answered that yourself with your observation that ZipFile is not synchronized in Java 8 but is afterwards. That means it is not something that should be used in a multi-threaded scenario without a LOT more care taken, otherwise we do get the deadlocks you describe.

    Now your last couple of lines have a lot of questions to unpack, so I will take them one at a time, but not in order.

    The first question I will tackle is whether the IntolerantThreadPoolExecutor has a fixed pool size. Yes - it is fixed at the number of processors detected.

        // Core and maximum pool size are both set to the number of available
        // processors; submitted tasks wait in an unbounded LinkedBlockingQueue.
        IntolerantThreadPoolExecutor(boolean orderedResults, Locale errorLocale) {
            super(Runtime.getRuntime().availableProcessors(),
                    Runtime.getRuntime().availableProcessors(), Long.MAX_VALUE,
                    TimeUnit.NANOSECONDS, new LinkedBlockingQueue<>());
            this.orderedResults = orderedResults;
            this.errorLocale = ObjectUtils.defaultIfNull(errorLocale, Locale.getDefault());
        }
    

    Though I did compile a special version when testing this bug where I made the core size two, and then the core and max size two, and I still got the deadlock.
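
    For reference, that special build just hard-coded the sizes in the constructor call shown above, roughly like this (not the shipped code):

        // test-only variant: core and maximum pool size both forced to two
        super(2, 2, Long.MAX_VALUE, TimeUnit.NANOSECONDS, new LinkedBlockingQueue<>());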

    The next question is whether the IntolerantThreadPoolExecutor gives visible performance improvements. The answer to that is yes... originally. I pulled up our documentation page at https://opencsv.sourceforge.net/#upgrading_from_3_x_to_4_x and came across these statements.

    We have rewritten the bean code to be multi-threaded so that reading from an input directly into beans is significantly faster. Performance benefits depend largely on your data and hardware, but our non-rigorous tests indicate that reading now takes a third of the time it used to.
    
    We have rewritten the bean code to be multi-threaded so that writing from a list of beans is significantly faster. Performance benefits depend largely on your data and hardware, but our non-rigorous tests indicate that writing now takes half of the time it used to.
    

    Now the thing is, I don't think this has really been tested since. I have a test harness I wrote years ago to performance test opencsv, but it was meant to find hotspots in the reader/writer/parser classes with a profiler, not to actually speed test. I may have to revisit that and create a test that does the multithreaded read/write and compare it to the non-threaded reader/writer to see what the current differences are.
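
    If I do, a rough timing sketch along these lines would be the starting point (not a rigorous benchmark; the file name is a placeholder, the bean type is borrowed from your example, and I am assuming iterator() as the single-threaded comparison since, as I understand it, it converts one bean at a time on the calling thread):

        import com.opencsv.bean.CsvToBean;
        import com.opencsv.bean.CsvToBeanBuilder;

        import java.io.FileReader;
        import java.io.IOException;
        import java.io.Reader;

        public class BeanReadTiming {
            public static void main(String[] args) throws IOException {
                long t0 = System.nanoTime();
                try (Reader reader = new FileReader("big.csv")) {
                    int count = new CsvToBeanBuilder<SpecificCovariance>(reader)
                            .withType(SpecificCovariance.class)
                            .build()
                            .parse() // multi-threaded conversion via the executor
                            .size();
                    System.out.printf("parse(): %d beans in %d ms%n",
                            count, (System.nanoTime() - t0) / 1_000_000);
                }

                long t1 = System.nanoTime();
                try (Reader reader = new FileReader("big.csv")) {
                    CsvToBean<SpecificCovariance> csvToBean = new CsvToBeanBuilder<SpecificCovariance>(reader)
                            .withType(SpecificCovariance.class)
                            .build();
                    int count = 0;
                    for (SpecificCovariance bean : csvToBean) { // lazy, one bean at a time
                        count++;
                    }
                    System.out.printf("iterator(): %d beans in %d ms%n",
                            count, (System.nanoTime() - t1) / 1_000_000);
                }
            }
        }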

    The last question is the biggie: is the IntolerantThreadPoolExecutor necessary/useful? Personally, no. I personally have not created gigabyte CSV files, or tried to process gigabyte database tables via opencsv. But I can assure you people have, and I cannot count the number of people who wrote to us asking us to support multithreading because their processes were sooooooo slow. Going on a slight tangent with the gigabyte database: that was an actual ticket from someone complaining about the performance of opencsv and how it was causing thrashing in garbage collection even though they had allocated 24 gigabytes to the VM. It turned out they were reading an entire database into memory via opencsv to process it, and a single record was 50+ megabytes (no idea what the database was, but at that size my guess is an image database or scanned books). But yeah, if you take 25 gigabytes of database and drop it into 24 gigabytes of memory, you are going to run out of memory regardless of what library you are using to build your objects out of the data.

    Okay, back on track. The short form is: yes, it was highly requested at one time, and it is something with which care must be taken if used. Short term, I would ask how hard it would be for you to unzip the file programmatically, use a non-synchronized file input stream at that point, and then delete the uncompressed file. You will lose any performance gains, I am sure, but that will solve the issue. Long term, this has got my curiosity up, but I do not know when I can look at it (work and family commitments). For me the first step is getting a test that will fail in Java 8 and/or getting my test harness updated with a test that has multithreading, for which I will most likely shamelessly copy some of your code. Then I would modify the IntolerantThreadPoolExecutor code using the Condition and ReentrantLock classes so that only a single thread can read the file at a given time to read an entire line to process, and once the line/record has been read the next thread can read - basically forcing the reading of the data to be single threaded while allowing the processing of the records to be multi threaded. And then see what effect that has on performance and if that fixes your issue.
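
    To make that short-term suggestion concrete, here is a sketch (zip path and entry name are placeholders; handleSpecificCovariance is the method shown in the P.S. below):

        import java.io.IOException;
        import java.io.InputStream;
        import java.nio.file.Files;
        import java.nio.file.Path;
        import java.nio.file.StandardCopyOption;
        import java.util.zip.ZipEntry;
        import java.util.zip.ZipFile;

        public class UnzipThenParse {
            public static void main(String[] args) throws IOException {
                Path tempCsv = Files.createTempFile("covariance", ".csv");
                try (ZipFile zipFile = new ZipFile("data.zip")) {
                    ZipEntry entry = zipFile.getEntry("data.csv"); // assumes the entry exists
                    try (InputStream in = zipFile.getInputStream(entry)) {
                        Files.copy(in, tempCsv, StandardCopyOption.REPLACE_EXISTING);
                    }
                }
                // the ZipFile is closed by now, so opencsv reads from a plain file stream
                try (InputStream in = Files.newInputStream(tempCsv)) {
                    CSVProcessor.handleSpecificCovariance(in);
                } finally {
                    Files.deleteIfExists(tempCsv);
                }
            }
        }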

    P.S. - Darn it, I thought I had found a possible solution, which was to modify your CSVProcessor class to let opencsv handle the stream, but it failed as well.

        public static void handleSpecificCovariance(InputStream inputStream) throws IOException {
            try (CSVReader reader = CsvHelper.buildCsvReader(inputStream, false, '|', 5)) {
                List<SpecificCovariance> specificCovariances = new CsvToBeanBuilder<SpecificCovariance>(reader)
                        .withType(SpecificCovariance.class)
                        .withIgnoreEmptyLine(true)
                        .build()
                        .parse();
                System.out.println("Read " + specificCovariances.size() + " objects");
            }
        }

        public static void streamHandleSpecificCovariance(InputStream inputStream) throws IOException {
            try (CSVReader reader = CsvHelper.buildCsvReader(inputStream, false, '|', 5)) {
                List<SpecificCovariance> specificCovariances = new CsvToBeanBuilder<SpecificCovariance>(reader)
                        .withType(SpecificCovariance.class)
                        .withIgnoreEmptyLine(true)
                        .build()
                        .stream()
                        .collect(Collectors.toList());
                System.out.println("Read " + specificCovariances.size() + " objects");
            }
        }
    
     
  • Jo Navy

    Jo Navy - 19 hours ago

    With Java 8 the test doesn't hang even with a CSV of 56k lines (2.8 MB, the real one used by my application). Since it is much bigger than the one that hangs with Java 25, I assume that it doesn't hang at all.

     
  • Scott Conway

    Scott Conway - 11 hours ago

    So I believe it does all stem from the change in the ZipFile code to make the stream synchronized, and having multiple levels of synchronized code is causing the deadlock. I am saying multiple levels because when I ran the test in a debugger and did a process dump, I saw in one of the threads three different calls back into ZipFile/ZipInputStream.
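
    To spell out the scenario I am picturing (a stripped-down illustration with placeholder paths, entry filter, and separator, not Jo's actual ZipTest code; SpecificCovariance is the bean type from the attached project):

        import com.opencsv.bean.CsvToBeanBuilder;

        import java.io.IOException;
        import java.io.InputStream;
        import java.io.InputStreamReader;
        import java.io.UncheckedIOException;
        import java.nio.charset.StandardCharsets;
        import java.util.List;
        import java.util.zip.ZipFile;

        public class ZipStreamRepro {
            public static void main(String[] args) throws IOException {
                try (ZipFile zipFile = new ZipFile("data.zip")) {
                    zipFile.stream() // internally synchronized in recent JDKs
                            .filter(entry -> entry.getName().endsWith(".csv"))
                            .forEach(entry -> {
                                try (InputStream in = zipFile.getInputStream(entry)) {
                                    // opencsv's thread pool converts beans while the caller
                                    // is still inside the ZipFile stream pipeline
                                    List<SpecificCovariance> beans =
                                            new CsvToBeanBuilder<SpecificCovariance>(
                                                    new InputStreamReader(in, StandardCharsets.UTF_8))
                                                    .withType(SpecificCovariance.class)
                                                    .withSeparator('|')
                                                    .build()
                                                    .parse();
                                    System.out.println("Read " + beans.size() + " objects");
                                } catch (IOException e) {
                                    throw new UncheckedIOException(e);
                                }
                            });
                }
            }
        }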

    When I get a chance I will try the Condition and ReentrantLock around the SingleLineReader so that only a single thread can access the ZipFile at a time, and see what that does to performance.
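
    Roughly, the idea would look something like this (just a sketch of the approach, not actual opencsv code; the Condition would only come into play if read order has to be coordinated with processing):

        import com.opencsv.CSVReader;
        import com.opencsv.exceptions.CsvValidationException;

        import java.io.IOException;
        import java.util.concurrent.locks.ReentrantLock;

        // Sketch: serialize access to the underlying reader so only one thread touches
        // the (synchronized) ZipFile at a time, while bean conversion stays on the pool.
        class SerializedLineSource {
            private final ReentrantLock readLock = new ReentrantLock();
            private final CSVReader reader;

            SerializedLineSource(CSVReader reader) {
                this.reader = reader;
            }

            String[] readNext() throws IOException, CsvValidationException {
                readLock.lock();
                try {
                    return reader.readNext(); // exactly one thread reads a record at a time
                } finally {
                    readLock.unlock();
                }
            }
        }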

     
