A Simple CSV Parser for Java, under the commercial-friendly Apache 2.0 license and performance-tested with JProfiler


Project Members: Scott Conway, Glen Smith

Main pages

[What's new] - change log and release news combined.

[Cool tools] - tools/application/frameworks that use or extend opencsv.

[FAQ] - Frequently Asked Questions

[Submissions] - Want to report a bug or feature request, or better yet submit a patch with a new feature or defect fix? Please read this first for some advice to make things easier for both of us.

opencsv API - Javadocs for the project.

Project Reports - generated by the maven site plugin. Contains the JaCoCo, FindBugs, and Checkstyle reports, as well as the Javadocs for the project.

Project Information - project information generated by the maven site plugin.


Related

Wiki: Cool tools
Wiki: FAQ
Wiki: Submissions
Wiki: What's new

Discussion

  • Ankit G

    Ankit G - 2013-11-29

    Hi,
    I have a few doubts regarding opencsv:

    • Does opencsv read the file line by line, or load the whole file into memory at once?
    • Does opencsv provide any built-in formatting validations? If yes, are they customizable?
    • Is multi-threading in any form supported by opencsv? For my code I'm actually aiming for thread pooling. Example: say my CSV consists of 100 records or lines (assuming one record per line). Now I want to create thread T1 and make it read records 1-10, create T2 and make it read records 11-20, T3 21-30, and so on.
      So, does SuperCSV have built-in support for this? Or if I have to do it manually, then how?
     
  • Scott Conway

    Scott Conway - 2014-09-11

    Sorry it's been over a year, but life has been busy. As for your questions:

    1. Line by line or all at once: your choice. If you have a memory constraint, you can create a CSVReader and have it convert the file line by line into an array of strings with readNext(). Or if you have the memory, you can use CsvToBean to get a list of objects with the items in your file mapped into it.

    2. Built-in formatters: yes, but I have not used anything but the default. If you use the HeaderColumnNameMappingStrategy, it uses java.beans.PropertyDescriptor to translate data. You should be able to define your own mapping strategy and inject your own property descriptors to allow formatting of data. But honestly, what I would consider is to use the HeaderColumnNameMappingStrategy and CsvToBean to create a temporary DTO (Data Transfer Object) whose getters return the data in the format you desire.

    3. Thread safety: no. The fact we use ArrayList tells me we are thread-unsafe. Unless you have MASSIVE files, though, I would not worry. At work we were able to parse a 20k-line file with 80 columns in less than a minute. Though that was a rather beefy box, and I never tried that large a file on my 2007 MBP. You will have to do the threading yourself. Sorry, but I do not know if SuperCSV handles multi-threading.
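The two reading modes from answer 1 might look roughly like this against the current opencsv API (a sketch only: `data.csv` and the `MyRecord` bean are made up for illustration, and `CsvToBeanBuilder` is the modern wiring for the CsvToBean approach described above):

```java
import com.opencsv.CSVReader;
import com.opencsv.bean.CsvToBeanBuilder;

import java.io.FileReader;
import java.util.List;

public class ReadingModes {
    // Hypothetical bean whose field names match the CSV header columns.
    public static class MyRecord {
        public String name;
        public String city;
    }

    public static void main(String[] args) throws Exception {
        // Option 1: line by line - only one record is held in memory at a time.
        try (CSVReader reader = new CSVReader(new FileReader("data.csv"))) {
            String[] fields;
            while ((fields = reader.readNext()) != null) {
                System.out.println(String.join(" | ", fields));
            }
        }

        // Option 2: all at once - every row is mapped into a bean and collected
        // in a list, which costs memory proportional to the file size.
        List<MyRecord> records = new CsvToBeanBuilder<MyRecord>(new FileReader("data.csv"))
                .withType(MyRecord.class)
                .build()
                .parse();
        System.out.println(records.size() + " records loaded");
    }
}
```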

     
  • andres reynoso

    andres reynoso - 2020-04-26

    Do you have any examples of the use of the bean verifier? I could not find them on the java docs.

     
    • Andrew Rucker Jones

      No, I admit we don't, but it's a simple interface: one method that returns true if the bean is okay, false if it should be silently filtered, and throws CsvConstraintViolationException if the bean is fundamentally inconsistent. What other information did you need? I'm happy to help if I can.

       
      • andres reynoso

        andres reynoso - 2020-04-26

        Yeah, sorry for bothering you. Maybe I am misunderstanding the concept of the BeanVerifier, but I thought that you had to pass in the bean that is going to be produced by the parser on the verifyBean method, but I was only getting nulls on the rowDto parameter when evaluating the expression.

        public class CsvVerifier implements BeanVerifier<MyRowDTO> {
            private MyRowDto rowDto;
        
             @Override
             public boolean verifyBean(MyRowDTO rowDto) throws CsvConstraintException {
             this.rowDto;
             return fieldValidOrThrow(MyRowDTO::getField, "field");
             }
        
             private boolean fieldValidOrThrow(Function<MyRowDTO, String> f, String node) throws CsvConstraintException {
              ...
             }
        }
        
         

        Last edit: andres reynoso 2020-04-26
        • Andrew Rucker Jones

          I'm not sure what you mean. You pass a BeanVerifier to CsvToBeanBuilder.withVerifier() and opencsv uses it internally. You personally never pass anything to a BeanVerifier in your code.

          Your code sample has a few problems. First of all, the first line of your verifyBean() method is a no-op. I think you mean to assign rowDto to this.rowDto. That would lead to the next problem, though. As stated in the Javadoc, BeanVerifiers must be thread-safe, and that is not thread-safe behavior. I don't know what you're doing in fieldValidOrThrow(), but I would recommend moving that code directly into verifyBean(), where it's supposed to be, and definitely remove the private instance variable rowDto.
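A stateless verifier along the lines Andrew describes might look like this. This is a sketch: `MyRowDTO` and its field are hypothetical, and the opencsv types are stubbed here only so the snippet stands alone; in real code you would implement `com.opencsv.bean.BeanVerifier` and throw `com.opencsv.exceptions.CsvConstraintViolationException`.

```java
// Minimal stand-ins for the opencsv types, included only so this sketch is
// self-contained; use the real opencsv classes in an actual project.
class CsvConstraintViolationException extends Exception {
    CsvConstraintViolationException(String msg) { super(msg); }
}

interface BeanVerifier<T> {
    boolean verifyBean(T bean) throws CsvConstraintViolationException;
}

// Hypothetical row bean.
class MyRowDTO {
    private final String field;
    MyRowDTO(String field) { this.field = field; }
    String getField() { return field; }
}

// Thread-safe verifier: no instance state, everything works on the parameter.
class CsvVerifier implements BeanVerifier<MyRowDTO> {
    @Override
    public boolean verifyBean(MyRowDTO rowDto) throws CsvConstraintViolationException {
        if (rowDto.getField() == null) {
            // Fundamentally inconsistent bean: abort with an exception.
            throw new CsvConstraintViolationException("field is missing");
        }
        // Silently filter out blank rows; accept everything else.
        return !rowDto.getField().isEmpty();
    }
}
```

With the real types, the verifier would be registered via CsvToBeanBuilder.withVerifier(new CsvVerifier()) and invoked internally by opencsv, as Andrew notes.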

           
          • andres reynoso

            andres reynoso - 2020-04-26

            Ok, going to try that, thank you very much

             
  • Norbert Becker

    Norbert Becker - 2020-11-18

    Hi,
    we have been using opencsv version 3.8 for a long time.
    Now we migrated our code to version 5.3 and have (so far :-) two issues:

    First issue:
    We have an input file like this:

    "A-column","B-column"
    1
    2
    

    The bean is mapped to "A-column" and "B-column" by HeaderColumnNameTranslateMappingStrategy. The B-column value is optional.
    The effect in 5.3 is that NO line is parsed: we get an empty list, and no error is thrown.
    Changing it to ...

    "A-column","B-column"
    1,
    2,
    

    ... makes it work. But the problem is that users of our applications use such files.
    Questions:
    Is there a way to make this "lazy" separator handling work again?
    Is it a bug that no error is raised for those input lines?

    Second issue:
    We do not use CsvBindByName annotations. Now we found out that opencsv 5.3 only uses the fields for setting the values. The former version 3.8 also used setters. This is possibly because the old version used the java.beans.PropertyEditor framework and the new version doesn't.
    Is there a way to use the setters too?

    Thanks for a swift answer!

     

    Last edit: Norbert Becker 2020-11-18
    • Andrew Rucker Jones

      Would have preferred to see a ticket for this, but whatever. :)

      Lazy separators: I'm surprised this ever worked. It looks like maybe it was bug #155 that fixed this. I doubt we would be willing to support this, since that kind of input is simply malformed.

      Using setters: I'm surprised to hear you say this. The code for assigning values to fields is consistent in opencsv, no matter what binding method is used. (That was the reason we got rid of Introspection—half the code used it, half didn't.) opencsv always uses a setter if it's available. That said, the setter is not a setter in the true bean/Introspection sense. The bean definition allows one to choose any name one wants for setters as long as a property file exists to define the mapping from setter to field. (At least that's what I remember reading.) opencsv does nothing more than honor the convention that the setter for "myProperty" is called "setMyProperty".
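The naming convention described above (the setter for "myProperty" is assumed to be "setMyProperty") can be expressed as a one-liner. This is an illustration of the convention only, not opencsv's actual internal code:

```java
public class SetterNames {
    // Derive the conventional setter name for a field: capitalize the first
    // letter of the field name and prefix it with "set".
    static String setterNameFor(String fieldName) {
        return "set" + Character.toUpperCase(fieldName.charAt(0)) + fieldName.substring(1);
    }
}
```

This is why a bean whose setter does not follow the convention, as in Norbert's case, is not found by name-based lookup.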

       
    • Scott Conway

      Scott Conway - 2020-11-18

      Hello

      Sorry but Andrew is actually being nice. You can blame me for this and I will gladly take it.

      It was not just #155 but we had several bugs in a short period of time where malformed (missing or extra column) data was causing parse errors. Some benign like what you actually want (null or empty field) but some were very malicious as a missing column caused all the data to be shifted and opencsv was trying to figure out how to convert "some string" into an Integer or Date.

      And the last part was the problem. We can ASSume that the data was not malformed and that missing columns were the last columns intentionally removed but that causes issues when it IS malformed and could potentially make it harder to debug - especially when everything is a String.
      So when dealing with this we went back to the RFC4180 specification on CSV files:

      *4. Within the header and each record, there may be one or more
      fields, separated by commas. Each line should contain the same
      number of fields throughout the file. Spaces are considered part
      of a field and should not be ignored. The last field in the
      record must not be followed by a comma. For example:

      aaa,bbb,ccc*

      This is in section 2 of the document: https://tools.ietf.org/html/rfc4180

      So what we do is parse the first record to get the "required" number of fields, and any record after that must have that number or it is in error.

      Honestly I am surprised no one has taken us to task on the second part of number four (last field must not be followed by a comma). But if no one is complaining then I am okay with our current behavior (treat it as null or empty string depending on settings).

      Sorry, I know that was not what you wanted to hear. But there was a lot of history behind that decision, and I wanted you to know it was not made lightly, because we really do take backwards compatibility seriously and only break it when we feel it is absolutely necessary. And in this case it was, as we needed to follow the standard to stop all the bugs we were getting from malformed CSV files.
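The rule Scott describes (take the field count from the first record and flag every later record that differs) can be illustrated with a toy check. This is a naive split on commas, not opencsv's real parser, so quoting inside fields is ignored:

```java
import java.util.ArrayList;
import java.util.List;

public class FieldCountCheck {
    // Return the (1-based) numbers of records whose field count differs from
    // the first record's, mirroring the RFC 4180 rule quoted above.
    static List<Integer> badLines(List<String> lines) {
        List<Integer> bad = new ArrayList<>();
        if (lines.isEmpty()) return bad;
        // Limit -1 keeps trailing empty fields, so "1," counts as two fields.
        int required = lines.get(0).split(",", -1).length;
        for (int i = 1; i < lines.size(); i++) {
            if (lines.get(i).split(",", -1).length != required) {
                bad.add(i + 1);
            }
        }
        return bad;
    }
}
```

On Norbert's first input the records "1" and "2" are flagged (one field where the header has two), while the "1," / "2," variant passes, matching the 5.3 behavior he observed.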

       
      • Norbert Becker

        Norbert Becker - 2020-11-19

        Hi Andrew & Scott,
        thanks for the swift reply!

        Regarding syntax handling aligned to RFC 4180: OK, that's understandable and acceptable :-). The main problem had been that I got no error and the lines were just skipped. I checked our error handling: we use CsvToBean.setThrowExceptions(false) and retrieve the list of exceptions with CsvToBean.getCapturedExceptions() after parsing has finished. This list is still empty?! I changed the code and now use CsvToBean.setErrorHandler() to collect the exceptions. Now it works and I get an exception for each corrupted line ;-).
        So maybe a bug in the compatibility code for collecting exceptions? The Javadoc recommends the use of an exception handler anyway, and I leave it up to you to fix it or not. We no longer need the flag/getter...
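The capture-the-exceptions pattern under discussion looks roughly like this against the opencsv 5.x API (a sketch: the `MyRowDTO` bean and the inline input are placeholders, and withThrowExceptions(false) is the builder-level counterpart of the setter mentioned above):

```java
import com.opencsv.bean.CsvToBean;
import com.opencsv.bean.CsvToBeanBuilder;
import com.opencsv.exceptions.CsvException;

import java.io.Reader;
import java.io.StringReader;
import java.util.List;

public class CaptureErrors {
    // Hypothetical bean matching a two-column header.
    public static class MyRowDTO {
        public String a;
        public String b;
    }

    public static void main(String[] args) {
        // Second record is missing a field, as in the example input above.
        Reader input = new StringReader("a,b\n1,x\n2\n");
        CsvToBean<MyRowDTO> csvToBean = new CsvToBeanBuilder<MyRowDTO>(input)
                .withType(MyRowDTO.class)
                .withThrowExceptions(false)   // capture exceptions instead of failing fast
                .build();
        List<MyRowDTO> beans = csvToBean.parse();
        // Each malformed record should surface here rather than abort the parse.
        for (CsvException e : csvToBean.getCapturedExceptions()) {
            System.err.println("Line " + e.getLineNumber() + ": " + e.getMessage());
        }
    }
}
```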

        Regarding usage of setter: You are right - setters are used if they exist. But the finding out whether a setter exist is done by the field name: myProperty -> setMyProperty().
        Our "mistake" has been that we use different names which is indeed not standard for Java Beans: property/setMyProperty(). In Version 3.8 this works if you use myProperty as column name. Only to let you know ... we adjusted this codes.

        So to wrap it up: The problems where solved.
        Thanks for your support!

         
        • Andrew Rucker Jones

          I'm very bothered by your claim that exceptions are not being collected and reported with setThrowExceptions(false) and getCapturedExceptions(). Could you possibly open a ticket and provide us with a verifiable test case?

           
        • Scott Conway

          Scott Conway - 2020-11-20

          Yes, please send us a test case! In 5.4 we refactored the CsvToBeanBuilder withThrowExceptions method to essentially call setThrowExceptions on the CsvToBean in the build method. We do have tests on that and they are passing, so I am curious as to what we missed.

           
          • Norbert Becker

            Norbert Becker - 2020-11-23

            Ok - I found the reason for the behavior and created ticket https://sourceforge.net/p/opencsv/feature-requests/142/

            You may think that it makes no sense to code it that way. But that's how it looked before, and it worked with older versions of opencsv.
            So for me it's not really a bug, but a change in implementation that may force users to adapt their code.

            Thanks again for your support!
            We are now happy (again) with OpenCSV in version 5.3 ;-).

             
