Menu

#187 Problem when parsing a CSV file with empty entries

v1.0 (example)
closed-invalid
None
5
2018-12-08
2018-11-15
No

When parsing a CSV file, whose separator is '|' and whose lines start like this

ORIGINAL|REPLACEMENT|COMMENT
strong||
strong||

an error occurs. The error can be traced back to the fact that some entries have empty values. It does not help to enclose the entries in quotes ("). Even, when an entry such as " " is used (a blank enclosed with double quotes), an error is produced during parsing.

However, parsing

ORIGINAL|REPLACEMENT|COMMENT
strong|a|
/strong|b|

works.

3 Attachments

Related

Bugs: #187

Discussion

  • Scott Conway

    Scott Conway - 2018-11-16
    • assigned_to: Scott Conway
     
  • Andrew Rucker Jones

    • assigned_to: Scott Conway --> Andrew Rucker Jones
     
  • Andrew Rucker Jones

    This is expected behavior. You have annotated the fields for "ORIGINAL" and "REPLACEMENT" in your bean using the parameter "required = true". That means there has to be a value.

    The fact that whitespace is not considered significant baffles me a bit, but I forget all of the parser options and what they mean. Scott, care to comment on that part?

     
  • Hans-Martin Adorf

    The attribute "required" means the value must be non-null. The empty string (i.e. a string that contains no characters is non-null, and should be allowed in the CSV file and imho should be parsed properly.

     
  • Andrew Rucker Jones

    That is a matter of opinion, especially since in a text file it is essentially impossible to distinguish between a meaningless empty string and a meaningful empty string. The fact is, this has been expected behavior since the introduction of the annotations in question in version 3.8 of opencsv.

    I can offer you two suggestions:

    1. Write your own custom converter that handles empty fields the way you need them handled.
    2. Mark the fields with "required = false", then pass your beans to an implementation of BeanVerifier that differentiates between null and an empty string, discarding or throwing an exception upon encountering the former.
     
  • Scott Conway

    Scott Conway - 2018-11-22

    Hello Hans. To further drive home Andrew's point that definition of not-null is a matter of opinion look at the CSVReaderNullFieldIndicator because users wanted to define what is considered a null field.

    I like the BeanVerifier idea. Much better than what I was thinking (setting the required to false and then modifying the setter of the bean class to handle the null vs not null).

    As to why whitespace is not considered is because of the code in the AbstractBeanField.java class. If we really want spaces to be considered not null we need to replace the isBlank with isEmpty.

              @Override
                  public final void setFieldValue(T bean, String value, String header)
                          throws CsvDataTypeMismatchException, CsvRequiredFieldEmptyException,
                          CsvConstraintViolationException {
                      if (required && **StringUtils.isBlank(value)**) {
                          throw new CsvRequiredFieldEmptyException(
                                  bean.getClass(), field,
                                  String.format(ResourceBundle.getBundle(ICSVParser.DEFAULT_BUNDLE_NAME, errorLocale).getString("required.field.empty"),
                                          field.getName()));
                      }
    
                      assignValueToField(bean, convert(value), header);
                  }
    

    Scott :)

     

    Last edit: Scott Conway 2018-11-22
  • Hans-Martin Adorf

    Hi Scott,

    thank you very much for devoting so much time to the problem that I reported.

    Before getting back to the problem I should like to also thank you for the great work you are doing. I had used OpenCSV a while ago, and I was positively suprised to discover that one can now parse a CSV-File directly into a bean.

    Here is my use case: I have a text document in which I want to replace certain strings by others. Originally the strings to be searched for where abbreviations, which had to be replaced by their expansions. Now I am using the same mechanism for other purposes.

    I maintain that your code should be able to properly parse an entry such as " ". It represents a string with a blank, which is a valid string. The character happens to be a blank, but that is a valid printable character (as opposed to a control character). Next, I argue, that your code should parse the string "". It is an empty string, yes, but the empty string is a valid string, too. Consider my use case above: it should be possible to replace a string, say "aeiou", by the empty string "". As you know in Java the empty string is distinguished from null.

    I think the only case where how to handle a case is a matter of opinion is when the entry is ';;' where ';' is the CSV separator. I could live with your decision to translate this entry to null in the Java code.

    Unless you follow my argument above, I am unsure about how to best to resolve this "matter of opinion". Currently I see two possibilities: (1) You introduce a flag with which the user of you code can set in order to specify the desired bahaviour. (2) You make your method above overridable, so that I could make the replacement from isString to isEmpty, or whatever.

    What do you think?

    Regards
    Hans-Martin Adorf

     
  • Scott Conway

    Scott Conway - 2018-11-24

    I think we are agreeing with each other but with different words <bg>. </bg>

    I thought about replacing the isBlank with isEmpty but want to talk with Andrew about that some more to see if that was intentional or not.

    I am about to lay out a very circular arguement here so please bear with me.

    When laying out the required field the purpose of having a field marked as required was that it had to have a value. Your arguement is that the lack of a value (;;) is a value (empty string) and should be processed as such. But for your arguement to hold then there can be no null string values (because how do you define null if you have defined ;; as empty string) and breaks the required functionality for Strings.

    Which leads me to the question: Why do you want to override existing behavior when you can just set the required flag to false. Unless that still gives you null in which case the second part of Andrew's suggestion makes sense where he mentioned you create a custom converter (which was your second request to make the method overridable so you can create your own class) because none of the converters are final and you just need to use the @CsvCustomBindByPosition or @CsvCustomBindByName.

    Scott :)

     
  • Scott Conway

    Scott Conway - 2018-11-25

    Oh I ran your tests and I am not getting null at all. I am getting

    Caused by: com.opencsv.exceptions.CsvRequiredFieldEmptyException: Field 'replacement' is mandatory but no value was provided.
    at com.opencsv.bean.AbstractBeanField.setFieldValue(AbstractBeanField.java:163)
    at com.opencsv.bean.AbstractMappingStrategy.setFieldValue(AbstractMappingStrategy.java:449)
    at com.opencsv.bean.AbstractMappingStrategy.populateNewBean(AbstractMappingStrategy.java:317)
    at com.opencsv.bean.concurrent.ProcessCsvLine.processLine(ProcessCsvLine.java:134)
    at com.opencsv.bean.concurrent.ProcessCsvLine.run(ProcessCsvLine.java:87)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

    and removing the required = true caused it to work. Which makes sense for what you are asking for since you believe that spaces and empty should be spaces and empty then when would you have data that would fail a required check?

    Scott :)

     
  • Hans-Martin Adorf

    Hi Scott,

    thank you once more for devoting your time to the problem that I reported.

    I should like to restructure the discussion. For the moment let us put aside the question of whether the field in the bean is required to have a value. The salient quesion is how to you get OpenCSV to parse a CSV field that has zero or more blanks and what the result should be. This is a semantic question. The problem arises, because of the weak (sick?) definition of the CSV format.

    Let us consider a suitably (JAXB) annotated Java bean with a string attribute x. If we serialize (marshal) the bean using JAXB, we obtain an XML file, where the bean attribute is present in the list of key-value pairs, if it has a non-null value. If the value of x is undefined (value = null), the attribute is absent in the key-value list in the XML-file. When we read in (unmarshal) an XML-file, where the property has an empty string in the XML-file, the attribute in the Java beain will receive an empty string value. A fortiori the serialization deserialization works for strings that consist of one or more blanks. Summarizing, strings with zero or more blanks are allowed in the JAXB framework. They are properly serialized an deserialized. Both, null and "" are allowed, and are distinguished in Java and in XML as well.

    Unfortunately, the matter is not so easy when the value of a Java string variable is to be stored in a CSV file. A string with one or more blanks should be read into a Java String variable, when it is enclosed in quotes, such as " ". This is semantics. I need to be able to write the value of a Java String variable to the CSV file. And when the value happens to be zero or more blanks, so be it. If the CSV file definition were such that one would always have to enclose the strings with quotes, unless the value is null, then our whole discussion would evaporate. Unfortunately this is not the case.

    I think if the user writes a quoted string with zero or more blanks into the CSV file, his intention is that the value of the string is blanks. And therefore OpenCSV should offer a mechanism for serialization and deserialization of strings that could have zero or more blanks as their value.

    The salient question is: should one offer a possibility to represent null in the CSV file. In view of the unclear (weird?) definition of the CSV format I tend to exclude null. However, that is a matter of opinion. If one insists of representing null in the CSV file, then I think the only possibility is zero or more blanks between the seperators, but no quotes. Or a flag, as mentioned earlier on.

    I hope I have expressed my opinion clearly.

    Cheers
    Hans-Martin

     
  • Scott Conway

    Scott Conway - 2018-11-29

    I guess what has been confusing me about your emails is that we already have that with the CSVReaderNullFieldIndicator. If in the creation of your CsvToBean you add .withFieldAsNull

    replacementBeans = new CsvToBeanBuilder(new FileReader(replacementsCsvFile))
    .withType(ReplacementBean.class)
    .withSeparator(separator)
    .withQuoteChar('"')
    .withFieldAsNull(CSVReaderNullFieldIndicator.EMPTY_SEPARATORS)
    .withEscapeChar('\')
    .build()
    .parse();

    Then all empty separators (,,) would become a null field. You can see examples of this in the CSVParserTest.

    @Test
    public void featureRequest60ByDefaultEmptyFieldsAreBlank() throws IOException {
        StringBuilder sb = new StringBuilder(ICSVParser.INITIAL_READ_SIZE);
    
        sb.append(",,,\"\",");
    
        CSVParserBuilder builder = new CSVParserBuilder();
        ICSVParser parser = builder.build();
    
        String item[] = parser.parseLine(sb.toString());
    
        assertEquals(5, item.length);
        assertEquals("", item[0]);
        assertEquals("", item[1]);
        assertEquals("", item[2]);
        assertEquals("", item[3]);
        assertEquals("", item[4]);
    }
    
    @Test
    public void featureRequest60TreatEmptyFieldsAsNull() throws IOException {
    
        StringBuilder sb = new StringBuilder(ICSVParser.INITIAL_READ_SIZE);
    
        sb.append(", ,,\"\",");
    
        CSVParserBuilder builder = new CSVParserBuilder();
        ICSVParser parser = builder.withFieldAsNull(CSVReaderNullFieldIndicator.EMPTY_SEPARATORS).build();
    
        String item[] = parser.parseLine(sb.toString());
    
        assertEquals(5, item.length);
        assertNull(item[0]);
        assertEquals(" ", item[1]);
        assertNull(item[2]);
        assertEquals("", item[3]);
        assertNull(item[4]);
    
    }
    
    @Test
    public void featureRequest60TreatEmptyDelimitedFieldsAsNull() throws IOException {
        StringBuilder sb = new StringBuilder(ICSVParser.INITIAL_READ_SIZE);
    
        sb.append(",\" \",,\"\",");
    
        CSVParserBuilder builder = new CSVParserBuilder();
        ICSVParser parser = builder.withFieldAsNull(CSVReaderNullFieldIndicator.EMPTY_QUOTES).build();
    
        String item[] = parser.parseLine(sb.toString());
    
        assertEquals(5, item.length);
        assertEquals("", item[0]);
        assertEquals(" ", item[1]);
        assertEquals("", item[2]);
        assertNull(item[3]);
        assertEquals("", item[4]);
    }
    
    @Test
    public void featureRequest60TreatEmptyFieldsDelimitedOrNotAsNull() throws IOException {
    
        StringBuilder sb = new StringBuilder(ICSVParser.INITIAL_READ_SIZE);
    
        sb.append(", ,,\"\",");
    
        CSVParserBuilder builder = new CSVParserBuilder();
        ICSVParser parser = builder.withFieldAsNull(CSVReaderNullFieldIndicator.BOTH).build();
    
        String item[] = parser.parseLine(sb.toString());
    
        assertEquals(5, item.length);
        assertNull(item[0]);
        assertEquals(" ", item[1]);
        assertNull(item[2]);
        assertNull(item[3]);
        assertNull(item[4]);
    
    }
    
     
  • Hans-Martin Adorf

    Hi Scott,

    thanks for your message. I was unaware of this option.

    However, I am not really interested in null values. I included null values in the discussion only because I wanted to discuss the general case.

    What I am interested in is being able to put strings with zero or more blanks into the CSV file, and to find in the corresponding bean the string of blanks in the corresponding Java variable for that column. All I need is a recipe how to get that done using OpenCSV.

    Your examples show a variety of possibilities to parse a String into an Array of values. Actually you provide more options than what we have been discussing. Chapeau.

    However, I do not want to parse strings into arrays of strings. Instead, I want to parse a CSV file into a list of beans, using your wonderful new mechanism. And herein lies the problem.

    I modified my textRepair.csv file (which I attach), and inserted quotes '"'. I wrote a JUnit test file (which I also attach). The test shows that all your examples work. However parsing the textRepair.csv file always throws an exception when parsing line 13, which is the one that deals with blanks. I used the three .withFieldAsNull options that you provided in your examples. I also omitted that option, but parsing always fails.

    If you can provide a recipe for OpenCSV I'll test it, and if it works we can close this issue.

    Kind regards
    Hans-Martin

     
  • Scott Conway

    Scott Conway - 2018-12-06

    I don't understand the problem because I can put spaces in without issue - just look at the test above. Your issue was that you had the field set to required - which by our definition means you need a non blank, non empty field. That is not a bug. When I took out the required I got spaces.

    Let's try this. Send me another sample of your code with a FAILING unit test that is failing because the system is not behaving the way you think it is. That will give me a much better idea what you are talking about.

    Scott :)

     
  • Hans-Martin Adorf

    Hi Scott,

    I finally know, why I could not make myself understood so far.

    Lets first look into the JAXB-world. My mental model is embodied in the attached JUnit test and the XML-file that is being read (unmarhalled) in the test. As you know, in the JAXB-context there is no distinction between blanks and other characters. A string can have zero or more characters (or blanks). However, there is a distinction between an empty string and null. In the XML-file the fact that an attribute has no value (null) is expressed by the absence of the attribute. When unmarshalling the file, the corresponding Java attribute (variable) is null. So there is a one-to-one correspondence between the XML-File and the JAXB-annotated bean. No flags are required. Zero or more blanks are automatically distinguished from null. No parameters are needed.

    Lets now look into the CSV-world. Unfortunately, the format of CSV-files is only loosely specified, as you know. E.g. separators need to be defined. In particular, there is the question, what ';"";" "; " "' means. Here ';' is the separator, and I use ' to denote the start and end of the string.

    Here comes the resolution of our misunderstanding:

    I had been thinking in the JAXB-context, where "B", " ", ' ", "" are all valid strings. So I had thought that the CSV-file provided valid string values, and did not understand why an exception had been thrown. In contrast, you have defined the annotation @CsvBindByName(required = true) in such a way that strings with zero or more blanks are not valid entries in a CSV-file. You are the author of your code. Thus you can make decisions such as this one. I had (wrongly) believed that strings with zero or more blanks would be valid entries, and in order to make my intention explicit, I had enclosed the blanks in quotes: '"", " "; " "'. It was clear to me all the time that in the CSV-world there is the difficulty to distinguish the empty string from null.

    In conclusion, instead of using the annotation '@CsvBindByName(required = true)' I now use the annotation '@CsvBindByName' (without the qualifier) throughout the ReplacementBean and I am using .withFieldAsNull(CSVReaderNullFieldIndicator.EMPTY_SEPARATORS) This combination works for me.

    I thank you very much for bearing with me, and helping me to resolve the issue. Fortunately, I had not claimed to have found a bug in OpenCsv. I had just reported a problem, which I had had with the OpenCsv code. The problem was a lack of understanding.

    Keep up the good work.

    Merry Christmas
    Hans-Martin

     
    • Scott Conway

      Scott Conway - 2018-12-09

      I am glad you have found the solution to the problem.

      Scott :)

      On Thu, Dec 6, 2018 at 11:17 AM Hans-Martin Adorf dradorf@users.sourceforge.net wrote:

      Hi Scott,

      I finally know, why I could not make myself understood so far.

      Lets first look into the JAXB-world. My mental model is embodied in the
      attached JUnit test and the XML-file that is being read (unmarhalled) in
      the test. As you know, in the JAXB-context there is no distinction between
      blanks and other characters. A string can have zero or more characters (or
      blanks). However, there is a distinction between an empty string and null.
      In the XML-file the fact that an attribute has no value (null) is expressed
      by the absence of the attribute. When unmarshalling the file, the
      corresponding Java attribute (variable) is null. So there is a one-to-one
      correspondence between the XML-File and the JAXB-annotated bean. No flags
      are required. Zero or more blanks are automatically distinguished from
      null. No parameters are needed.

      Lets now look into the CSV-world. Unfortunately, the format of CSV-files
      is only loosely specified, as you know. E.g. separators need to be defined.
      In particular, there is the question, what ';"";" "; " "' means. Here ';'
      is the separator, and I use ' to denote the start and end of the string.

      Here comes the resolution of our misunderstanding:

      I had been thinking in the JAXB-context, where "B", " ", ' ", "" are all
      valid strings. So I had thought that the CSV-file provided valid string
      values, and did not understand why an exception had been thrown. In
      contrast, you have defined the annotation @CsvBindByName(required = true)
      in such a way that strings with zero or more blanks are not valid entries
      in a CSV-file. You are the author of your code. Thus you can make decisions
      such as this one. I had (wrongly) believed that strings with zero or more
      blanks would be valid entries, and in order to make my intention explicit,
      I had enclosed the blanks in quotes: '"", " "; " "'. It was clear to me all
      the time that in the CSV-world there is the difficulty to distinguish the
      empty string from null.

      In conclusion, instead of using the annotation '@CsvBindByName(required =
      true)' I now use the annotation '@CsvBindByName' (without the qualifier)
      throughout the ReplacementBean and I am using
      .withFieldAsNull(CSVReaderNullFieldIndicator.EMPTY_SEPARATORS) This
      combination works for me.

      I thank you very much for bearing with me, and helping me to resolve the
      issue. Fortunately, I had not claimed to have found a bug in OpenCsv. I had
      just reported a problem, which I had had with the OpenCsv code. The problem
      was a lack of understanding.

      Keep up the good work.

      Merry Christmas
      Hans-Martin


      Status: open
      Group: v1.0 (example)
      Created: Thu Nov 15, 2018 04:44 PM UTC by Hans-Martin Adorf
      Last Updated: Thu Dec 06, 2018 03:53 AM UTC
      Owner: Andrew Rucker Jones
      Attachments:

      When parsing a CSV file, whose separator is '|' and whose lines start like
      this

      ORIGINAL|REPLACEMENT|COMMENT
      strong||
      strong||

      an error occurs. The error can be traced back to the fact that some
      entries have empty values. It does not help to enclose the entries in
      quotes ("). Even, when an entry such as " " is used (a blank enclosed with
      double quotes), an error is produced during parsing.

      However, parsing

      ORIGINAL|REPLACEMENT|COMMENT
      strong|a|
      /strong|b|

      works.

      Sent from sourceforge.net because you indicated interest in
      https://sourceforge.net/p/opencsv/bugs/187/

      To unsubscribe from further messages, please visit
      https://sourceforge.net/auth/subscriptions/

      --
      Scott Conway
      scott.conway@gmail.com
      http://www.conwayfamily.name

       

      Related

      Bugs: #187

  • Scott Conway

    Scott Conway - 2018-12-08
    • assigned_to: Andrew Rucker Jones --> Scott Conway
     
  • Scott Conway

    Scott Conway - 2018-12-08
    • status: open --> closed-invalid
     

Log in to post a comment.

MongoDB Logo MongoDB