#25 Referential Cell Processor

2.1.0
closed
nobody
None
1
2013-04-24
2013-01-02
Adam Brown
No

It would be really handy to be able to use a referential validation step in a csv file. CSV fix has this on the validate there, but being able to do so from this java library instead would be really handy.

Would be the most useful if it could pull from both other fields in this file, as well as other files, and being able to define the mapping to both. Some of the files I work with have references to "Parent" fields in a record, meaning that I need to make sure the value specified there exists, in order to prevent ugly error messages from the consuming application.

Discussion

  • James Bassett

    James Bassett - 2013-01-03

    Hi Adam,

    Could you could give a quick example (with CSV) to show what you mean?
    Have you tried using cell processors to do this referential validation?

    James

     
  • Adam Brown

    Adam Brown - 2013-01-03

    Two examples from real life. First from the attached file here, is a data load of e-commerce data, so it has things that have a product and variant, or product and item relationships, such as buying a shirt online, it comes in different sizes, so you need information about both the overall shirt, as well as things specific to the size that you're buying. Attached, you'll see that there's a base "product" that has partnumber of Cords, and a few items with a parentPartNumber of Cords. Essentially, I'd like to validate that anything referenced in that parentPartNumber column already exists earlier in the file in the PartNumber Field. Second example in next post with extra csv.

     
  • Adam Brown

    Adam Brown - 2013-01-03

    Next example is a little more complicated, but is adding information to those same products/items we were working with in the last one. In this case, I'm adding attribute information, and need to make sure that PartNumbers referenced here exist in the other file, in the PartNumber field.

     
  • Adam Brown

    Adam Brown - 2013-01-03

    As for trying it with CellProcessors, haven't gotten there yet, just was doing a library evaluation yesterday, and this came closest to the validation types I'd need and still able to work through a Java program, so I can make the validation more dynamic on files like these. These files get interesting because not every column is required, nor are the above examples complete (there's other columns that can be added in to be handled).

     
  • James Bassett

    James Bassett - 2013-01-04

    Hi Adam, thanks for all the info. I've put together a working example of how you might achieve this with Super CSV and cell processors.

    I've created 2 new cell processors:

    • RecordValue (which simply records each value it encounters and adds it to a Collection)
    • IsValueIn (which checks that the value is in the Collection of allowed values). You can't use the IsIncludedIn processor that comes with Super CSV as it doesn't allow modification of the collection.

    Using them together you can do referential validation. The cell processor array looks like this (where partNumbers is just a HashSet):

    CellProcessor[] processors = new CellProcessor[] {
        new NotNull(new RecordValue(partNumbers)), // partNumber
        new Optional(), // type
        new Optional(new IsValueIn(partNumbers)), // parentPartNumber
        new Optional(), // sequence
        new Optional(), // parentGroupIdentifier
        new Optional(), // currencyCode
        new Optional(), // listPrice
        new Optional(), // price
        new Optional(), // name
        new Optional(), // shortDescription
        new Optional(), // longDescription
        new Optional(), // thumbnail
        new Optional(), // fullImage
        new Optional(), // delete
    };
    

    See the attached CatalogValidation class to see this in action. I've also demonstrated how the validation on the second file can be done. If you run it with the files you attached you'll see that it (correctly) fails validation with the following exception:

    Exception in thread "main" org.supercsv.exception.SuperCsvConstraintViolationException: 'Dress shirt-White-15' is not included in the allowed set of values
    processor=example.IsValueIn
    context={lineNo=10, rowNo=10, columnNo=1, rowSource=[Dress shirt-White-15, DressShirtColor, White, null, Defining, 1, null]}
        at example.IsValueIn.execute(IsValueIn.java:93)
        at org.supercsv.cellprocessor.ConvertNullTo.execute(ConvertNullTo.java:83)
        at org.supercsv.util.Util.executeCellProcessors(Util.java:93)
        at org.supercsv.io.CsvBeanReader.read(CsvBeanReader.java:207)
        at example.CatalogValidation.validateAttributes(CatalogValidation.java:84)
        at example.CatalogValidation.validateCatalog(CatalogValidation.java:22)
        at example.CatalogValidation.main(CatalogValidation.java:95)
    

    I'm considering adding these processors (in some form) to Super CSV, as these kind of requirements keep coming up.

    Let me know what you think!

     
    Last edit: James Bassett 2013-01-06
  • Adam Brown

    Adam Brown - 2013-01-04

    Looks good to me, should be a nice way to guarantee referential integrity on these files.

    One other question though, any easy way to processor definition based on row content in some form? I'm guessing I could just extend optional in some form, though it's not entirely clear how I'd delineate one row's iteration from another. Example from above would be ensure that every "Item" had a parentPartNumber defined, since a normal Optional(new IsValueIn) would allow free-standing item variants without products.

     
  • James Bassett

    James Bassett - 2013-01-06

    I think you're asking whether you can validate that Items have a parentPartNumber but skip that validation for Products. This is essentially cross-validation (validation involving more than 1 column).

    The only way I can think of to achieve this with cell processors is to write a custom cell processor that inspects the value in another column, and decides what to do based on the value. Every cell processor has access to the CsvContext, which contains the raw (unprocessed) values.

    So you can write a processor such as the following. It behaves like Optional if the value in a given column (where column numbers start at 1) equals the value its expecting. Otherwise, it simply delegates to the next processor in the chain.

    package example;
    
    import java.util.List;
    
    import org.supercsv.cellprocessor.Optional;
    import org.supercsv.cellprocessor.ift.CellProcessor;
    import org.supercsv.util.CsvContext;
    
    public class OptionalIfOtherColumnEquals extends Optional {
    
      private final int column;
    
      private final Object constantValue;
    
      public OptionalIfOtherColumnEquals(int column, 
                                         Object constantValue) {
        super();
        this.column = column;
        this.constantValue = constantValue;
      }
    
      public OptionalIfOtherColumnEquals(int column, 
                                         Object constantValue, 
                                         CellProcessor next) {
        super(next);
        this.column = column;
        this.constantValue = constantValue;
      }
    
      @Override
      public Object execute(Object value, CsvContext context) {
    
        // unprocessed row
        List<Object> row = context.getRowSource();
    
        // optional if other column matches value
        if (row.get(column - 1).equals(constantValue)){
          return super.execute(value, context);
        }
    
        // otherwise continue to next processor
        return next.execute(value, context);
      }
    
    }
    

    I replaced the processor definition for the parentPartNumber column with

    new OptionalIfOtherColumnEquals(2, "Product", new IsValueIn(partNumbers))
    

    and it works for Products and Items but it fails validation on the Packages at the end of the file.

    I'll leave that as an exercise for you to solve, but you can see that you can achieve cross-validation if you really have to.

    Hope this helps.

     
  • Adam Brown

    Adam Brown - 2013-01-11

    Makes sense. The fun thing I'm working with here that makes that even more complicated, is that these files, when they're eventually parsed, are keyed off of the column headers, not the csv location, so doing a restriction in the cell processor around a specific column would be less than ideal on face value. Though I guess with passing in the column # on setup of processor makes that more workable... Just interesting to work with since I have to look up which cell processors to use by header also.

    The RecordValue/IsValueIn CellProcessors are working great, btw. If you want it, I have some modifications to that and the SuppressException code to make error messages more friendly to non-technical audiences, likely generating these files in excel.

    Thanks for all the quick responses.

     
  • James Bassett

    James Bassett - 2013-01-12

    Yeah feel free to attach any modifications - it'd be good to get some ideas.

    I'm currently working on adding skip comment support - if you have any ideas there just let me know.

    Bam! Another quick response ;)

     
  • Adam Brown

    Adam Brown - 2013-01-12

    Here's the tweaks. Minor adjustments to IsValueIn, NotNull, and IsIncludedIn to be a bit more verbose with their specific validation messages. Added a new exception type, thrown by SuppressWarnings, the "PrettySuperCsvCellProcessorException", which has a bit more of a non-technical friendly output format (no class names, accepts a fieldname concept). SuppressWarnings itself got a bit smarter. To begin with, it rewrites the CsvContext so it can accurately display error columns - those were broken in the version you posted to StackOverflow. I also made it so that it knows the fieldName it's working with, so it accepts that, and passes it on to the thrown exception. I invoked it a bit more dynamically, using a HashMap as my master store for the "other" validations I'm doing on fields, then do just a "theProcessors.add( new SuppressException( field, fieldMapping.get(field) ) );" to add each resulting CellProcessor wrapped in a new SuppressException as appropriate.

     
  • James Bassett

    James Bassett - 2013-03-02

    I've added the 2 processors (now called Collector and IsElementOf) for the upcoming release.
    [r278]

     

    Related

    Commit: [r278]

  • James Bassett

    James Bassett - 2013-03-02
    • status: open --> pending
    • milestone: Outstanding --> 2.1.0
     
  • James Bassett

    James Bassett - 2013-04-24
    • Status: pending --> closed
     

Log in to post a comment.