Super CSV / Feature Requests / #25 Referential Cell Processor

James Bassett - 2013-01-03

Hi Adam,

Could you could give a quick example (with CSV) to show what you mean?
Have you tried using cell processors to do this referential validation?

James

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Adam Brown - 2013-01-03

Two examples from real life. First from the attached file here, is a data load of e-commerce data, so it has things that have a product and variant, or product and item relationships, such as buying a shirt online, it comes in different sizes, so you need information about both the overall shirt, as well as things specific to the size that you're buying. Attached, you'll see that there's a base "product" that has partnumber of Cords, and a few items with a parentPartNumber of Cords. Essentially, I'd like to validate that anything referenced in that parentPartNumber column already exists earlier in the file in the PartNumber Field. Second example in next post with extra csv.

CatalogEntry.csv

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Adam Brown - 2013-01-03

Next example is a little more complicated, but is adding information to those same products/items we were working with in the last one. In this case, I'm adding attribute information, and need to make sure that PartNumbers referenced here exist in the other file, in the PartNumber field.

CatalogEntryAttributeDictionaryAttributeRelationship.csv

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Adam Brown - 2013-01-03

As for trying it with CellProcessors, haven't gotten there yet, just was doing a library evaluation yesterday, and this came closest to the validation types I'd need and still able to work through a Java program, so I can make the validation more dynamic on files like these. These files get interesting because not every column is required, nor are the above examples complete (there's other columns that can be added in to be handled).

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Hi Adam, thanks for all the info. I've put together a working example of how you might achieve this with Super CSV and cell processors.

I've created 2 new cell processors:

RecordValue (which simply records each value it encounters and adds it to a Collection)
IsValueIn (which checks that the value is in the Collection of allowed values). You can't use the IsIncludedIn processor that comes with Super CSV as it doesn't allow modification of the collection.

Using them together you can do referential validation. The cell processor array looks like this (where partNumbers is just a HashSet):

CellProcessor[] processors = new CellProcessor[] {
    new NotNull(new RecordValue(partNumbers)), // partNumber
    new Optional(), // type
    new Optional(new IsValueIn(partNumbers)), // parentPartNumber
    new Optional(), // sequence
    new Optional(), // parentGroupIdentifier
    new Optional(), // currencyCode
    new Optional(), // listPrice
    new Optional(), // price
    new Optional(), // name
    new Optional(), // shortDescription
    new Optional(), // longDescription
    new Optional(), // thumbnail
    new Optional(), // fullImage
    new Optional(), // delete
};

See the attached CatalogValidation class to see this in action. I've also demonstrated how the validation on the second file can be done. If you run it with the files you attached you'll see that it (correctly) fails validation with the following exception:

Exception in thread "main" org.supercsv.exception.SuperCsvConstraintViolationException: 'Dress shirt-White-15' is not included in the allowed set of values
processor=example.IsValueIn
context={lineNo=10, rowNo=10, columnNo=1, rowSource=[Dress shirt-White-15, DressShirtColor, White, null, Defining, 1, null]}
    at example.IsValueIn.execute(IsValueIn.java:93)
    at org.supercsv.cellprocessor.ConvertNullTo.execute(ConvertNullTo.java:83)
    at org.supercsv.util.Util.executeCellProcessors(Util.java:93)
    at org.supercsv.io.CsvBeanReader.read(CsvBeanReader.java:207)
    at example.CatalogValidation.validateAttributes(CatalogValidation.java:84)
    at example.CatalogValidation.validateCatalog(CatalogValidation.java:22)
    at example.CatalogValidation.main(CatalogValidation.java:95)

I'm considering adding these processors (in some form) to Super CSV, as these kind of requirements keep coming up.

Let me know what you think!

Last edit: James Bassett 2013-01-06

example.zip

Adam Brown - 2013-01-04

Looks good to me, should be a nice way to guarantee referential integrity on these files.

One other question though, any easy way to processor definition based on row content in some form? I'm guessing I could just extend optional in some form, though it's not entirely clear how I'd delineate one row's iteration from another. Example from above would be ensure that every "Item" had a parentPartNumber defined, since a normal Optional(new IsValueIn) would allow free-standing item variants without products.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

I think you're asking whether you can validate that Items have a parentPartNumber but skip that validation for Products. This is essentially cross-validation (validation involving more than 1 column).

The only way I can think of to achieve this with cell processors is to write a custom cell processor that inspects the value in another column, and decides what to do based on the value. Every cell processor has access to the CsvContext, which contains the raw (unprocessed) values.

So you can write a processor such as the following. It behaves like Optional if the value in a given column (where column numbers start at 1) equals the value its expecting. Otherwise, it simply delegates to the next processor in the chain.

package example;

import java.util.List;

import org.supercsv.cellprocessor.Optional;
import org.supercsv.cellprocessor.ift.CellProcessor;
import org.supercsv.util.CsvContext;

public class OptionalIfOtherColumnEquals extends Optional {

  private final int column;

  private final Object constantValue;

  public OptionalIfOtherColumnEquals(int column, 
                                     Object constantValue) {
    super();
    this.column = column;
    this.constantValue = constantValue;
  }

  public OptionalIfOtherColumnEquals(int column, 
                                     Object constantValue, 
                                     CellProcessor next) {
    super(next);
    this.column = column;
    this.constantValue = constantValue;
  }

  @Override
  public Object execute(Object value, CsvContext context) {

    // unprocessed row
    List<Object> row = context.getRowSource();

    // optional if other column matches value
    if (row.get(column - 1).equals(constantValue)){
      return super.execute(value, context);
    }

    // otherwise continue to next processor
    return next.execute(value, context);
  }

}

I replaced the processor definition for the parentPartNumber column with

new OptionalIfOtherColumnEquals(2, "Product", new IsValueIn(partNumbers))

and it works for Products and Items but it fails validation on the Packages at the end of the file.

I'll leave that as an exercise for you to solve, but you can see that you can achieve cross-validation if you really have to.

Hope this helps.

Adam Brown - 2013-01-11

Makes sense. The fun thing I'm working with here that makes that even more complicated, is that these files, when they're eventually parsed, are keyed off of the column headers, not the csv location, so doing a restriction in the cell processor around a specific column would be less than ideal on face value. Though I guess with passing in the column # on setup of processor makes that more workable... Just interesting to work with since I have to look up which cell processors to use by header also.

The RecordValue/IsValueIn CellProcessors are working great, btw. If you want it, I have some modifications to that and the SuppressException code to make error messages more friendly to non-technical audiences, likely generating these files in excel.

Thanks for all the quick responses.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

James Bassett - 2013-01-12

Yeah feel free to attach any modifications - it'd be good to get some ideas.

I'm currently working on adding skip comment support - if you have any ideas there just let me know.

Bam! Another quick response ;)

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Adam Brown - 2013-01-12

Here's the tweaks. Minor adjustments to IsValueIn, NotNull, and IsIncludedIn to be a bit more verbose with their specific validation messages. Added a new exception type, thrown by SuppressWarnings, the "PrettySuperCsvCellProcessorException", which has a bit more of a non-technical friendly output format (no class names, accepts a fieldname concept). SuppressWarnings itself got a bit smarter. To begin with, it rewrites the CsvContext so it can accurately display error columns - those were broken in the version you posted to StackOverflow. I also made it so that it knows the fieldName it's working with, so it accepts that, and passes it on to the thrown exception. I invoked it a bit more dynamically, using a HashMap as my master store for the "other" validations I'm doing on fields, then do just a "theProcessors.add( new SuppressException( field, fieldMapping.get(field) ) );" to add each resulting CellProcessor wrapped in a new SuppressException as appropriate.

tweaked.zip

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

James Bassett - 2013-03-02

I've added the 2 processors (now called Collector and IsElementOf) for the upcoming release.
[r278]

Related

Commit: [r278]

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

James Bassett - 2013-03-02

status: open --> pending

milestone: Outstanding --> 2.1.0
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

James Bassett - 2013-04-24

Status: pending --> closed
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Referential Cell Processor

A fast, programmer-friendly, free CSV library for Java

Group

Searches

Help

#25 Referential Cell Processor

Discussion

Related