#640 SdfWriter: exclude property cdk:Title from being written to sdf

Accepted
closed
nobody
None
cdk-1.4.x
3
2013-08-02
2013-06-03
Joos Kiener
No

A Molecules property cdk:title (set with CDKConstans.Title) must be used to set Molecule Name in sd-file. However this is then also exported into the sd-file as a property which is undesired.

The proposed solution is to use white-listing for export properties. Only properties explicitly inputed into SdfWriter should be exported. This new behavior could be added with an extra constructor that takes a Set of properties. Using existing constructors keeps the old behavior.

When you are offering sd-files for download it is common to give the end-user the option of which properties to include. This selection could then be easily passed on to SdfWriter.

Discussion

  • Sounds reasonable... where is the patch?

     
  • Joos Kiener
    Joos Kiener
    2013-06-10

    see attachment.

    Additional field + constructors. Old behavior is the same.

     
    Attachments
  • OK, thanx. I converted it into a git patch under your name, and changed the propertiesToWrite field to not "final" which triggered compile errors in my Eclipse.

    But I like to have a second opinion... strictly speaking, we use IOOptions for this, and a IO option "FieldsToWrite" with a comma-separated list of fields would work for me slightly better, even though a bit trickier to implement.

    And, perhaps a simpler alternative would be to add cdk:Title to the things not to output as SD fields??

     
    Last edit: Egon Willighagen 2013-08-01
  • Joos Kiener
    Joos Kiener
    2013-08-02

    Yes, adding cdk:Title to the things not to output would be a good idea. But I do still think having the option to specify which fields to include in sd-file is a nice feature.

    Using a comma-separated list of fields would of course also work but for me that just has some kind of "bad taste" to it, it's not very "clean". Using such methods in a library would get me thinking why they are doing something like this instead of using a Set? It would not exactly increase my trust or opinion of said library.
    More importantly it's much easier to use a List / Set in the constructor and it is directly visible for a user, that this feature is available.

    My current suggestion would be to use a List instead of a Set simply because a list has a guaranteed order in which items are returned. Hence the user can also define in which order the properties are exported.

     
  • John May
    John May
    2013-08-02

    Hmm... I think SF ate my other comment. Anyways I think the patch looks good. I've added some unit tests (attached).

    I'm not really a fan of the IOOptions, to me it feels clunky for library interaction. It makes complete sense for a scripting environment where you want to know what options are available but..

    new SDFWriter(out, Sets.singleton("cdk:Title"));
    

    is much nicer than

    SDFWriter sdf = new SDFWriter(out);
    Properties prop = new Properties();
    prop.setProperty("FieldsToWrite","cdk:Title");
    PropertiesListener listener = new PropertiesListener(prop);
    sdf.addChemObjectIOListener(listener);
    sdf.customizeJob();
    

    Thinking a bit more abstract there is another alternative. It's too verbose in the current JDK though but would be nice in JDK 8 with the lambda sugar. This allows one to avoid the null = 'write all' which isn't very nice and can also accept based on a prefix/pattern and actually exclude rather than accept.

    // explicit anon class defines what properties are accepted, predicate from guava but this could be any named class
    new SDFWriter(out, new Predicate<String>() {
                            boolean apply(String prop) {
                                return prop.equals("cdk:Title");
                            }
                       });
    
    // now using some lambda sugar
    
    // accept only cdk:Title
    new SDFWriter(out, p -> p.equals("cdk:Title"));
    
    // reject cdk:Title
    new SDFWriter(out, p -> !p.equals("cdk:Title"));
    
    // accept all
    new SDFWriter(out, p -> true);
    
    // reject all
    new SDFWriter(out, p -> false);
    
    // reject all that start 'cdk:'
    new SDFWriter(out, p -> !p.startsWith("cdk:"));
    
    // using the set for a large whitelist
    final Set<String> whitelist = ...;
    new SDFWriter(out, whitelist::contains);
    
     
  • John May
    John May
    2013-08-02

    Oh also.. the original patch is applied and pushed. Just the unit tests need looking at.

     
    • status: open --> closed
    • Group: Needs_Review --> Accepted
     
  • Unit tests looks good and pushed too.

    I replied some original design goals on the cdk-devel@ mailing list.