
#117 Extract molecule by title

2.2
closed
5
2012-10-23
2007-07-06
No

Given a large SDF or MOL2 file, I'd like to be able to use babel to extract a molecule, or molecules, that match exactly a given title, or that match a regular expression.

babel already has SMARTS filtering. Could we have a similar feature for the title text?

(Note that this should perhaps be implemented in such a way that it can be easily extended to the data fields in a molecule)

Discussion

  • Chris Morley

    Chris Morley - 2007-07-06

    Logged In: YES
    user_id=1189615
    Originator: NO

    The --filter option in SVN HEAD (for v3.0) does much of this. See
    http://openbabel.sourceforge.net/wiki/--filter_option

    babel infile.sdf outfile.xxx --filter "title='TargetTitle'"

    It can have multiple boolean connected filter tests that may involve (sdf-like) properties (NAME in the example below is such a property):

    babel infile.sdf outfile.xxx --filter "(title='TargetTitle' || NAME='AltTitle') && MW >180"

    The string tests can be crudely alphabetical, title>'Tar', but it doesn't do regular expressions. Would many chemists be able to use a regular expression easily? (Speaking as one who couldn't.)

    Chris
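For readers without a build that has --filter, the exact-title case can be sketched in plain Python. This is only an illustration, not how babel implements it: it relies on the SDF conventions that each record ends with a `$$$$` line and that the first line of a record is the title. The function names and the sample data are made up for the example.

```python
# Minimal sketch: pick SDF records whose title matches exactly.
# Assumes records end with a "$$$$" line and the title is the first line.

def split_sdf(text):
    """Split SDF text into records (without the '$$$$' terminator)."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    return records


def title_of(record):
    """Return the title, i.e. the first line of the record."""
    return record.split("\n")[0].strip()


def extract_by_title(text, wanted):
    """Return all records whose title equals `wanted` exactly."""
    return [r for r in split_sdf(text) if title_of(r) == wanted]


# Hypothetical two-record file with empty atom blocks, for demonstration only.
SAMPLE = (
    "aspirin\n  demo\n\n  0  0  0  0  0  0  0  0  0  0999 V2000\nM  END\n$$$$\n"
    "caffeine\n  demo\n\n  0  0  0  0  0  0  0  0  0  0999 V2000\nM  END\n$$$$\n"
)

hits = extract_by_title(SAMPLE, "caffeine")
```

Writing each hit back out followed by a `$$$$` line gives a valid SDF subset again.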

     
  • Noel O'Boyle

    Noel O'Boyle - 2007-07-06

    Logged In: YES
    user_id=850620
    Originator: YES

    Great. That looks really nice. I knew that you had done some work on it, but I didn't see it in the help on 'babel'.

    Regarding regular expressions, I just mean an asterisk. Often all the actives (e.g. in drug docking) would come from a different database (e.g. drawn by hand) than the inactives, and can be simply identified by common elements in their title.

    Would it be possible to use an expression like "MY_FIRST_DATAFIELD>1.4 && MY_SECOND_DATAFIELD='NOT_IN_STOCK'"? One possible problem is data fields whose names contain spaces. I think you will have to enforce some rules, such as: if there's a space in the name, use an underscore in the query. It would also be good if babel warned when no field matched one of the terms in the query (and perhaps printed out some possibilities), as I imagine this will be a common error.
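The "warn and print some possibilities" idea could look roughly like the sketch below. It is a standalone illustration in Python, not babel code: it assumes the usual SDF layout where a data item starts with a header line such as `>  <NAME>` and its value runs until the next blank line. `data_fields` and `unknown_fields` are hypothetical helper names.

```python
import difflib
import re

# Header line of an SDF data item, e.g. ">  <MY_FIRST_DATAFIELD>"
FIELD_HEADER = re.compile(r"^>.*<([^>]+)>")


def data_fields(record):
    """Return {field name: value} for one SDF record."""
    fields = {}
    lines = record.split("\n")
    i = 0
    while i < len(lines):
        match = FIELD_HEADER.match(lines[i])
        if match:
            i += 1
            values = []
            # The value continues until the next blank line.
            while i < len(lines) and lines[i].strip() != "":
                values.append(lines[i])
                i += 1
            fields[match.group(1)] = "\n".join(values)
        else:
            i += 1
    return fields


def unknown_fields(query_names, records):
    """Map each query name found in no record to a list of similar known names."""
    known = set()
    for record in records:
        known.update(data_fields(record))
    return {
        name: difflib.get_close_matches(name, sorted(known), n=3)
        for name in query_names
        if name not in known
    }


# Hypothetical record, for demonstration only.
RECORD = (
    "mol1\n  demo\n\nM  END\n"
    ">  <MY_FIRST_DATAFIELD>\n1.5\n\n"
    ">  <STOCK>\nNOT_IN_STOCK\n"
)

report = unknown_fields(["MY_FIRST_DATAFIELD", "STOCKS"], [RECORD])
```

Checking against the union of field names over the whole file, rather than per molecule, avoids a warning for every molecule that merely lacks an optional field.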

     
  • Chris Morley

    Chris Morley - 2007-07-07

    Logged In: YES
    user_id=1189615
    Originator: NO

    At your suggestion, I've now (rev2016) added support for:

    • '*' as a wildcard. It has to be the first or last character.
    • Property names containing spaces. You use underscores in the filter string.

    I originally had a warning message for unknown property names but removed it because it becomes tedious with files where some molecules have a property and others don't (like the filterset.sdf referred to on the wiki page).
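The two rules described above (a `*` wildcard only as the first or last character, and underscores standing in for spaces in property names) can be sketched as follows. This is a guess at the semantics written in Python for illustration, not the rev2016 code; in particular, how a pattern with `*` at both ends is treated is an assumption, and `matches` and `lookup_property` are hypothetical names.

```python
def matches(pattern, value):
    """Compare value to pattern, honouring '*' as first or last character."""
    if pattern == "*":
        return True  # assumption: a bare '*' matches anything
    if pattern.startswith("*"):
        # Leading wildcard: the rest of the pattern must end the value.
        return value.endswith(pattern[1:])
    if pattern.endswith("*"):
        # Trailing wildcard: the rest of the pattern must start the value.
        return value.startswith(pattern[:-1])
    return value == pattern


def lookup_property(fields, query_name):
    """Find a property, letting '_' in the query stand for ' ' in the name."""
    if query_name in fields:
        return fields[query_name]
    for name, value in fields.items():
        if name.replace(" ", "_") == query_name:
            return value
    return None  # unknown property: no warning, the test simply fails


# Hypothetical data, for demonstration only.
titles = ["active_0012", "inactive_1", "lig_hand"]
actives = [t for t in titles if matches("active*", t)]
weight = lookup_property({"Mol Weight": "180.2"}, "Mol_Weight")
```

Returning `None` silently for a missing property mirrors the decision to drop the warning for files where only some molecules carry the property.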

     
  • Geoff Hutchison

    Geoff Hutchison - 2008-02-24

    Logged In: YES
    user_id=21420
    Originator: NO

    As discussed below, this feature is now implemented. I'm closing this feature request. Thanks to Chris for adding this!

     