Given a large SDF or MOL2 file, I'd like to be able to use babel to extract a molecule, or molecules, that match exactly a given title, or that match a regular expression.
babel already has SMARTS filtering. Could we have a similar feature for the title text?
(Note that this should perhaps be implemented in such a way that it can be easily extended to the data fields in a molecule)
Logged In: YES
user_id=1189615
Originator: NO
The --filter option in SVN HEAD (for v3.0) does much of this. See
http://openbabel.sourceforge.net/wiki/--filter_option
babel infile.sdf outfile.xxx --filter "title='TargetTitle'"
It can combine multiple filter tests with boolean operators, and the tests may involve (sdf-like) properties (NAME in the example below is such a property):
babel infile.sdf outfile.xxx --filter "(title='TargetTitle' || NAME='AltTitle') && MW >180"
The string tests can be crudely alphabetical, title>'Tar', but it doesn't do regular expressions. Would many chemists be able to use a regular expression easily? (Speaking as one who couldn't.)
Chris
Logged In: YES
user_id=850620
Originator: YES
Great. That looks really nice. I knew that you had done some work on it, but I didn't see it in the help on 'babel'.
Regarding regular expressions, I just mean an asterisk. Often all the actives (e.g. in drug docking) would come from a different database (e.g. drawn by hand) than the inactives, and can be simply identified by common elements in their title.
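To make the "just an asterisk" idea concrete, here is a minimal sketch in plain Python. It is not how babel does it and not part of Open Babel; the helper names `split_sdf_records` and `titles_matching` are made up for illustration, and it assumes the title is the first line of each SDF record and that records end with a `$$$$` line.

```python
from fnmatch import fnmatchcase


def split_sdf_records(text):
    """Split SDF text into records; each record ends with a '$$$$' line."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    return records


def titles_matching(text, pattern):
    """Return the titles (first line of each record) matching a shell-style pattern."""
    titles = [rec.split("\n", 1)[0].strip() for rec in split_sdf_records(text)]
    return [t for t in titles if fnmatchcase(t, pattern)]


# Stripped-down records for illustration only (not complete molfiles).
sample = "\n".join([
    "active_001", "  hand-drawn", "", "M  END", "$$$$",
    "ZINC000123", "  vendor", "", "M  END", "$$$$",
    "active_002", "  hand-drawn", "", "M  END", "$$$$",
])

print(titles_matching(sample, "active_*"))  # ['active_001', 'active_002']
```

A pattern with no asterisk behaves as an exact title match, so the same function would cover both cases in the original request.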
Would it be possible to use an expression like "MY_FIRST_DATAFIELD>1.4 && MY_SECOND_DATAFIELD='NOT_IN_STOCK'"? One possible problem is data fields whose names contain spaces. I think you will have to enforce a rule such as: if there's a space in the name, use an underscore in the query. It would also be good if babel warned when no field matched one of the terms in the query (and perhaps printed out some possibilities), as I imagine this will be a common error.
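As a rough sketch of the kind of warning meant here (plain Python, not babel code; the function name `check_field_names` and the field names in the example are hypothetical), the standard-library `difflib` module can suggest near matches for a misspelled field name:

```python
import difflib


def check_field_names(query_names, available_names):
    """Return one warning string per query name that matches no available field."""
    warnings = []
    for name in query_names:
        if name in available_names:
            continue
        # Suggest up to three similarly spelled field names, best match first.
        close = difflib.get_close_matches(name, available_names, n=3)
        hint = f" Did you mean: {', '.join(close)}?" if close else ""
        warnings.append(f"No field named '{name}'.{hint}")
    return warnings


# Hypothetical field names; the second query term has a typo.
available = ["MY_FIRST_DATAFIELD", "MY_SECOND_DATAFIELD", "STOCK_STATUS"]
for message in check_field_names(["MY_FIRST_DATAFIELD", "MY_SECOND_DATFIELD"], available):
    print(message)
```

Checking the names once against the whole file, rather than per molecule, would keep the number of messages small.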
Logged In: YES
user_id=1189615
Originator: NO
At your suggestion, I've now (rev2016) added support for property names containing spaces. You use underscores in the filter string.
I originally had a warning message for unknown property names but removed it because it becomes tedious with files where some molecules have a property and others don't (like the filterset.sdf referred to on the wiki page).
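The two behaviours described above can be sketched in plain Python as follows. This is only an illustration, not the Open Babel source: `lookup_property` and `passes` are hypothetical names, the property values are made up, and treating a missing property as a silent non-match is an assumption, since the thread does not say what the filter does with such molecules.

```python
def lookup_property(properties, query_name):
    """Look up query_name, letting '_' in the query stand for ' ' in the stored name."""
    if query_name in properties:
        return properties[query_name]
    wanted = query_name.replace("_", " ")
    for name, value in properties.items():
        if name.replace("_", " ") == wanted:
            return value
    return None  # property absent on this molecule


def passes(properties, query_name, threshold):
    """True if the property exists, is numeric, and exceeds threshold.

    A missing or non-numeric property counts as no match, with no warning
    (an assumption made for this sketch).
    """
    value = lookup_property(properties, query_name)
    if value is None:
        return False
    try:
        return float(value) > threshold
    except ValueError:
        return False


# Made-up property dictionaries; the second molecule lacks the property.
molecules = [
    {"Melting Point": "182.5"},
    {"NAME": "no melting point here"},
    {"Melting Point": "95"},
]
print([passes(m, "Melting_Point", 100) for m in molecules])  # [True, False, False]
```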
Logged In: YES
user_id=21420
Originator: NO
As discussed above, this feature is now implemented. I'm closing this feature request. Thanks to Chris for adding this!