From: Noel O'B. <bao...@gm...> - 2008-05-22 15:55:32
2008/5/22 Nick England <nic...@gm...>:
> Noel,
>
> I am trying to perform a substructure search on a large number of
> molecules. I can do the SMARTS matching from python, but would need
> the fast search ideally.

Have you looked at the FastSearch class as suggested by Chris? Alternatively, you could look at the work done by the MyChem and PGChem people, who have also been working on the same problem.

> Would it be possible for me to write a C++ class to call the
> OBConversion with input and output streams, and call that from a
> Python script instead?

In theory that is possible. But I think you should look at the FastSearch class first.

> Calling the babel executable directly from Python is an option, but I
> believe that has issues with portability?

No, it should work fine, but it's not really an elegant solution. You will need to hardcode the location of the babel executable or pick it up via an environment variable. You would use one of the popen methods to call it.

> - Nick
>
> 2008/5/22 Noel O'Boyle <bao...@gm...>:
>> There's no way to do streams in Python. That's why we have this whole
>> ReadFile, Read, business. Can you describe what the overall problem is
>> that you're working on, and maybe some of the wise heads on this list
>> can suggest some alternative solutions...
>>
>> Noel
>>
>> 2008/5/22 Nick England <nic...@gm...>:
>>> Chris,
>>>
>>> Would it be possible to do something like
>>>
>>> infile=open("input.smi","r")
>>> outfile=open("index.fs","w")
>>> obconversion = openbabel.OBConversion(infile,outfile) (although these
>>> need to be wrapped into streams somehow, SWIG has a function for
>>> this?)
>>> obconversion.Convert()
>>>
>>> I am trying to be able to make and search an index from calls from a
>>> scripting environment.
>>>
>>> Thanks,
>>>
>>> Nick
>>>
>>>
>>> 2008/5/21 Chris Morley <c.m...@ds...>:
>>>> When fs is used as an output format it makes an index of the input file.
>>>> This is a list of the fingerprint of each molecule in the input file
>>>> and the number of bytes it is from the beginning of that file. It wasn't
>>>> written to be used like you are doing it, so the failure is not
>>>> surprising. There is an API of C++ classes for fastsearch, but even with
>>>> scripting I think it would be easier to use the conversion framework to
>>>> make an index (if that is what you are trying to do). On the command
>>>> line it would be
>>>> babel input.smi index.fs
>>>> which would index all the molecules in input.smi.
>>>> To use the index
>>>> babel index.fs -osmi -s"CO"
>>>> which would display all the molecules that match the SMARTS.
>>>>
>>>> The SMARTS actually has to be a valid SMILES (a molecular fragment)
>>>> because in the searching its fingerprint is calculated (to be compared
>>>> with all the fingerprints in the input file).
>>>>
>>>> If you are doing a simple SMARTS filter
>>>> babel input.smi -osmi -s"[#6]O"
>>>> you can use a full SMARTS.
>>>>
>>>> Coming back to scripting, the fastsearch only has any point if you do it
>>>> all in one go, i.e. scan the index using compiled code. Looping in a
>>>> scripting language is too slow. This probably applies also when
>>>> indexing a file. Both would be best done by calling the conversion
>>>> framework. I'm afraid I haven't thought how to recover the result
>>>> molecules one by one.
>>>>
>>>> It may be that your best approach is not to use fastsearch and to use an
>>>> ordinary SMARTS search instead. From the command line this is probably
>>>> the best way anyway for fewer than 10,000 molecules.
>>>>
>>>> Chris
>>>>
>>>> When
>>>> Noel O'Boyle wrote:
>>>>> Sounds like a bug. Could you file one?
>>>>>
>>>>> I didn't realise that 'fs' was a proper format. I always thought it
>>>>> was just some sort of index used to search a large SDF file, or
>>>>> something.
>>>>>
>>>>> Noel
>>>>>
>>>>> 2008/5/21 Nick England <nic...@gm...>:
>>>>>> Noel,
>>>>>>
>>>>>> The "obconversion.SetInFormat("fs")" returns True, and the input file
>>>>>> is valid in that it was made on this computer and works fine with the
>>>>>> command-line babel (both under Linux).
>>>>>>
>>>>>> However, trying to do:
>>>>>>
>>>>>> import pybel
>>>>>> allmols=[mol for mol in pybel.readfile("smi","input.smi")]
>>>>>> smarts=pybel.Smarts("[#6]O")
>>>>>> for mol in allmols:
>>>>>>     if(smarts.findall(mol)):
>>>>>>         print mol.write("fs")
>>>>>>
>>>>>> results in the message
>>>>>>
>>>>>> "Not a valid output format"
>>>>>>
>>>>>>
>>>>>> using the interpreter:
>>>>>> >>> obconversion.SetInFormat("fs")
>>>>>> True
>>>>>> >>> obconversion.SetOutFormat("fs")
>>>>>> True
>>>>>>
>>>>>> This problem cannot be due to a problem with the input files, since it
>>>>>> won't even output a simple CCCO SMILES string to the fs format. The
>>>>>> obconversion seems to understand the format though.
>>>>>>
>>>>>>
>>>>>> 2008/5/21 Noel O'Boyle <bao...@gm...>:
>>>>>>> 2008/5/21 Nick England <nic...@gm...>:
>>>>>>>> Hello all,
>>>>>>>>
>>>>>>>> I am experiencing some odd behaviour with the Python bindings. A simple
>>>>>>>> program to read in an index file:
>>>>>>>>
>>>>>>>> #! /usr/bin/env python
>>>>>>>> import openbabel
>>>>>>>> import pybel
>>>>>>>> allmols=[]
>>>>>>>> obconversion = openbabel.OBConversion()
>>>>>>>> obconversion.SetInFormat("fs")
>>>>>>>> obmol = openbabel.OBMol()
>>>>>>>>
>>>>>>>> notatend = obconversion.ReadFile(obmol,"index.fs")
>>>>>>>>
>>>>>>>> while notatend:
>>>>>>>>
>>>>>>>>     allmols.append(obmol)
>>>>>>>>     obmol=openbabel.OBMol()
>>>>>>>>     notatend=obconversion.Read(obmol)
>>>>>>>> pybel.Molecule(obmol).write("smi","results.smi",True)
>>>>>>>>
>>>>>>>> is failing with the message:
>>>>>>>> "Not a valid input format"
>>>>>>> What line is failing? Try "success = obconversion.SetInFormat("fs")"
>>>>>>> and check its value.
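The advice to check the return value of SetInFormat, combined with the ReadFile/Read idiom from Nick's script, can be wrapped in a small helper. This is only a sketch: read_all is a hypothetical name, and the conversion object and the molecule factory are passed in as parameters (with the real bindings, openbabel.OBConversion() and openbabel.OBMol) so the loop logic itself does not depend on a particular Open Babel version.

```python
def read_all(conv, new_mol, filename, fmt):
    """Read every molecule in `filename` using the ReadFile/Read idiom.

    Hypothetical helper. Assumed arguments:
      conv    -- object with SetInFormat, ReadFile and Read methods
                 (e.g. openbabel.OBConversion())
      new_mol -- zero-argument callable returning an empty molecule
                 (e.g. openbabel.OBMol)
    """
    # SetInFormat returns False when the format name is not recognised,
    # so a failure here points at the format, not at the file.
    if not conv.SetInFormat(fmt):
        raise ValueError("Unrecognised input format: %s" % fmt)

    mols = []
    mol = new_mol()
    notatend = conv.ReadFile(mol, filename)  # reads the first molecule
    while notatend:
        mols.append(mol)
        mol = new_mol()            # fresh object for each molecule
        notatend = conv.Read(mol)  # reads the next one; False at the end
    return mols
```

With Open Babel installed this would be called as read_all(openbabel.OBConversion(), openbabel.OBMol, "input.smi", "smi"). A ValueError implicates the format name, while an empty result with a recognised format suggests the problem lies in reading the file itself.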
>>>>>>>
>>>>>>> If the value is True, then the problem is not setting the format, but
>>>>>>> rather reading the file. Is the input file valid? Is it from the same
>>>>>>> operating system? If so, it sounds like a bug. Can you file one and
>>>>>>> provide the input file (use a short example, if possible)?
>>>>>>>
>>>>>>>> However, typing babel -Hfs and obconversion.GetSupportedInputFormat()
>>>>>>>> both list the fastsearch format as being present. The command babel
>>>>>>>> -ifs index.fs -osmi works fine, and the Python program above works if
>>>>>>>> the format isn't fs.
>>>>>>>> I would also like to add the line
>>>>>>>> obconversion.AddOption('s',openbabel.OBConversion::GENOPTIONS,searchstring)
>>>>>>>> (but this is throwing a syntax error on the OBConversion::GENOPTIONS
>>>>>>>> part; this is my first attempt at using Python, however, so it's not
>>>>>>>> unexpected!)
>>>>>>> Try using the interactive Python prompt:
>>>>>>> >>> dir(openbabel.OBConversion())
>>>>>>> ['ALL', 'AddChemObject', 'AddOption', 'CloseOutFile', 'Convert', 'CopyOptions',
>>>>>>> ...
>>>>>>> GENOPTIONS', 'GetAuxConv', 'GetChemObject', 'GetDefaultFormat', 'GetInFilename',
>>>>>>> 'GetInFormat', 'GetInLen', 'GetInPos', 'GetInStream', 'GetOptionParams', 'GetOp
>>>>>>> ...
>>>>>>> 'thisown']
>>>>>>> >>> openbabel.OBConversion.GENOPTIONS
>>>>>>> 2
>>>>>>>
>>>>>>>> Any help would be appreciated!
>>>>>>>>
>>>>>>>> -------------------------------------------------------------------------
>>>>>>>> This SF.net email is sponsored by: Microsoft
>>>>>>>> Defy all challenges. Microsoft(R) Visual Studio 2008.
>>>>>>>> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
>>>>>>>> _______________________________________________
>>>>>>>> OpenBabel-scripting mailing list
>>>>>>>> Ope...@li...
>>>>>>>> https://lists.sourceforge.net/lists/listinfo/openbabel-scripting
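Chris's command-line recipes can be driven from a script in the way Noel describes, by calling the babel executable through a popen-style call. The following is a minimal sketch, not Open Babel API. It uses the standard-library subprocess module (Python 3.7+ for capture_output), and the BABEL environment variable and the helper names (babel_executable, index_command, search_command, smarts_filter_command, run_babel) are hypothetical. Only the argument lists mirror the commands quoted in the thread.

```python
import os
import subprocess


def babel_executable():
    # Hypothetical convention: take the path from a BABEL environment
    # variable, otherwise assume "babel" is on the PATH.
    return os.environ.get("BABEL", "babel")


def index_command(infile, indexfile, babel="babel"):
    # Same as: babel input.smi index.fs
    return [babel, infile, indexfile]


def search_command(indexfile, smiles_fragment, babel="babel"):
    # Same as: babel index.fs -osmi -s"CO"
    # With an fs index the pattern must be a valid SMILES fragment,
    # because its fingerprint is compared with the indexed fingerprints.
    # The shell strips the quotes, so -s"CO" reaches babel as -sCO.
    return [babel, indexfile, "-osmi", "-s" + smiles_fragment]


def smarts_filter_command(infile, smarts, babel="babel"):
    # Same as: babel input.smi -osmi -s"[#6]O" (a full SMARTS is allowed)
    return [babel, infile, "-osmi", "-s" + smarts]


def run_babel(cmd):
    # With -osmi and no output file, babel writes SMILES to stdout;
    # its status messages go to stderr and are captured separately.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()
```

For example, run_babel(search_command("index.fs", "CO", babel_executable())) would return the matching SMILES lines. Both the fingerprint screen and the final match then run inside babel's compiled code, which is the point Chris makes about avoiding a loop in the scripting language.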