From: Noel O'B. <bao...@gm...> - 2008-05-22 15:55:32
2008/5/22 Nick England <nic...@gm...>:
> Noel,
>
> I am trying to perform a substructure search on a large number of
> molecules. I can do the SMARTS matching from python, but would need
> the fast search ideally.

Have you looked at the FastSearch class as suggested by Chris?
Alternatively you could look at the work done by the MyChem and PGChem
people who have also been working on the same problem.

> Would it be possible for me to write a C++ class to call the
> OBConversion with input and output streams, and call that from a
> Python script instead?

In theory that is possible. But I think you should look at the
FastSearch class first.

> Calling the babel executable directly from Python is an option, but I
> believe that has issues with portability?

No - it should work fine, but it's not really an elegant solution. You
will need to hardcode the location of the babel executable or pick it
up via an environment variable. You would use one of the popen methods
to call it.

> - Nick
>
> 2008/5/22 Noel O'Boyle <bao...@gm...>:
>> There's no way to do streams in Python. That's why we have this whole
>> ReadFile, Read, business. Can you describe what the overall problem is
>> that you're working on, and maybe some of the wise heads on this list
>> can suggest some alternative solutions...
>>
>> Noel
>>
>> 2008/5/22 Nick England <nic...@gm...>:
>>> Chris,
>>>
>>> Would it be possible to do something like
>>>
>>> infile=open("input.smi","r")
>>> outfile=open("index.fs","w")
>>> obconversion = openbabel.OBConversion(infile,outfile) (although these
>>> need to be wrapped into streams somehow, SWIG has a function for
>>> this?)
>>> obconversion.Convert()
>>>
>>> I am trying to be able to make and search an index from calls from a
>>> scripting environment.
>>>
>>> Thanks,
>>>
>>> Nick
>>>
>>>
>>> 2008/5/21 Chris Morley <c.m...@ds...>:
>>>> When fs is used as an output format it makes an index of the input file.
>>>> This is a list of the fingerprint of each molecule in the input file
>>>> and the number of bytes it is from the beginning of that file. It wasn't
>>>> written to be used like you are doing it, so the failure is not
>>>> surprising. There is an API of C++ classes for fastsearch but even with
>>>> scripting I think it would be easier to use the conversion framework to
>>>> make an index (if that is what you are trying to do). On the command
>>>> line it would be
>>>>   babel input.smi index.fs
>>>> which would index all the molecules in input.smi
>>>> To use the index
>>>>   babel index.fs -osmi -s"CO"
>>>> which would display all the molecules that match the SMARTS.
>>>>
>>>> The SMARTS actually has to be a valid SMILES (a molecular fragment)
>>>> because in the searching its fingerprint is calculated (to be compared
>>>> with all the fingerprints in the input file).
>>>>
>>>> If you are doing a simple SMARTS filter
>>>>   babel input.smi -osmi -s"[#6]O"
>>>> you can use a full SMARTS.
>>>>
>>>> Coming back to scripting, the fastsearch only has any point if you do it
>>>> all in one go, i.e. scan the index using compiled code. Looping in a
>>>> scripting language is too slow. This probably applies also when
>>>> indexing a file. Both would be best done by calling the conversion
>>>> framework. I'm afraid I haven't thought how to recover the result
>>>> molecules one by one.
>>>>
>>>> It may be that your best approach is not to use fastsearch and use an
>>>> ordinary SMARTS search instead. From the command line this is probably
>>>> the best way anyway for fewer than 10,000 molecules.
>>>>
>>>> Chris
>>>>
>>>> Noel O'Boyle wrote:
>>>>> Sounds like a bug. Could you file one?
>>>>>
>>>>> I didn't realise that 'fs' was a proper format. I always thought it
>>>>> was just some sort of index used to search a large SDF file, or
>>>>> something.
>>>>>
>>>>> Noel
>>>>>
>>>>> 2008/5/21 Nick England <nic...@gm...>:
>>>>>> Noel,
>>>>>>
>>>>>> The "obconversion.SetInFormat("fs")" returns true, and the input file
>>>>>> is valid in that it was made on this computer and works fine on the
>>>>>> command line babel (both under linux)
>>>>>>
>>>>>> However, trying to do:
>>>>>>
>>>>>> import pybel
>>>>>> allmols=[mol for mol in pybel.readfile("smi","input.smi")]
>>>>>> smarts=pybel.Smarts("[#6]O")
>>>>>> for mol in allmols:
>>>>>>     if(smarts.findall(mol)):
>>>>>>         print mol.write("fs")
>>>>>>
>>>>>> results in the message
>>>>>>
>>>>>> "Not a valid output format"
>>>>>>
>>>>>>
>>>>>> using the interpreter:
>>>>>> >>> obconversion.SetInFormat("fs")
>>>>>> True
>>>>>> >>> obconversion.SetOutFormat("fs")
>>>>>> True
>>>>>>
>>>>>> This problem cannot be due to a problem with the input files, since it
>>>>>> won't even output a simple CCCO smiles string to the fs format. The
>>>>>> obconversion seems to understand the format though.
>>>>>>
>>>>>>
>>>>>> 2008/5/21 Noel O'Boyle <bao...@gm...>:
>>>>>>> 2008/5/21 Nick England <nic...@gm...>:
>>>>>>>> Hello all,
>>>>>>>>
>>>>>>>> I am experiencing some odd behaviour with the python bindings. A simple
>>>>>>>> program to read in an index file:
>>>>>>>>
>>>>>>>> #! /usr/bin/env python
>>>>>>>> import openbabel
>>>>>>>> import pybel
>>>>>>>> allmols=[]
>>>>>>>> obconversion = openbabel.OBConversion()
>>>>>>>> obconversion.SetInFormat("fs")
>>>>>>>> obmol = openbabel.OBMol()
>>>>>>>>
>>>>>>>> notatend = obconversion.ReadFile(obmol,"index.fs")
>>>>>>>>
>>>>>>>> while notatend:
>>>>>>>>
>>>>>>>>     allmols.append(obmol)
>>>>>>>>     obmol=openbabel.OBMol()
>>>>>>>>     notatend=obconversion.Read(obmol)
>>>>>>>> pybel.Molecule(obmol).write("smi","results.smi",True)
>>>>>>>>
>>>>>>>> is failing with the message:
>>>>>>>> "Not a valid input format"
>>>>>>> What line is failing? Try "success = obconversion.SetInFormat("fs")"
>>>>>>> and check its value.
>>>>>>>
>>>>>>> If the value is True, then the problem is not setting the format, but
>>>>>>> rather reading the file. Is the inputfile valid? Is it from the same
>>>>>>> operating system? If so, it sounds like a bug. Can you file one and
>>>>>>> provide the input file (use a short example, if possible).
>>>>>>>
>>>>>>>> However typing babel -Hfs and obconversion.GetSupportedInputFormat()
>>>>>>>> both list the fastsearch format as being present. The command babel
>>>>>>>> -ifs index.fs -osmi works fine, and the python program above works if
>>>>>>>> the format isn't fs.
>>>>>>>> I would also like to add the line
>>>>>>>> obconversion.AddOption('s',openbabel.OBConversion::GENOPTIONS,searchstring)
>>>>>>>> (but this is throwing a syntax error on the OBConversion::GENOPTIONS
>>>>>>>> part, this is my first attempt at using python however so it's not
>>>>>>>> unexpected!)
>>>>>>> Try using the interactive Python prompt:
>>>>>>> >>> dir(openbabel.OBConversion())
>>>>>>> ['ALL', 'AddChemObject', 'AddOption', 'CloseOutFile', 'Convert', 'CopyOptions',
>>>>>>> ...
>>>>>>> GENOPTIONS', 'GetAuxConv', 'GetChemObject', 'GetDefaultFormat', 'GetInFilename',
>>>>>>> 'GetInFormat', 'GetInLen', 'GetInPos', 'GetInStream', 'GetOptionParams', 'GetOp
>>>>>>> ...
>>>>>>> 'thisown']
>>>>>>> >>> openbabel.OBConversion.GENOPTIONS
>>>>>>> 2
>>>>>>>
>>>>>>>> Any help would be appreciated!
>>>>>>>>
>>>>>>>> -------------------------------------------------------------------------
>>>>>>>> This SF.net email is sponsored by: Microsoft
>>>>>>>> Defy all challenges. Microsoft(R) Visual Studio 2008.
>>>>>>>> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
>>>>>>>> _______________________________________________
>>>>>>>> OpenBabel-scripting mailing list
>>>>>>>> Ope...@li...
>>>>>>>> https://lists.sourceforge.net/lists/listinfo/openbabel-scripting
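Chris's description of the fs index in the quoted message (one fingerprint per molecule plus its byte offset in the source file, searched by comparing the query's fingerprint with the stored ones) can be illustrated with a toy sketch. This is not Open Babel's algorithm or API. As simplifying assumptions, molecules are plain strings, "fragments" are overlapping three-character windows, and substring containment stands in for a real substructure match. The function names (fingerprint, build_index, search) are made up for the illustration.

```python
# Toy illustration of fingerprint screening. NOT Open Babel's implementation:
# substring matching is only a stand-in for real substructure matching.

def fingerprint(text, nbits=64):
    """Fold every 3-character fragment of `text` into an nbits-wide bitmask."""
    bits = 0
    for i in range(len(text) - 2):
        fragment = text[i:i + 3]
        # Small deterministic polynomial hash of the fragment's code points.
        h = sum(ord(c) * (31 ** k) for k, c in enumerate(fragment))
        bits |= 1 << (h % nbits)
    return bits


def build_index(lines):
    """Return one (fingerprint, byte offset) pair per input line."""
    index = []
    offset = 0
    for line in lines:
        index.append((fingerprint(line.strip()), offset))
        offset += len(line.encode("utf-8"))
    return index


def search(lines, index, query):
    """Two-phase search: cheap fingerprint screen, then the exact check."""
    qfp = fingerprint(query)
    # Phase 1: an entry can only contain the query if it has every query bit set.
    candidates = [i for i, (fp, _) in enumerate(index) if (fp & qfp) == qfp]
    # Phase 2: run the expensive test on the surviving candidates only.
    hits = [lines[i].strip() for i in candidates if query in lines[i]]
    return candidates, hits
```

The screen can let false positives through (different fragments may share a bit) but never drops a true match, which is why the exact check only has to run on the candidates. It also suggests why the query must be something a fingerprint can be computed from.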
From: Chris M. <c.m...@ds...> - 2008-05-23 09:05:52
I think Nick's suggestion is a good idea.
There could be a C++ function

bool OBConversion::OpenInAndOutFiles(std::string infilename, std::string
outfilename);

which could be easily used from scripting languages, but the file
handling would be done in the C++. So in Python you would make an index
something like:

conv = ob.OBConversion()
conv.OpenInAndOutFiles("input.smi", "index.fs")
conv.SetInAndOutFormats("smi","fs")
conv.Convert()

Currently ReadFile allows only those formats that are accessed through
ReadMolecule rather than ReadChemObject as fastsearch is, but the above
mod would permit this. (All the above would be analogous for output.) It
means all input and output are only to files, but maybe there could be
a similar OpenInAndOutStrings().

The scripting code could be even shorter if setting the formats was
also done in the proposed new function from the file extensions,
although the return code would need to flag more than one type of error.
Does that feel right in Python?

Would this be a worthwhile addition?

Chris
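The idea of deriving the formats from the file extensions, with a return code that can flag more than one kind of error, might look roughly like the sketch below. Everything here is hypothetical and only illustrates the design question: formats_from_filenames, the flag names and the known_formats argument are not part of Open Babel.

```python
import os

# Hypothetical error flags. They are combined with bitwise OR so that a single
# return value can report several problems at once.
OK = 0
BAD_IN_FORMAT = 1
BAD_OUT_FORMAT = 2


def formats_from_filenames(infilename, outfilename, known_formats):
    """Guess format IDs from the file extensions.

    Returns (error_flags, in_format, out_format); error_flags is OK (0)
    when both extensions are found in known_formats.
    """
    def extension(name):
        return os.path.splitext(name)[1].lstrip(".").lower()

    in_format = extension(infilename)
    out_format = extension(outfilename)
    errors = OK
    if in_format not in known_formats:
        errors |= BAD_IN_FORMAT
    if out_format not in known_formats:
        errors |= BAD_OUT_FORMAT
    return errors, in_format, out_format
```

A caller would test the flags individually, for example `errors & BAD_OUT_FORMAT`. In Python it may feel more natural to raise an exception than to return a combined code, which is one way to read the "does that feel right" question.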
From: Noel O'B. <bao...@gm...> - 2008-05-23 09:35:07
2008/5/23 Chris Morley <c.m...@ds...>:
> I think Nick's suggestion is a good idea.
> There could be a C++ function
>
> bool OBConversion::OpenInAndOutFiles(std::string infilename, std::string
> outfilename);
>
> which could be easily used from scripting languages, but the file
> handling would be done in the C++. So in Python you would make an index
> something like:
>
> conv = ob.OBConversion()
> conv.OpenInAndOutFiles("input.smi", "index.fs")
> conv.SetInAndOutFormats("smi","fs")
> conv.Convert()
>
> Currently ReadFile allows only those formats that are accessed through
> ReadMolecule rather than ReadChemObject as fastsearch is, but the above
> mod would permit this. (All the above would be analogous for output.) It
> means all input and output are only to files, but maybe there could be
> a similar OpenInAndOutStrings().

Strings can be handled in the regular way though. All we need is one
way to do things :-)

> The scripting code could be even shorter if setting the formats was
> also done in the proposed new function from the file extensions,
> although the return code would need to flag more than one type of error.
> Does that feel right in Python?

Having two similar but different APIs is confusing. It's better to
leave SetInAndOutFormats outside the new function.

> Would this be a worthwhile addition?

For Nick, yes :-)

> Chris
From: Nick E. <nic...@gm...> - 2008-05-23 10:56:50
2008/5/23 Noel O'Boyle <bao...@gm...>:
> 2008/5/23 Chris Morley <c.m...@ds...>:
>> I think Nick's suggestion is a good idea.
>> There could be a C++ function
>>
>> bool OBConversion::OpenInAndOutFiles(std::string infilename, std::string
>> outfilename);
>>
>> which could be easily used from scripting languages, but the file
>> handling would be done in the C++.

This would certainly solve my problem, and would hopefully be useful
to other people who want to use fastsearch from scripting languages as
well.

- Nick
From: Chris M. <c.m...@ds...> - 2008-05-23 16:59:32
Nick England wrote:
> 2008/5/23 Noel O'Boyle <bao...@gm...>:
>> 2008/5/23 Chris Morley <c.m...@ds...>:
>>> I think Nick's suggestion is a good idea.
>>> There could be a C++ function
>>>
>>> bool OBConversion::OpenInAndOutFiles(std::string infilename, std::string
>>> outfilename);

I've now added this to the development code. Here it is making an index
and then substructure searching it. (Note that my Python is rather shaky.)
I don't think the CloseOutFile() is necessary here but may be sometimes.
It is a bit tedious compared with the babel command line, but maybe
Pybel could help?

>>> import openbabel
>>> conv=openbabel.OBConversion()
>>> conv.OpenInAndOutFiles("1200mols.smi","index.fs")
True
>>> conv.SetInAndOutFormats("smi","fs")
True
>>> conv.Convert()
This will prepare an index of 1200mols.smi and may take some time...
It took 6 seconds
1192
>>> conv.CloseOutFile()
>>> conv.OpenInAndOutFiles("index.fs","results.smi")
True
>>> conv.SetInAndOutFormats("fs","smi")
True
>>> conv.AddOption("s",conv.GENOPTIONS,"C=CC#N")
>>> conv.Convert()
10 candidates from fingerprint search phase
1202
>>> f=open("results.smi")
>>> f.read()
'OC(=O)C(=Cc1ccccc1)C#N\t298\nN#CC(=Cc1ccccc1)C#N\t490\nO=N(=O)c1cc(ccc1)C=C(C#N
)C#N\t491\nClc1ccc(cc1)C=C(C#N)C#N\t492\nClc1ccc(c(c1)Cl)C=C(C#N)C#N\t493\nClc1c
cc(cc1Cl)C=C(C#N)C#N\t494\nBrc1ccc(cc1)C=C(C#N)C#N\t532\nClc1ccccc1C=C(C#N)C#N\t
542\nN#CC(=CC=Cc1occc1)C#N\t548\nCCOC(=O)C(C#N)=C(C)C\t1074\n'
>>>
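To make the session above less tedious from a script, the same calls could be wrapped in two small helpers. This is only a sketch: make_index and search_index are hypothetical names, the helpers use nothing beyond the OBConversion calls shown in the session, and they assume (as the True results there suggest) that OpenInAndOutFiles and SetInAndOutFormats return a false value on failure. The conversion object is passed in, so the code makes no assumption about how the bindings are imported.

```python
def make_index(conv, molfile, indexfile, molformat="smi"):
    """Build a fastsearch index of `molfile`.

    `conv` is expected to behave like openbabel.OBConversion() from a build
    that has OpenInAndOutFiles. Returns whatever Convert() returns.
    """
    if not conv.OpenInAndOutFiles(molfile, indexfile):
        raise IOError("could not open %s or %s" % (molfile, indexfile))
    if not conv.SetInAndOutFormats(molformat, "fs"):
        raise ValueError("format not accepted: %s" % molformat)
    count = conv.Convert()
    # Possibly not always needed, but it makes sure the index is written out.
    conv.CloseOutFile()
    return count


def search_index(conv, indexfile, query, resultfile, outformat="smi"):
    """Search `indexfile` for `query` and write the matches to `resultfile`."""
    if not conv.OpenInAndOutFiles(indexfile, resultfile):
        raise IOError("could not open %s or %s" % (indexfile, resultfile))
    if not conv.SetInAndOutFormats("fs", outformat):
        raise ValueError("format not accepted: %s" % outformat)
    # Same general option as -s on the babel command line.
    conv.AddOption("s", conv.GENOPTIONS, query)
    count = conv.Convert()
    conv.CloseOutFile()
    return count
```

With the development bindings this might be used as `make_index(openbabel.OBConversion(), "1200mols.smi", "index.fs")` followed by `search_index(openbabel.OBConversion(), "index.fs", "C=CC#N", "results.smi")`, after which results.smi can be read as in the session.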
From: Nick E. <nic...@gm...> - 2008-05-27 12:44:54
|
Thanks Chris,

When testing this new SVN version, I get the following error when
trying to build the python bindings with

python setup.py build

file openbabel.py (for module openbabel) not found
file openbabel.py (for module openbabel) not found
running build_ext
building '_openbabel' extension
gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O3 -Wall
-Wstrict-prototypes -fPIC -I../../include
-I/home/nwe23/bin/include/python2.5 -c openbabel_python.cpp -o
build/temp.linux-x86_64-2.5/openbabel_python.o
gcc: openbabel_python.cpp: No such file or directory
gcc: no input files
error: command 'gcc' failed with exit status 1

openbabel.py
openbabel_python.cpp

are missing from the /scripts/python dir in the SVN repository, but
are listed in the MANIFEST file.

Do these need to be built from SWIG or something?

- Nick
|
From: Noel O'B. <bao...@gm...> - 2008-05-27 12:53:43
|
See http://openbabel.org/wiki/Install_%28source_code%29:
"""
If you are compiling directly from the Subversion repository, then you
need to create the Python/Perl bindings yourself. To do so, you need
to install the latest version of SWIG and to run configure as
"configure --enable-maintainer-mode"
"""

It should work then,

Noel
|
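Since the build error came from the SWIG-generated files being absent, a small pre-flight check before running "python setup.py build" might save a confusing gcc failure. This is only a sketch: the two file names are the ones from Nick's report, while the function name missing_swig_outputs and the default directory are assumptions for illustration.

```python
import os

# File names taken from the error report in this thread.
GENERATED_FILES = ("openbabel.py", "openbabel_python.cpp")


def missing_swig_outputs(bindings_dir="."):
    """Return the generated binding files not present in bindings_dir."""
    return [name for name in GENERATED_FILES
            if not os.path.isfile(os.path.join(bindings_dir, name))]


if __name__ == "__main__":
    # Assumes the script is run from the Python bindings directory.
    missing = missing_swig_outputs()
    if missing:
        print("Missing generated files: " + ", ".join(missing))
        print("Install SWIG and re-run configure with --enable-maintainer-mode.")
    else:
        print("Generated bindings found; 'python setup.py build' can proceed.")
```

The check only looks for the files; it does not verify that they are up to date with the C++ headers.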
From: Nick E. <nic...@gm...> - 2008-05-27 14:01:07
|
Thanks Noel,

I missed that line of instruction. It works perfectly now!
|