From: Pantelis S. <ch...@ma...> - 2012-05-13 14:44:09
On Sun, 2012-05-13 at 15:51 +0200, Egon Willighagen wrote:
> On Sun, May 13, 2012 at 2:25 PM, Pantelis Sopasakis <ch...@ma...> wrote:
> > I'm planning to use CDK for conversion between different
> > representations of molecules. In particular I need to get the InChIKey
> > from an SD file. Could you please give me a hint?
>
> I was doing just that last week, though via Bioclipse's Groovy environment:
>
> mols = cdk.createMoleculeIterator("/Open PHACTS/mcard_sdf_all.sdf")
>
> // the MDL molfile title lines look like:
> // "Beginning of SDF File of HMDB00001 1-Methylhistidine"
>
> counter = 0;
> new File("/home/egonw/bioclipse/Open PHACTS/hmdbinchis.txt").withWriter { out ->
>   while (mols.hasNext()) {
>     mol = mols.next()
>     title = mol.getAtomContainer().getProperty("cdk:Title")
>     matcher = title =~ /HMDB\d+/
>     if (matcher[0]) {
>       hmdb = matcher[0]
>       key = inchi.generate(mol).key
>       if (counter == 100) {
>         println hmdb + " " + key
>         counter = 0
>       }
>       out.println hmdb + " " + key
>       counter++
>     }
>   }
> }
>
> If you like to use CDK from the command line, e.g. with Groovy, you should
> be combining the following two scripts:
>
> iterator = new IteratingMDLReader(
>   new File("data/test6.sdf").newReader(),
>   DefaultChemObjectBuilder.getInstance()
> )
> while (iterator.hasNext()) {
>   IMolecule mol = iterator.next()
>   formula = MolecularFormulaManipulator.getMolecularFormula(mol)
>   println MolecularFormulaManipulator.getString(formula)
> }
>
> and
>
> methane = new Molecule();
> atom1 = new Atom("C")
> methane.addAtom(atom1)
>
> factory = InChIGeneratorFactory.getInstance();
> generator = factory.getInChIGenerator(methane);
> if (generator.getReturnStatus() == INCHI_RET.OKAY)
>   print generator.getInchiKey()
>
> The first is for processing the SD file, and the second for the
> generation of InChIKeys.
>
> Egon
>

Hi Egon,

I noticed that you use getProperty("cdk:Title") to get the title of the
molecule as it is declared in the SD file.
I think it would be convenient and more intuitive to have a method like
getDeclaredTitle()::String. (Btw, I'm using CDK from Java.)

I have one more question regarding automatic naming of compounds. Is there a
way to get the IUPAC name for a molecule given its structure? For the time
being, I have to use MarvinSketch to generate the IUPAC name, store it inside
the SD file and parse it from Java. An alternative that I've also tried is the
newly released REST interface of PubChem; the problem is that it returns the
IUPAC name only for registered molecules.

Best regards,
Pantelis Sopasakis,
ch...@ma...

Dipl. Chemical Engineer NTUA
Msc. Applied Mathematics NTUA
Automatic Control Unit, School of Chem. Eng., NTUA
Heroon Polytechneiou 9, 15780 Zografou Campus, Athens, Greece
Tel. (+30) 210 772 3236 (office)

Research fellow at Helmholtz Zentrum München
Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH)
Ingolstädter Landstr. 1
85764 Neuherberg
pan...@he...
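
A note on the getDeclaredTitle() idea: the declared title can also be read straight from the SD file text, because the title is the first line of each molfile record and each record ends with a "$$$$" line. What follows is only a minimal sketch in plain Java, without CDK. The class SdTitles and its methods declaredTitles and hmdbId are hypothetical names made up for illustration; the HMDB pattern and the sample title are taken from the quoted Groovy script, and the second sample record is invented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of CDK): reads declared titles from SD file text. */
final class SdTitles {

    // Same pattern as in the quoted Groovy script: "HMDB" followed by digits.
    private static final Pattern HMDB_ID = Pattern.compile("HMDB\\d+");

    /** Returns the title line (first line) of every record in the given SD file text. */
    static List<String> declaredTitles(String sdText) {
        List<String> titles = new ArrayList<>();
        String title = null;        // first line of the record being read
        boolean hasContent = false; // true once the record has a non-blank line
        for (String line : sdText.split("\\R")) {
            if (line.startsWith("$$$$")) {
                // End of record: keep its title, skip records that are entirely blank.
                if (title != null && hasContent) {
                    titles.add(title.trim());
                }
                title = null;
                hasContent = false;
            } else {
                if (title == null) {
                    title = line;
                }
                if (!line.trim().isEmpty()) {
                    hasContent = true;
                }
            }
        }
        // Accept a last record that lacks the "$$$$" terminator.
        if (title != null && hasContent) {
            titles.add(title.trim());
        }
        return titles;
    }

    /** Extracts the first HMDB identifier from a title, if there is one. */
    static Optional<String> hmdbId(String title) {
        Matcher m = HMDB_ID.matcher(title);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        // Two stub records with empty atom blocks, just to exercise the parsing.
        String sd = String.join("\n",
                "Beginning of SDF File of HMDB00001 1-Methylhistidine", "", "",
                "  0  0  0  0  0  0  0  0  0  0999 V2000", "M  END", "$$$$",
                "methane", "", "",
                "  0  0  0  0  0  0  0  0  0  0999 V2000", "M  END", "$$$$");
        for (String title : declaredTitles(sd)) {
            System.out.println(hmdbId(title).orElse("(no HMDB id)") + "\t" + title);
        }
    }
}
```

The sketch holds the whole file in memory and ignores everything except the title line, so it complements rather than replaces the CDK iterator in the quoted scripts. InChIKey generation would still go through the CDK InChI classes shown there.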