From: Steffen N. <sne...@ip...> - 2016-12-05 12:37:55
Hi John,

thanks for the quick answer, and yes, I also came across efficientbits.blogspot minutes ago :-)

My input is sort of a molfile, embedded in a Bruker MS/MS library file. And yes, I have

[...]
A   57
CO2H
[...]
M  END

So I am inclined to tell the authors to write out the COOH explicitly, since this complicates everything downstream, unless you had a fix to expand CO2H to O=C-O-H in CDK itself. And still ...

Yours,
Steffen

On Mon, 2016-12-05 at 11:28 +0000, John M wrote:
> Hi Steffen,
>
> An interesting question that I've actually been thinking about
> recently. Is the input from a Molfile? In short, the answer is no: it's
> not supported yet, and it's better to keep formats sane, but it's
> potentially a useful feature to support. Explanation:
>
> We support the semantically correct way of doing this, which is with
> SGroups. Here the abbreviation is merely a display shortcut and the
> full atom representation is present. Here's one from ChemDraw:
>
> Untitled Document-1
>   ChemDraw12051610002D
>
>   5  4  0  0  0  0  0  0  0  0999 V2000
>    -0.3572   -0.6188    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>    -1.0717   -0.2062    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     0.3572   -0.2062    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     0.3572    0.6188    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>     1.0717   -0.6188    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>   3  4  2  0
>   3  5  1  0
>   1  3  1  0
>   1  2  2  0
> M  STY  1   1 SUP
> M  SLB  1   1   1
> M  SAL   1  3   3   4   5
> M  SBL   1  1   3
> M  SMT   1 CO2H
> M  SBV   1   3   -0.7145   -0.4125
> M  END
>
> More info here:
> http://efficientbits.blogspot.co.uk/2015/11/bringing-molfile-sgroups-to-cdk.html
>
> I'm guessing you have something like this:
>
> Untitled Document-1
>   ChemDraw12051610082D
>
>   3  2  0  0  0  0  0  0  0  0999 V2000
>     0.0000   -0.2062    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     0.7145    0.2062    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>    -0.7145    0.2062    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>   1  2  1  0
>   1  3  2  0
> A    2
> CO2H
> M  END
>
> It's relatively common for tools to presume these are simply
> labelled super atoms.
A little flexibility causes a lot of problems.
> It gets a lot more fun with di- and trivalent labels, e.g. PEG
> -CH2CH2O- or a peptide -Cys<. I've seen a few presentations recently
> where a SMILES-like notation is used for peptides; although it kind
> of works, it gets messy very quickly with gamma-linked or modified
> peptides etc.
>
> The canonical implementation of Molfile support is BIOVIA (from
> Accelrys (from Symyx (from MDL))). In their old tools (I can go back
> as far as Symyx Draw 2.5) they actually read it in and automatically
> expand some simple abbreviations. I've learnt from Andrew Dalke that
> this is what is known as 'Librarian Mode'. Curiously, CO2H isn't
> supported, although it's clearly unambiguous.
>
> John
>
> On 5 December 2016 at 08:54, Steffen Neumann <sne...@ip...>
> wrote:
> > Hi,
> >
> > I have ported some code from CDK-1.3.5 to 1.5.14, and ran into the
> > NPE mentioned in https://sourceforge.net/p/cdk/mailman/message/32890027/
> >
> > So I added Isotopes.getInstance().configureAtoms(container);
> > and find for some molecules the PseudoAtom CO2H,
> > which seems to be a carboxyl group COOH.
> >
> > Question: Should I coax the molfile/structure provider to change
> > this into COOH, or can I teach CDK that this is actually
> > nothing special, and (somehow) write a PseudoAtom()
> > for this? In the latter case, how would I do this?
> > Pointers to the documentation with examples welcome :-)
> >
> > Thanks in advance,
> > Yours,
> > Steffen
> >
> > java.lang.IllegalArgumentException: Cannot configure
> > an unrecognized element: PseudoAtom(2107853606, CO2H,
> > Atom(2107853606, S:CO2H, H:0, SP:0, 2D:[(21.7612, -17.5386)],
> > AtomType(2107853606, FC:0, EV:1, Isotope(2107853606,
> > Element(2107853606, S:CO2H, AN:0)))))
> >
> > --
> > IPB Halle                    AG Massenspektrometrie & Bioinformatik
> > Dr. Steffen Neumann          http://www.IPB-Halle.DE
> > Weinberg 3                   Tel. +49 (0) 345 5582 - 1470
> > 06120 Halle                       +49 (0) 345 5582 - 0
> > sneumann(at)IPB-Halle.DE     Fax. +49 (0) 345 5582 - 1409
> >
> > ------------------------------------------------------------------------------
> > _______________________________________________
> > Cdk-devel mailing list
> > Cdk...@li...
> > https://lists.sourceforge.net/lists/listinfo/cdk-devel
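John's two ChemDraw examples differ in where the CO2H label lives: the SGroup version keeps all five atoms and lists the abbreviated ones under `M  SAL`, with the display label in `M  SMT`, while the aliased version has only three atoms and a bare `A` alias line naming atom 2. A minimal sketch for triaging an input file before handing it to a reader might scan the V2000 text for both encodings. This is not CDK code; the class and method names (`AbbreviationScan`, `aliases`, `superatoms`) are hypothetical, and it reads only the `A  aaa` alias block and the `M  SAL`/`M  SMT` properties, ignoring everything else.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper (not part of CDK): reports which of the two V2000
 * abbreviation encodings from the thread a molfile uses.
 */
class AbbreviationScan {

    /** Atom aliases: 1-based atom index -> alias text from the line after "A  aaa". */
    static Map<Integer, String> aliases(String molfile) {
        Map<Integer, String> result = new LinkedHashMap<>();
        String[] lines = molfile.split("\\r?\\n");
        for (int i = 0; i + 1 < lines.length; i++) {
            // Simplification: a title line that happens to start with "A  " would be misread.
            if (lines[i].startsWith("A  ")) {
                int atom = Integer.parseInt(lines[i].substring(3).trim());
                result.put(atom, lines[i + 1].trim());
                i++; // skip the alias text line
            }
        }
        return result;
    }

    /** Superatom SGroups: label from "M  SMT" -> 1-based atoms listed in "M  SAL". */
    static Map<String, List<Integer>> superatoms(String molfile) {
        Map<Integer, List<Integer>> atomsBySgroup = new LinkedHashMap<>();
        Map<Integer, String> labelBySgroup = new LinkedHashMap<>();
        for (String line : molfile.split("\\r?\\n")) {
            String[] tok = line.trim().split("\\s+");
            if (line.startsWith("M  SAL") && tok.length > 4) {
                // M  SAL sss n15 aaa ...: SGroup index, atom count, then the atom indices
                List<Integer> atoms = atomsBySgroup.computeIfAbsent(
                    Integer.parseInt(tok[2]), key -> new ArrayList<>());
                for (int k = 4; k < tok.length; k++) {
                    atoms.add(Integer.parseInt(tok[k]));
                }
            } else if (line.startsWith("M  SMT") && tok.length > 3) {
                // M  SMT sss label (labels containing spaces are not handled here)
                labelBySgroup.put(Integer.parseInt(tok[2]), tok[3]);
            }
        }
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (Map.Entry<Integer, String> e : labelBySgroup.entrySet()) {
            result.put(e.getValue(), atomsBySgroup.getOrDefault(e.getKey(), new ArrayList<>()));
        }
        return result;
    }

    public static void main(String[] args) {
        // The aliased example from the thread: three carbons, atom 2 labelled CO2H.
        String molfile = String.join("\n",
            "Untitled Document-1",
            "  ChemDraw12051610082D",
            "",
            "  3  2  0  0  0  0  0  0  0  0999 V2000",
            "    0.0000   -0.2062    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
            "    0.7145    0.2062    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
            "   -0.7145    0.2062    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
            "  1  2  1  0",
            "  1  3  2  0",
            "A    2",
            "CO2H",
            "M  END");
        System.out.println("aliases: " + aliases(molfile));       // prints aliases: {2=CO2H}
        System.out.println("superatoms: " + superatoms(molfile)); // prints superatoms: {}
    }
}
```

On John's SGroup example the same scan should instead report no aliases and a `CO2H` superatom covering atoms 3, 4 and 5, i.e. the label sits on top of a complete structure rather than replacing it.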
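Steffen's fallback question is whether CO2H could simply be expanded to O=C-O-H, which is roughly what John describes the old Symyx tools doing in 'Librarian Mode'. The following is a toy sketch of that idea, not CDK's API: `ToyMol`, `Fragment` and the `ABBREVIATIONS` table are hypothetical stand-ins for a real molecule model and abbreviation dictionary. The labelled atom becomes the fragment's attachment carbon, so its existing bonds stay in place, and the remaining fragment atoms and bonds are appended; hydrogens are left implicit and no coordinates are assigned.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy sketch of abbreviation expansion; none of these types are CDK classes. */
class AbbreviationExpander {

    /** Hypothetical minimal molecule: element labels plus bonds stored as {atomA, atomB, order}. */
    static class ToyMol {
        final List<String> symbols = new ArrayList<>();
        final List<int[]> bonds = new ArrayList<>();

        int addAtom(String symbol) {
            symbols.add(symbol);
            return symbols.size() - 1;
        }

        void addBond(int a, int b, int order) {
            bonds.add(new int[] {a, b, order});
        }
    }

    /** Fragment whose atom 0 replaces the labelled atom and inherits its bonds. */
    static class Fragment {
        final String[] symbols;
        final int[][] bonds; // {i, j, order}, fragment-local indices

        Fragment(String[] symbols, int[][] bonds) {
            this.symbols = symbols;
            this.bonds = bonds;
        }
    }

    // Hypothetical abbreviation table; only CO2H -> C(=O)O is defined, hydrogens implicit.
    static final Map<String, Fragment> ABBREVIATIONS = new HashMap<>();
    static {
        ABBREVIATIONS.put("CO2H", new Fragment(
            new String[] {"C", "O", "O"},
            new int[][] {{0, 1, 2}, {0, 2, 1}}));
    }

    /** Expands every atom whose label is a known abbreviation; returns how many were expanded. */
    static int expand(ToyMol mol) {
        int expanded = 0;
        int n = mol.symbols.size(); // scan only atoms that existed before expansion
        for (int i = 0; i < n; i++) {
            Fragment frag = ABBREVIATIONS.get(mol.symbols.get(i));
            if (frag == null) {
                continue;
            }
            // Fragment atom 0 reuses index i, so bonds to the label stay attached.
            int[] map = new int[frag.symbols.length];
            map[0] = i;
            mol.symbols.set(i, frag.symbols[0]);
            for (int k = 1; k < frag.symbols.length; k++) {
                map[k] = mol.addAtom(frag.symbols[k]);
            }
            for (int[] b : frag.bonds) {
                mol.addBond(map[b[0]], map[b[1]], b[2]);
            }
            expanded++;
        }
        return expanded;
    }

    public static void main(String[] args) {
        // The aliased example from the thread: atom 1 single-bonded to atom 2 (CO2H), double-bonded to atom 3.
        ToyMol mol = new ToyMol();
        int c1 = mol.addAtom("C");
        int c2 = mol.addAtom("CO2H");
        int c3 = mol.addAtom("C");
        mol.addBond(c1, c2, 1);
        mol.addBond(c1, c3, 2);

        int count = expand(mol);
        // prints: 1 expanded; [C, C, C, O, O], 4 bonds
        System.out.println(count + " expanded; " + mol.symbols + ", " + mol.bonds.size() + " bonds");
    }
}
```

Applied to the three-atom aliased example, this yields five heavy atoms and four bonds, the same counts as ChemDraw's SGroup version, and no unrecognized symbol is left behind. Monovalent labels like CO2H are the easy case; as John points out, di- and trivalent labels such as PEG or -Cys< need to know which fragment atom each existing bond should attach to, which this sketch does not attempt.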