From: John M. <jo...@eb...> - 2012-12-09 21:31:30
|
Hi All, I noticed that what I term auxiliary atom type properties, such as PI_BONDCOUNT, LONEPAIR_COUNT, SINGLE_COUNT… etc. are never configured. Some algorithms try to use this information (e.g. this line in CDKHueckelAromaticityDetector) but there is no obvious way to get at this information. It would be natural to add this to AtomTypeManipulator.configure() but as these are stored in a map it mean creating a HashMap on every single atom :/. It would be good to have these properties accessible but I can't see a useful way of doing it. Any thoughts are most welcome. Many thanks, J John May | Predoctoral Student – Chemoinformatics and Metabolism European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK jo...@eb... | +44–(0) 1223 49 2603 |
From: Nina J. <jel...@gm...> - 2012-12-10 06:34:40
|
On 9 December 2012 23:31, John May <jo...@eb...> wrote: > Hi All, > > I noticed that what I term *auxiliary atom type properties, *such > as PI_BONDCOUNT, LONEPAIR_COUNT, SINGLE_COUNT… etc. are never configured. > Some algorithms try to use this information (e.g. this line in > CDKHueckelAromaticityDetector<https://github.com/egonw/cdk/blob/master/src/main/org/openscience/cdk/aromaticity/CDKHueckelAromaticityDetector.java#L184>) > but there is no obvious way to get at this information. It would be natural > to add this to AtomTypeManipulator.configure() but as these are stored in a > map it mean creating a HashMap on every single atom :/. > > It would be good to have these properties accessible but I can't see a > useful way of doing it > Perhaps a different flavour of AtomTypeManipulator.configure(), to set only these properties and to be used only when required? Regards, Nina > > Any thoughts are most welcome. > > Many thanks, > J > [image: me] *John May* | Predoctoral Student – Chemoinformatics and > Metabolism > European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, > Cambridge, CB10 1SD, UK > jo...@eb... | +44–(0) 1223 49 2603 > > > > ------------------------------------------------------------------------------ > LogMeIn Rescue: Anywhere, Anytime Remote support for IT. Free Trial > Remotely access PCs and mobile devices and provide instant support > Improve your efficiency, and focus on delivering more value-add services > Discover what IT Professionals Know. Rescue delivers > http://p.sf.net/sfu/logmein_12329d2d > _______________________________________________ > Cdk-devel mailing list > Cdk...@li... > https://lists.sourceforge.net/lists/listinfo/cdk-devel > > |
From: John M. <joh...@gm...> - 2012-12-10 08:35:35
|
Hi Nina, I was thinking that too but I was apprehensive as there is already a configure and configureUnsetProperties. It might be more confusing to add another option? Perhaps that is the best way though. Thanks, J On 10 Dec 2012, at 06:34, Nina Jeliazkova <jel...@gm...> wrote: > > > On 9 December 2012 23:31, John May <jo...@eb...> wrote: > Hi All, > > I noticed that what I term auxiliary atom type properties, such as PI_BONDCOUNT, LONEPAIR_COUNT, SINGLE_COUNT… etc. are never configured. Some algorithms try to use this information (e.g. this line in CDKHueckelAromaticityDetector) but there is no obvious way to get at this information. It would be natural to add this to AtomTypeManipulator.configure() but as these are stored in a map it mean creating a HashMap on every single atom :/. > > It would be good to have these properties accessible but I can't see a useful way of doing it > > Perhaps a different flavour of AtomTypeManipulator.configure(), to set only these properties and to be used only when required? > > Regards, > Nina > > Any thoughts are most welcome. > > Many thanks, > J > John May | Predoctoral Student – Chemoinformatics and Metabolism > European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK > jo...@eb... | +44–(0) 1223 49 2603 > > > ------------------------------------------------------------------------------ > LogMeIn Rescue: Anywhere, Anytime Remote support for IT. Free Trial > Remotely access PCs and mobile devices and provide instant support > Improve your efficiency, and focus on delivering more value-add services > Discover what IT Professionals Know. Rescue delivers > http://p.sf.net/sfu/logmein_12329d2d > _______________________________________________ > Cdk-devel mailing list > Cdk...@li... > https://lists.sourceforge.net/lists/listinfo/cdk-devel > > > ------------------------------------------------------------------------------ > LogMeIn Rescue: Anywhere, Anytime Remote support for IT. Free Trial > Remotely access PCs and mobile devices and provide instant support > Improve your efficiency, and focus on delivering more value-add services > Discover what IT Professionals Know. Rescue delivers > http://p.sf.net/sfu/logmein_12329d2d_______________________________________________ > Cdk-devel mailing list > Cdk...@li... > https://lists.sourceforge.net/lists/listinfo/cdk-devel |
From: Egon W. <ego...@gm...> - 2012-12-11 09:57:39
|
On Sun, Dec 9, 2012 at 10:31 PM, John May <jo...@eb...> wrote: > I noticed that what I term auxiliary atom type properties, such as PI_BONDCOUNT, LONEPAIR_COUNT, SINGLE_COUNT… etc. are never configured. Some algorithms try to use this information (e.g. this line in CDKHueckelAromaticityDetector) but there is no obvious way to get at this information. It would be natural to add this to AtomTypeManipulator.configure() but as these are stored in a map it mean creating a HashMap on every single atom :/. I think that method should configure all properties an atom type definition file may contain, and thus that it should configure the above properties. I consider this a bug. > It would be good to have these properties accessible but I can't see a useful way of doing it. This information is used at some points, and I am not sure how it is read there, if not via the configure() method... that I need to explore. Thanx for spotting and reporting this! Egon -- Dr E.L. Willighagen Postdoctoral Researcher Department of Bioinformatics - BiGCaT Maastricht University (http://www.bigcat.unimaas.nl/) Homepage: http://egonw.github.com/ LinkedIn: http://se.linkedin.com/in/egonw Blog: http://chem-bla-ics.blogspot.com/ PubList: http://www.citeulike.org/user/egonw/tag/papers |
From: Nina J. <jel...@gm...> - 2012-12-11 12:49:01
|
On 11 December 2012 07:54, Egon Willighagen <ego...@gm...> wrote: > On Sun, Dec 9, 2012 at 10:31 PM, John May <jo...@eb...> wrote: >> I noticed that what I term auxiliary atom type properties, such as PI_BONDCOUNT, LONEPAIR_COUNT, SINGLE_COUNT… etc. are never configured. Some algorithms try to use this information (e.g. this line in CDKHueckelAromaticityDetector) but there is no obvious way to get at this information. It would be natural to add this to AtomTypeManipulator.configure() but as these are stored in a map it mean creating a HashMap on every single atom :/. > > I think that method should configure all properties an atom type > definition file may contain, and thus that it should configure the > above properties. I consider this a bug. Right, but in principle a HashMap per atom is a bit of overkill, if this information is only used in few cases. Regards, Nina > >> It would be good to have these properties accessible but I can't see a useful way of doing it. > > This information is used at some points, and I am not sure how it is > read there, if not via the configure() method... that I need to > explore. > > Thanx for spotting and reporting this! > > Egon > > -- > Dr E.L. Willighagen > Postdoctoral Researcher > Department of Bioinformatics - BiGCaT > Maastricht University (http://www.bigcat.unimaas.nl/) > Homepage: http://egonw.github.com/ > LinkedIn: http://se.linkedin.com/in/egonw > Blog: http://chem-bla-ics.blogspot.com/ > PubList: http://www.citeulike.org/user/egonw/tag/papers > > ------------------------------------------------------------------------------ > LogMeIn Rescue: Anywhere, Anytime Remote support for IT. Free Trial > Remotely access PCs and mobile devices and provide instant support > Improve your efficiency, and focus on delivering more value-add services > Discover what IT Professionals Know. Rescue delivers > http://p.sf.net/sfu/logmein_12329d2d > _______________________________________________ > Cdk-devel mailing list > Cdk...@li... > https://lists.sourceforge.net/lists/listinfo/cdk-devel |
From: Egon W. <ego...@gm...> - 2012-12-11 13:32:37
|
On Tue, Dec 11, 2012 at 1:48 PM, Nina Jeliazkova <jel...@gm...> wrote: > On 11 December 2012 07:54, Egon Willighagen <ego...@gm...> wrote: >> On Sun, Dec 9, 2012 at 10:31 PM, John May <jo...@eb...> wrote: >> I think that method should configure all properties an atom type >> definition file may contain, and thus that it should configure the >> above properties. I consider this a bug. > > Right, but in principle a HashMap per atom is a bit of overkill, if > this information is only used in few cases. That is a good point. The same actually applies to any of the 'field based' atom type properties. I guess this is why I have been thinking about making the IAtom not extend IAtomType but encapsulate it, so that we can have, perhaps, an enum of CDK atom types (and an enum for $foo atom types), etc... That way, we'd have that information only once per atom type too... more ideas welcome! Egon -- Dr E.L. Willighagen Postdoctoral Researcher Department of Bioinformatics - BiGCaT Maastricht University (http://www.bigcat.unimaas.nl/) Homepage: http://egonw.github.com/ LinkedIn: http://se.linkedin.com/in/egonw Blog: http://chem-bla-ics.blogspot.com/ PubList: http://www.citeulike.org/user/egonw/tag/papers |
From: John M. <joh...@gm...> - 2012-12-11 13:50:18
|
Yep, composition is much better then inheritance :-) Would be good to have enumeration but not sure if it would be worth it for all atom types. Also it isn't currently possible to create an enumeration on any CDK ChemObject as the interfaces overrides clone() to be public which enum does not allow. J On 11 Dec 2012, at 13:32, Egon Willighagen <ego...@gm...> wrote: > I have been thinking > about making the IAtom not extend IAtomType but encapsulate i |
From: John M. <joh...@gm...> - 2012-12-11 13:43:09
|
Yeah, it's also a LinkedHashMap (more memory due doubly linked list). I think all maps initialise to the default of ~14 slots in which case we can do some rough calculations, taking a conservative estimate of v=20 atoms… loading n=25,000 molecules. Each pointer (inc. null) is 8 bytes (64-bit system) so as a lower bound we get. pointer x hash map length x null bytes x v x n 8 * 14 * 8 * 25,000 * 20 = 448000000 = 427 mb J On 11 Dec 2012, at 12:48, Nina Jeliazkova <jel...@gm...> wrote: > On 11 December 2012 07:54, Egon Willighagen <ego...@gm...> wrote: >> On Sun, Dec 9, 2012 at 10:31 PM, John May <jo...@eb...> wrote: >>> I noticed that what I term auxiliary atom type properties, such as PI_BONDCOUNT, LONEPAIR_COUNT, SINGLE_COUNT… etc. are never configured. Some algorithms try to use this information (e.g. this line in CDKHueckelAromaticityDetector) but there is no obvious way to get at this information. It would be natural to add this to AtomTypeManipulator.configure() but as these are stored in a map it mean creating a HashMap on every single atom :/. >> >> I think that method should configure all properties an atom type >> definition file may contain, and thus that it should configure the >> above properties. I consider this a bug. > > Right, but in principle a HashMap per atom is a bit of overkill, if > this information is only used in few cases. > > Regards, > Nina > >> >>> It would be good to have these properties accessible but I can't see a useful way of doing it. >> >> This information is used at some points, and I am not sure how it is >> read there, if not via the configure() method... that I need to >> explore. >> >> Thanx for spotting and reporting this! >> >> Egon >> >> -- >> Dr E.L. Willighagen >> Postdoctoral Researcher >> Department of Bioinformatics - BiGCaT >> Maastricht University (http://www.bigcat.unimaas.nl/) >> Homepage: http://egonw.github.com/ >> LinkedIn: http://se.linkedin.com/in/egonw >> Blog: http://chem-bla-ics.blogspot.com/ >> PubList: http://www.citeulike.org/user/egonw/tag/papers >> >> ------------------------------------------------------------------------------ >> LogMeIn Rescue: Anywhere, Anytime Remote support for IT. Free Trial >> Remotely access PCs and mobile devices and provide instant support >> Improve your efficiency, and focus on delivering more value-add services >> Discover what IT Professionals Know. Rescue delivers >> http://p.sf.net/sfu/logmein_12329d2d >> _______________________________________________ >> Cdk-devel mailing list >> Cdk...@li... >> https://lists.sourceforge.net/lists/listinfo/cdk-devel > > ------------------------------------------------------------------------------ > LogMeIn Rescue: Anywhere, Anytime Remote support for IT. Free Trial > Remotely access PCs and mobile devices and provide instant support > Improve your efficiency, and focus on delivering more value-add services > Discover what IT Professionals Know. Rescue delivers > http://p.sf.net/sfu/logmein_12329d2d > _______________________________________________ > Cdk-devel mailing list > Cdk...@li... > https://lists.sourceforge.net/lists/listinfo/cdk-devel |
From: Egon W. <ego...@gm...> - 2012-12-11 13:48:59
|
On Tue, Dec 11, 2012 at 2:42 PM, John May <joh...@gm...> wrote: > Yeah, it's also a LinkedHashMap (more memory due doubly linked list). I > think all maps initialise to the default of ~14 slots in which case we can > do some rough calculations, taking a conservative estimate of v=20 atoms… > loading n=25,000 molecules. Each pointer (inc. null) is 8 bytes (64-bit > system) so as a lower bound we get. > > pointer x hash map length x null bytes x v x n > 8 * 14 * 8 * 25,000 * 20 = 448000000 = 427 mb What if we had those four properties as additional fields in IAtomType... that would match: 8 * 3 * 8 * 25,000 * 20 ~ 90 MB Still a lot, not? Does this imply that we should move to wrapped IIsotope and IAtomType anyway? Egon -- Dr E.L. Willighagen Postdoctoral Researcher Department of Bioinformatics - BiGCaT Maastricht University (http://www.bigcat.unimaas.nl/) Homepage: http://egonw.github.com/ LinkedIn: http://se.linkedin.com/in/egonw Blog: http://chem-bla-ics.blogspot.com/ PubList: http://www.citeulike.org/user/egonw/tag/papers |
From: John M. <joh...@gm...> - 2012-12-11 13:52:10
|
Actually it would be around 3 * 8 * 25,000 * 20 which is ~ 11 Mb. On 11 Dec 2012, at 13:48, Egon Willighagen <ego...@gm...> wrote: > On Tue, Dec 11, 2012 at 2:42 PM, John May <joh...@gm...> wrote: >> Yeah, it's also a LinkedHashMap (more memory due doubly linked list). I >> think all maps initialise to the default of ~14 slots in which case we can >> do some rough calculations, taking a conservative estimate of v=20 atoms… >> loading n=25,000 molecules. Each pointer (inc. null) is 8 bytes (64-bit >> system) so as a lower bound we get. >> >> pointer x hash map length x null bytes x v x n >> 8 * 14 * 8 * 25,000 * 20 = 448000000 = 427 mb > > What if we had those four properties as additional fields in > IAtomType... that would match: > > 8 * 3 * 8 * 25,000 * 20 ~ 90 MB > > Still a lot, not? > > Does this imply that we should move to wrapped IIsotope and IAtomType anyway? > > Egon > > > -- > Dr E.L. Willighagen > Postdoctoral Researcher > Department of Bioinformatics - BiGCaT > Maastricht University (http://www.bigcat.unimaas.nl/) > Homepage: http://egonw.github.com/ > LinkedIn: http://se.linkedin.com/in/egonw > Blog: http://chem-bla-ics.blogspot.com/ > PubList: http://www.citeulike.org/user/egonw/tag/papers > > ------------------------------------------------------------------------------ > LogMeIn Rescue: Anywhere, Anytime Remote support for IT. Free Trial > Remotely access PCs and mobile devices and provide instant support > Improve your efficiency, and focus on delivering more value-add services > Discover what IT Professionals Know. Rescue delivers > http://p.sf.net/sfu/logmein_12329d2d > _______________________________________________ > Cdk-devel mailing list > Cdk...@li... > https://lists.sourceforge.net/lists/listinfo/cdk-devel |
From: Egon W. <ego...@gm...> - 2012-12-11 13:55:05
|
On Tue, Dec 11, 2012 at 2:51 PM, John May <joh...@gm...> wrote: > Actually it would be around 3 * 8 * 25,000 * 20 which is ~ 11 Mb. That's a factor 40... mmm, maybe I (or anyone) should make an IAtomType patch then for the newer AT properties... ? Egon -- Dr E.L. Willighagen Postdoctoral Researcher Department of Bioinformatics - BiGCaT Maastricht University (http://www.bigcat.unimaas.nl/) Homepage: http://egonw.github.com/ LinkedIn: http://se.linkedin.com/in/egonw Blog: http://chem-bla-ics.blogspot.com/ PubList: http://www.citeulike.org/user/egonw/tag/papers |