Re: [Cdk-user] Substructure Searching, Fingerprints and cdk-1.3.7 Isomorphism Class

Hi all,

Fortunately, the CDK code that reads MOL files adds 
atoms and bonds in the same order as in the MOL file; otherwise, it 
would be trickier.

Yeah, I looked at the MDLV2000Reader source code, and as long as that behaviour does not change, this should be fairly easy to achieve.
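
To make the index-based mapping concrete, here is a rough sketch in plain Java (no CDK, so it does not show what MDLV2000Reader does internally). It reads the element symbols from a V2000 atom block in file order and applies a stored aromaticity bit per atom index. The class name, the toy phenol-like molfile and the one-bit-per-atom BitSet field are my own assumptions for illustration, not something from CDK or from Nina's database.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

class AromaticFlagMapping {

    /** Returns the element symbols of a V2000 atom block, in file order. */
    static List<String> readAtomSymbols(String molfile) {
        String[] lines = molfile.split("\n");
        // Line 4 is the counts line; its first three columns hold the atom count.
        int atomCount = Integer.parseInt(lines[3].substring(0, 3).trim());
        List<String> symbols = new ArrayList<>();
        for (int i = 0; i < atomCount; i++) {
            // Columns 32-34 of an atom line hold the element symbol.
            symbols.add(lines[4 + i].substring(31, 34).trim());
        }
        return symbols;
    }

    /** Builds a toy phenol-like molfile (7 heavy atoms, dummy coordinates). */
    static String toyMolfile() {
        String[] symbols = {"C", "C", "C", "C", "C", "C", "O"};
        StringBuilder sb = new StringBuilder("phenol\n  sketch\n\n");
        sb.append(String.format("%3d%3d  0  0  0  0  0  0  0  0999 V2000\n", 7, 7));
        for (String s : symbols) {
            sb.append(String.format(
                "%10s%10s%10s %-3s 0  0  0  0  0  0  0  0  0  0  0  0\n",
                "0.0000", "0.0000", "0.0000", s));
        }
        for (int i = 0; i < 6; i++) {
            // Ring bonds 1-2, 2-3, ..., 6-1 with alternating bond orders.
            sb.append(String.format("%3d%3d%3d  0\n", i + 1, (i + 1) % 6 + 1, i % 2 == 0 ? 2 : 1));
        }
        sb.append(String.format("%3d%3d%3d  0\n", 6, 7, 1)); // C-O single bond
        sb.append("M  END\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical stored value: one bit per atom index, set if aromatic.
        BitSet aromatic = new BitSet();
        aromatic.set(0, 6); // ring carbons are atoms 0..5 in file order
        byte[] storedField = aromatic.toByteArray(); // would go into a single column

        // Later: read the molfile again and re-apply the flags by position.
        BitSet restored = BitSet.valueOf(storedField);
        List<String> symbols = readAtomSymbols(toyMolfile());
        for (int i = 0; i < symbols.size(); i++) {
            System.out.println(i + " " + symbols.get(i) + " aromatic=" + restored.get(i));
        }
    }
}
```

The mapping only holds while the reader keeps atoms in file order, which is exactly the caveat above.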

Of course, my next thought was: why not store all atoms and bonds and the 
relevant properties? Then you could just create the AtomContainer with 
setAtoms and setBonds.

Because that would take up a lot of space? Hard to tell; I'm not that 
familiar (yet?) with the CDK source code or with which properties of atoms 
and bonds are actually relevant for fingerprinting and subgraph matching. 
Also, it's kind of hard to see which properties/flags are actually 
available (setFlag/getFlag, CDKConstants). 

But anyway, what I'm trying to suggest or ask, or rather what popped into 
my mind, is: why not use Hibernate (or something similar)? That idea of 
course contradicts my previous comment that storing all aromatic atoms is 
stupid. OK, I'm not very familiar with either CDK or Hibernate (for 
example, how do you add an id for Hibernate to an existing class?), and the 
CDK object hierarchy is, to my inexperienced eyes, rather complex and maybe 
not ideal for Hibernate, so this might be a ridiculous idea. Of course, 
creating the mapping files would be a rather tedious and annoying task, but 
you could clearly specify which information you actually want to store. OK, 
there would be a lot more rows and columns in the database, but each field 
would contain a lot less data compared to having a varchar/CLOB field for 
molfiles. Maybe it would not take that much more storage space than storing 
molfiles, and it would probably perform better, especially compared to 
CLOB columns. 
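
Just to picture what "more rows, less data per field" could look like, here is a plain-Java sketch of a row-per-atom and row-per-bond layout. It uses no Hibernate and no CDK classes; all class and field names are hypothetical. A real mapping would still need annotations or mapping files on top of something like this.

```java
import java.util.ArrayList;
import java.util.List;

class NormalizedRows {

    /** One database row per atom. Field names are hypothetical. */
    static final class AtomRow {
        final long moleculeId;
        final int atomIndex;
        final String symbol;
        final boolean aromatic;

        AtomRow(long moleculeId, int atomIndex, String symbol, boolean aromatic) {
            this.moleculeId = moleculeId;
            this.atomIndex = atomIndex;
            this.symbol = symbol;
            this.aromatic = aromatic;
        }
    }

    /** One database row per bond, referring to its atoms by index. */
    static final class BondRow {
        final long moleculeId;
        final int beginIndex;
        final int endIndex;
        final int order;

        BondRow(long moleculeId, int beginIndex, int endIndex, int order) {
            this.moleculeId = moleculeId;
            this.beginIndex = beginIndex;
            this.endIndex = endIndex;
            this.order = order;
        }
    }

    /** Flattens a ring of identical atoms into atom rows. */
    static List<AtomRow> ringAtomRows(long moleculeId, String symbol, int size, boolean aromatic) {
        List<AtomRow> rows = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            rows.add(new AtomRow(moleculeId, i, symbol, aromatic));
        }
        return rows;
    }

    /** Flattens the ring bonds into bond rows with alternating bond orders. */
    static List<BondRow> ringBondRows(long moleculeId, int size) {
        List<BondRow> rows = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            rows.add(new BondRow(moleculeId, i, (i + 1) % size, i % 2 == 0 ? 2 : 1));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<AtomRow> atoms = ringAtomRows(1L, "C", 6, true);
        List<BondRow> bonds = ringBondRows(1L, 6);
        int rowsPerMolecule = atoms.size() + bonds.size();
        System.out.println("rows for one benzene ring: " + rowsPerMolecule);
        System.out.println("rows for 25000 such molecules: " + rowsPerMolecule * 25_000L);
    }
}
```

Because the row classes carry their own molecule id and atom index, the existing CDK classes would not need an extra id field in this layout. Whether that is the best way to handle it with Hibernate, I can't say.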

end of brainstorming,

Thomas

> From: J.K...@cm...
> Date: Mon, 13 Dec 2010 15:45:19 +0100
> Subject: Re: [Cdk-user] Substructure Searching, Fingerprints and cdk-1.3.7 Isomorphism Class
> To: jel...@gm...
> CC: beg...@ho...; cdk...@li...
> 
> Just a short note to mention that I'm closely following this topic. A
> major rewrite of our own database system is somewhere in the near
> future, so this is good reading! Thanks for sharing!
> 
> ~Jules Kerssemakers
> 
> On 13 December 2010 08:51, Nina Jeliazkova <jel...@gm...> wrote:
> > Hi Thomas,
> >
> > On 10 December 2010 20:04, Thomas Strunz <beg...@ho...> wrote:
> >>
> >> Sorry for calling you stupid. ;)
> >>
> >
> > ;)
> >
> >>
> >>  I just meant: if you have, say, 100'000 molecules and assume 25% are
> >> aromatic, probably mostly benzene rings = 6 atoms + 6 bonds, that leads to
> >> 12 * 25'000 = 300'000 records. OK, that's manageable since it's only an ID and
> >> a bit. But it depends mostly on the dataset. My focus is on smaller molecules,
> >> which is probably also the reason why graph matching does not seem to be that
> >> big of a problem.
> >
> >  just single field with all the additional info for atoms and bonds. Not
> > pretending this is the best way, just a simple one.
> >>
> >> How do you map a certain atom or bond from the database to the right one
> >> in the AtomContainer created from the molfile?
> >> Does the Atom class also have an id like the Molecule class? Then it would
> >> not be that difficult.
> >>
> >
> > Fortunately, the CDK code that reads MOL files adds atoms and bonds in the
> > same order, as in the MOL file, otherwise, it would be trickier.
> > Regards,
> > Nina
> >>
> >> have a nice weekend
> >>
> >> Regards,
> >>
> >> Thomas
> >>
> >
> >