Hi,


On 29 October 2010 13:05, Jules Kerssemakers <jules.kerssemakers@googlemail.com> wrote:
As was mentioned, a lot of the algorithms use flags, so what Nina
brings up deserves checking.
I did a quick check on how flags are currently implemented, and there
is an array of boolean[CDKConstants.MAX_FLAG_INDEX+1] in ChemObject.
Looking up bits by direct indexing is pretty efficient compared to
.contains() (constant time vs. (I believe) linear), so the point she
brings up is valid.

I think this issue could largely be fixed by using a HashSet.

Yes, HashSet is a good idea.
 
The List egon suggested allows objects to be contained (= visited)
twice, which is unnecessary. Currently with flags you also can also
only label an object once, so Sets (allowing an object only once) are
sufficient.
Furthermore, the HashSet has a (nearly) linear .contains() method, as
only the hashcode needs to be calculated. Only in the unfortunate case
the hashcode is non-unique in the set (unlikely usually, and this can
be tuned) would a list-search between all the objects sharing the
hashcode be neccesary. (This will usually only be two or three objects
unless you have massive hashcode collisions.)

A prime requirement for this implementation to work effeciently is
that the ChemObject's hashcode is a well-distributed one. (I.E.
unlikely to generate the same hashcode for different objects.)

MD5 can be used, it's known to be a good compromise between simplicity and well-distributed-ness.

http://download.oracle.com/javase/1.4.2/docs/api/java/security/MessageDigest.html

MessageDigest digest = java.security.MessageDigest.getInstance("MD5");


I also don't really see why we would need a whole VisitedClass. This
is more of a cut-and-paste design pattern:

Set<IAtom> myVisitedAtoms = new HashSet<IAtom>();
for (atom in someAtomContainerSet) {
 if ( ! myVisitedAtoms.contains(atom) ) {
   // do stuff that only needs to be done once
   myVisitedAtoms.add(atom);
 }
}

And, while we're at it, why not migrate some more flags to this
implementation? The argument that IS_VISITED makes the things
thread-unsafe applies to ISPLACED and MAPPED as well.
The flags IS(NOT)INRING, ISALIPHATIC, ISAROMATIC,
IS_HYDROGENBOND_DONOR/ACCEPTOR, REACTIVE_CENTER and IS_TYPEABLE are
logically atom-properties, so they should probably stay as is, with
the Atom.

Now that I mention it, why are these flags defined at the ChemObject
level? I know they make sense for Atom extends ChemObject, but now
they are also defined for ChemFile, ChemModel, Ring, Strand, Polymer
and all other things that extend ChemObject..


Another thing that might be worth looking at could be the "decorator
pattern". In this pattern each object has a HashSet<Object> where
everything can put its own flags/annotations/whatever in. If the
visitedList Egon suggested is the algorithm-centric version of the
visited-list, the decorator-pattern is the object-centric version of
it.

 
I'm not sure if this can be done in a threadsafe manner, but probably
it can. An additional downside is that if some algorithm doesn't clean
up properly, the objects become larger.


My (slight) preference is for the object-centric version, but I guess it could be decided case by case. Sets could be thread safe, if necessary.

Best regards,
Nina
 
Cheers,
Jules

On 29 October 2010 07:33, Nina Jeliazkova <jeliazkova.nina@gmail.com> wrote:
>
>
> On 29 October 2010 00:57, Egon Willighagen <egon.willighagen@gmail.com>
> wrote:
>>
>> On Thu, Oct 28, 2010 at 11:54 PM, Rajarshi Guha <rajarshi.guha@gmail.com>
>> wrote:
>> > Couldn't algorithms use their own data structures to check for this,
>> > rather than the API expanding to rpovide a class for this?
>>
>> Sure. I thought it might be useful to share some code, but the
>> abstract class turns out to be  pretty small, so maybe not needed...
>>
>
> IMHO a shared abstract class is nice, otherwise one needs to guess every
> time what structure is used for the same purpose ... harder to read the
> code.  My only concern about the change is regarding performance, flags are
> usually much faster than allocating  objects in lists ...
>
> Nina
>
>>
>> Egon
>>
>>
>> --
>> Dr E.L. Willighagen
>> Postdoctoral Research Associate
>> University of Cambridge
>> Homepage: http://egonw.github.com/
>> LinkedIn: http://se.linkedin.com/in/egonw
>> Blog: http://chem-bla-ics.blogspot.com/
>> PubList: http://www.citeulike.org/user/egonw/tag/papers
>>
>>
>> ------------------------------------------------------------------------------
>> Nokia and AT&T present the 2010 Calling All Innovators-North America
>> contest
>> Create new apps & games for the Nokia N8 for consumers in  U.S. and Canada
>> $10 million total in prizes - $4M cash, 500 devices, nearly $6M in
>> marketing
>> Develop with Nokia Qt SDK, Web Runtime, or Java and Publish to Ovi Store
>> http://p.sf.net/sfu/nokia-dev2dev
>> _______________________________________________
>> Cdk-devel mailing list
>> Cdk-devel@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/cdk-devel
>
>
> ------------------------------------------------------------------------------
> Nokia and AT&T present the 2010 Calling All Innovators-North America contest
> Create new apps & games for the Nokia N8 for consumers in  U.S. and Canada
> $10 million total in prizes - $4M cash, 500 devices, nearly $6M in marketing
> Develop with Nokia Qt SDK, Web Runtime, or Java and Publish to Ovi Store
> http://p.sf.net/sfu/nokia-dev2dev
> _______________________________________________
> Cdk-devel mailing list
> Cdk-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/cdk-devel
>
>

------------------------------------------------------------------------------
Nokia and AT&T present the 2010 Calling All Innovators-North America contest
Create new apps & games for the Nokia N8 for consumers in  U.S. and Canada
$10 million total in prizes - $4M cash, 500 devices, nearly $6M in marketing
Develop with Nokia Qt SDK, Web Runtime, or Java and Publish to Ovi Store
http://p.sf.net/sfu/nokia-dev2dev
_______________________________________________
Cdk-devel mailing list
Cdk-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/cdk-devel