Do you mean dot-connected compounds? In that case, most (if not all) QSAR descriptors should be evaluated for the individual components separately. What happens after that depends on the application: if we're talking about salt forms, you would probably drop the salt components. Alternatively, if we're talking about mixtures (which is not really what a dot-connected representation describes), there are various ways to generate a mixture descriptor.
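
As a rough illustration of the per-component idea, here is a minimal Python sketch. It is not CDK code. It assumes the input is a SMILES string in which '.' separates components and no ring-closure digits span a dot. The function names and the crude regex-based atom count are hypothetical stand-ins for a real toolkit's fragment splitting and atom counting.

```python
import re

# Crude pattern for one atom token in SMILES: a bracket atom, a two-letter
# halogen, or a single-letter organic-subset atom (aliphatic or aromatic).
# Assumption: good enough for illustration, not a real SMILES parser.
_ATOM = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")


def split_components(smiles):
    """Split a dot-connected SMILES into its component strings."""
    return [part for part in smiles.split(".") if part]


def atom_count(component):
    """Approximate atom count: number of atom tokens in the string."""
    return len(_ATOM.findall(component))


def largest_component(smiles):
    """Keep the component with the most atoms (a salt-stripping heuristic)."""
    return max(split_components(smiles), key=atom_count)


if __name__ == "__main__":
    print(largest_component("CC(=O)[O-].[Na+]"))
```

For sodium acetate written as `CC(=O)[O-].[Na+]`, this keeps the acetate component and drops the sodium counterion; a descriptor would then be computed on the kept component only. Keeping the largest fragment is only one possible rule and can be wrong, for example when the counterion is larger than the compound of interest.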

On Thu, Aug 22, 2013 at 12:14 PM, Martin Guetlein wrote:

How do CDK descriptors handle molecules with multiple compounds in them?

I experimented a bit, and found out that it depends on the descriptor:
* most descriptors apparently just add up the values of the single
compounds (XLogP, for example; that makes no sense, does it?)
* some fail for multi-compound molecules
* some compute something else

My application is building QSAR models. I am not a chemist, but my
feeling is that the clean but complicated solution would be to have
'set-valued features' (a set of values instead of a single value) for
multi-compound molecules. But that's pretty complicated, and most of my
molecules have only one compound. Still, I think that the average value
of the single compounds should be preferred for descriptors like
molecular weight or logP.
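
To make the options concrete, here is a minimal Python sketch of three ways to combine per-component descriptor values: summing, averaging, and keeping a set-valued feature. The function name `combine`, the strategy names, and the input numbers are hypothetical and only for illustration; this is not how CDK itself is implemented.

```python
from statistics import mean


def combine(values, how):
    """Combine per-component descriptor values into one feature."""
    if not values:
        raise ValueError("need at least one component value")
    if how == "sum":       # add up the values of the single compounds
        return sum(values)
    if how == "mean":      # average over the single compounds
        return mean(values)
    if how == "set":       # 'set-valued feature': keep every value
        return tuple(sorted(values))
    raise ValueError(f"unknown strategy: {how}")


if __name__ == "__main__":
    per_component = [2.0, -1.0]  # made-up values, not real descriptor output
    for how in ("sum", "mean", "set"):
        print(how, combine(per_component, how))
```

Which strategy is sensible depends on the descriptor, so the choice is left as an explicit parameter rather than hard-coded.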

Kind regards,

P.S.: Sorry if I missed existing discussions/documentation on this
issue; I had some trouble naming (and therefore googling) it.

Dipl-Inf. Martin Gütlein
+49 (0)761 203 8442 (office)
+49 (0)177 623 9499 (mobile)

Cdk-user mailing list

Rajarshi Guha
NIH Center for Advancing Translational Science