Re: [Cdk-devel] issues with old, rare code

On Aug 21, 2008, Egon Willighagen wrote (responding to Rajarshi Guha):
> Full roundtripping is outside the scope of the CDK... a PDB file is a
> document, really, not a chemical format.

I've been thinking about this distinction, as someone who has written
several PDB parsers as well as parsers for other formats.  I don't
understand the nuance you're pointing out.  What do you mean by
"document" here?  Is CML, with its ability to embed other data, also a
document rather than a chemical format?  What biases a given file type
more towards one or the other?  Do the tag fields in an SD file make it
somewhat of a document?  Are all Gaussian output files documents?

>> I can see the CDK being used in metabolomics and hence the need for
>> PDB support
>
> Ummm... I don't often see metabolites in the PDB format... :)

That's because most people doing metabolomics treat molecules as  
identifier names only and don't track atom level details, much less  
use something like reaction SMILES.

Still, a quick Google search for the words "metabolites in the PDB  
format" ;) found the Human Metabolome Database, and a search of it  
for "Isocitric acid" found this record

   http://hmdb.ca/scripts/show_card.cgi?METABOCARD=HMDB00193.txt

with a link to isocitric acid as a standalone/synthetic PDB file and a
link to structure 1b0j, which is

   CRYSTAL STRUCTURE OF ACONITASE WITH ISOCITRATE

Out of curiosity, I also checked KEGG.  There are a few PDB-linked  
records, for example:

   http://www.genome.jp/dbget-bin/www_bget?compound+C00167

which links to the "PDB-CCD" record (the first time I'd heard that
term; it stands for "PDB Chemical Component Dictionary") for UGA at

   http://www.ebi.ac.uk/msd-srv/msdchem/cgi-bin/cgi.pl?FUNCTION=getByCode&CODE=UGA

and has a way to get the ligand as a PDB file, as well as a link to  
"In PDB Entries" which lists PDB files containing that ligand, at

   http://www.ebi.ac.uk/msd-srv/msdchem/cgi-bin/cgi.pl?FUNCTION=relation&PARENTENTITY=CHEM_COMP&APPLICATION=1&ENTITY=COMP_OCCURENCES&RELATIONID=3193&PARENTINDEX=0&PARENT0=:UGA%20UGA%20:

> btw, I was not aware that BioJava did 3D structures nowadays...

According to the CVS logs for
   org/biojava/bio/structure/io/PDBFileParser.java
it has had a PDB parser since at least October 2004:

revision 1.3
date: 2005/12/06 15:08:10;  author: andreas;  state: Exp;  lines: +661 -647
added a check that ignores empty lines that some people might have at
the end of their (local) PDB files.
----------------------------
revision 1.2
date: 2005/04/14 12:24:58;  author: andreas;  state: Exp;  lines: +1 -1
made convert_3code_1code public
----------------------------
revision 1.1
date: 2004/10/25 20:37:08;  author: andreas;  state: Exp;
added PDBFileParser as independent class
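
The blank-line check from revision 1.3 is the sort of thing most PDB
readers end up needing.  Here is a minimal Python sketch of the idea.
It is not BioJava's actual code: the function name read_atom_records
and the dictionary layout are made up for illustration, and it reads
only the fixed-column name, residue name and coordinates from
ATOM/HETATM records.

```python
def read_atom_records(lines):
    """Collect ATOM/HETATM records from PDB-style lines, skipping blanks.

    Hypothetical helper for illustration; not part of BioJava or the CDK.
    """
    atoms = []
    for line in lines:
        if not line.strip():
            # Tolerate empty lines, e.g. trailing ones in local files.
            continue
        record = line[:6].strip()
        if record not in ("ATOM", "HETATM"):
            # Everything else (HEADER, REMARK, END, ...) is ignored here.
            continue
        atoms.append({
            "record": record,
            "name": line[12:16].strip(),     # columns 13-16
            "resname": line[17:20].strip(),  # columns 18-20
            "x": float(line[30:38]),         # columns 31-38
            "y": float(line[38:46]),         # columns 39-46
            "z": float(line[46:54]),         # columns 47-54
        })
    return atoms
```

Passing an open file works as well as a list of strings, since both
yield one line at a time.  Note that everything outside the coordinate
records is simply dropped, which is one reason a reader like this can't
round-trip a file.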

There's a BioJava-based structure viewer called SPICE that Andreas  
Prlic (the one mentioned in the CVS logs) has been working on.  It's  
pretty widely used at the EBI and the Sanger Institute. See:

   http://www.efamily.org.uk/software/dasclients/spice/

I think I first saw it at ISMB in Detroit, so in 2005.  It's meant more
for sequence/structure comparisons and feature annotations, which
aren't things that small molecule viewers typically care about, just
like large molecule structure viewers don't all care about things
like bond orders. ;)

It does seem sometimes like the big molecule and small molecule  
people are in mostly disjoint fields.

> The Jmol PDB reader might even be a better
> one... lot's of user community around that PDB reader, likely more
> than for the BioJava version...

And there's already some integration between the two, for example:

   http://www.biojava.org/wiki/BioJava:CookBook:PDB:Jmol

				Andrew
				da...@da...