From: Ola S. <ola...@fa...> - 2005-05-03 17:15:12
Ctd on CDK-devel:

On Tue, 2005-05-03 at 12:32, Egon Willighagen wrote:
> Peter, Gemma, and Martin,
>
> I've copied you in with questions related to this cdk-user thread. Please
> read, and look for your name. Thanx!
>
> On Tuesday 03 May 2005 11:59 am, Ola Spjuth wrote:
> > I have not looked into the produced CML yet but intend to.
>
> Yes, please do, and post a CML atom output here...

OK, here is one example:

Line in PDB:
---------------
ATOM 1 O5* A A 1 12.718 25.492 40.619 1.00 20.74 1ZAA 61

In CML:
---------
<atom elementType="O" id="O5*" x3="12.718" y3="25.492" z3="40.619">
  <scalar title="pdb.record" xmlns="http://www.xml-cml.org/schema/cml2/core">ATOM 1 O5* A A 1 12.718 25.492 40.619 1.00 20.74 1ZAA 61</scalar>
  <scalar title="pdb.chainID" xmlns="http://www.xml-cml.org/schema/cml2/core">A</scalar>
  <scalar title="pdb.serial" xmlns="http://www.xml-cml.org/schema/cml2/core">1</scalar>
  <scalar title="hetatm" xmlns="http://www.xml-cml.org/schema/cml2/core">0</scalar>
  <scalar title="pdb.resName" xmlns="http://www.xml-cml.org/schema/cml2/core">A</scalar>
  <scalar title="pdb.resSeq" xmlns="http://www.xml-cml.org/schema/cml2/core">1</scalar>
  <scalar title="pdb.altLoc" xmlns="http://www.xml-cml.org/schema/cml2/core"></scalar>
  <scalar title="org.jmol.adapter.cdk.ATOM_SET_INDEX" xmlns="http://www.xml-cml.org/schema/cml2/core">0</scalar>
  <scalar title="InvariancePair" xmlns="http://www.xml-cml.org/schema/cml2/core">1 </scalar>
  <scalar title="pdb.iCode" xmlns="http://www.xml-cml.org/schema/cml2/core"></scalar>
  <scalar title="pdb.tempFactor" xmlns="http://www.xml-cml.org/schema/cml2/core">20.74</scalar>
  <scalar title="CanonicalLable" xmlns="http://www.xml-cml.org/schema/cml2/core">1</scalar>
  <scalar title="pdb.charge" xmlns="http://www.xml-cml.org/schema/cml2/core">61</scalar>
  <scalar title="pdb.element" xmlns="http://www.xml-cml.org/schema/cml2/core"></scalar>
  <scalar title="pdb.occupancy" xmlns="http://www.xml-cml.org/schema/cml2/core">1.0</scalar>
  <scalar title="pdb.segID" xmlns="http://www.xml-cml.org/schema/cml2/core">1ZAA</scalar>
  <scalar title="oxt" xmlns="http://www.xml-cml.org/schema/cml2/core">0</scalar>
  <scalar title="pdb.name" xmlns="http://www.xml-cml.org/schema/cml2/core">O5*</scalar>
</atom>

Some things seem wrong to me, like pdb.charge=61 (where 61 is the line number in the PDB file?). Some important parts of the PDB header are missing, like Helix and Sheet information. I guess this must be taken into account if we are to display the protein in Jmol.

The current solution makes files VERY large. My 112K PDB file is 2.3M in CML. Hopefully we can come up with a more condensed solution.

Note: This CML might have been constructed with the CMLWriter in Jmol.jar (according to the line with "org.jmol.adapter.cdk.ATOM_SET_INDEX") since it conflicts with my cdk-libio-cml.jar. Convertor is from the latter though, compiled with JDK1.5.

> > > > To sum up: I/O of molecules and biopolymers in CML is central to our
> > > > chemo/bio-informatic application so we really hope this will be given a
> > > > priority. We also hope that CML will be extended to comprise
> > > > Biopolymers (preferably with strands according to Martin Eklund's
> > > > submitted patch)
> > >
> > > That patch has been added to CVS.
> >
> > That was indeed good news. How do I check out and apply the patch?
>
> You'll have to check out CDK from CVS.

Already done. Didn't realize that it was there. Will look at it.

> > We would be happy to test it.
>
> That would be very nice.
>
> > And is the CdkJmolAdapter also updated so we
> > can send Biopolymers to Jmol?
>
> Haven't tested yet.

Hmmm, this doesn't seem to work. Something for Martin to look into.

> Martin?
>
> > > > in
> > > > the near future, but this is obviously out-of-list here :) This
> > > > together would make CDK and CML very powerful for bioinformatics
> > > > applications as well!
> > >
> > > Ok, good to hear that you are making progress...
> > > I haven't had time yet to convert a PDB to CML, and see how the PDB
> > > fields are put into CML...
> > > It would be nice to define how a PDB atom would look in CML, i.e. will
> > > all PDB fields... when we have done this, the Convertor can be
> > > modified...
> >
> > Maybe we can assist you with this. Or does it require deep knowledge
> > about CML? That we have none...
>
> No, it does not require much intelligence whatsoever...
>
> In PDB an atom is stored as something like:
>
> ATOM 1 N THR 1 17.047 14.099 3.625 1.00 13.79
>
> And all this info needs to be transferred into CML, so we need something like
>
> <atom elementType='N' x3='17.047' y3='14.099' z3='3.625'>
>   <scalar dataType="xsd:string" dictRef="pdb:residueType">THR</scalar>
> </atom>
>
> Here the <scalar> element defines extra information that cannot be stored
> in other more core CML things. The element defines the content type, a string,
> and a reference to a dictionary (pdb) item (residueType)...
>
> And defining this dictionary is the main thing we need to do...
>
> Gemma, do you have a dictionary for PDB information, like found in PDB files?
>
> > From what I remember, all PDB fields required to import CML as a
> > Biopolymer are put in CML but are not used by the CMLReader or
> > Convertor. Martin Eklund created a class to convert a Molecule to a
> > Biopolymer, but then you need to know that it is a biopolymer
> > beforehand. Also, Strands are not supported in this since CML doesn't
> > support it (yet).
>
> I propose we go forward in the way PDB stores this information too:
> have a field associated with the atoms to define to which strand they
> belong...
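
To make the per-atom proposal concrete, here is a rough Python sketch (standard library only, not CDK code) that builds an <atom> with one dictRef'd <scalar> child per PDB field, including a strand field. The helper name pdb_atom_to_cml, the dictionary entry "pdb:strand" and the strand value "A" are assumptions for illustration; only "pdb:residueType" appears in the thread.

```python
import xml.etree.ElementTree as ET


def pdb_atom_to_cml(element, xyz, pdb_fields):
    """Build a CML <atom> with one <scalar dictRef="pdb:..."> per PDB field."""
    atom = ET.Element("atom", {
        "elementType": element,
        "x3": xyz[0],
        "y3": xyz[1],
        "z3": xyz[2],
    })
    for name, value in pdb_fields.items():
        scalar = ET.SubElement(atom, "scalar", {
            "dataType": "xsd:string",
            # Entry names in the "pdb" dictionary are hypothetical here;
            # the dictionary itself still has to be defined.
            "dictRef": f"pdb:{name}",
        })
        scalar.text = value
    return atom


# Coordinates and residue type are taken from the THR example line above;
# the "strand" entry and its value "A" are assumed.
atom = pdb_atom_to_cml(
    "N",
    ("17.047", "14.099", "3.625"),
    {"residueType": "THR", "strand": "A"},
)
cml = ET.tostring(atom, encoding="unicode")
print(cml)
```

With this layout a reader could rebuild strands by grouping atoms on the strand scalar, at the cost of repeating the same value on every atom.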
>
> Alternatively, but that involves more thinking, is this setup in CML:
>
> <molecule dictRef="pdb:compound" id="1CRN">
>   <molecule dictRef="pdb:strand" id="strand1">
>   </molecule>
>   <molecule dictRef="pdb:strand" id="strand2">
>   </molecule>
>   <!-- etc -->
> </molecule>
>
> Peter, are there other initiatives for PDB into CML-X conversion?
>
> Egon

Cheers
.../Ola
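
PS. On the pdb.charge=61 value in the example atom: a possible explanation, sketched in Python below, is that the reader slices columns 73-80 with the newer ATOM layout (segID 73-76, element 77-78, charge 79-80), while the file seems to use those columns for the entry ID and a running record number. The archive has collapsed the whitespace of the original line, so the 80-column record below is a reconstruction, and the names V2_COLUMNS and field are made up for this sketch.

```python
# Rebuild an 80-column ATOM record from the example values. The exact
# column placement is an assumption, since the original spacing was lost.
record = (
    f"{'ATOM':<6}{1:>5} {' O5*':<4}{' '}{'A':>3} {'A'}{1:>4}{' '}{'':3}"
    f"{12.718:8.3f}{25.492:8.3f}{40.619:8.3f}{1.00:6.2f}{20.74:6.2f}"
    f"{'':6}{'1ZAA'}{61:>4}"
)

# 1-based, inclusive column ranges of the newer ATOM layout for cols 73-80.
V2_COLUMNS = {
    "segID": (73, 76),
    "element": (77, 78),
    "charge": (79, 80),
}


def field(line, first, last):
    """Return the stripped text in 1-based, inclusive columns first..last."""
    return line[first - 1:last].strip()


# Reading the tail of the record with the newer layout reproduces the CML
# values: segID "1ZAA", an empty element, and a "charge" of 61.
v2 = {name: field(record, *cols) for name, cols in V2_COLUMNS.items()}
print(v2)

# Reading columns 77-80 as a single right-justified number gives 61,
# which fits the guess that it is a record (line) number.
print(field(record, 77, 80))
```

If this is what happens, the reader would need to detect the older layout before filling pdb.element and pdb.charge.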