Re: [Jmol-developers] Standard syntax for pulling from remote databases

This sounds great. Shouldn't be too difficult. We have parsed a number of
pages for chemical content, and I know that Henry has too. So we could put
together a collaborative list of sites that we can scrape for chemistry.
The results would all be in CML with appropriate metadata. This would solve
most of the remaining technical problems, leaving only the legal ones.

P.

At 10:34 02/02/2004 +0100, E.L. Willighagen wrote:
>-----BEGIN PGP SIGNED MESSAGE-----
>Hash: SHA1
>
>On Monday 02 February 2004 10:17, Peter Murray-Rust wrote:
> > At 22:23 01/02/2004 +0100, Egon Willighagen wrote:
> > >dadml://pdb/?1CRN
> > >
> > >or dadml://any/pdbid?1CRN
> > >
> > >The second will try any mirror that can return information based on the
> > >pdbid...
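
As a rough illustration of the two URI forms quoted above, here is a minimal Python sketch that splits them into a database name, an optional index type, and the query value. The function name parse_dadml_uri and the three-part result are assumptions made for this example; they are not taken from the DADML or CDK code.

```python
from urllib.parse import urlsplit

def parse_dadml_uri(uri):
    """Split a DADML-style URI into (database, index_type, query).

    Hypothetical helper: the tuple layout is an assumption for illustration.
    """
    parts = urlsplit(uri)
    if parts.scheme != "dadml":
        raise ValueError("not a dadml URI: " + uri)
    database = parts.netloc                     # e.g. "pdb" or "any"
    index_type = parts.path.strip("/") or None  # e.g. "pdbid"; None if omitted
    return database, index_type, parts.query

print(parse_dadml_uri("dadml://pdb/?1CRN"))
print(parse_dadml_uri("dadml://any/pdbid?1CRN"))
```

With this reading, "any" is simply a database name that a resolver could treat as "try every mirror indexed by pdbid".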
> >
> > Presumably someone enters these mirrors and keeps their addresses and
> > templates up to date.
>
>Yes. The nice thing about the DADML system is that the maintenance can be
>done by website developers, much like the real domain name server system...
>
> > Is there a cascade - if mirror 1 fails does mirror2
> > get called? And what is returned - the actual file?
> >
> > If so we have something like:
> >
> > User -> PDBCode -> server
> > server -> munged URL (format1)-> mirror1 -> success/error
> > success -> PDB file -> user
> > failure
> > server -> munged URL (format2)-> mirror2 -> success/error
> > and so on
> >
> > is this the model?
>
>Yes, more or less. An HTTP 404 is easily detected, but the system can also
>detect things like a returned webpage which states that no information is
>available...
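
The cascade discussed above (try mirror 1, fall back to mirror 2, and treat both an HTTP 404 and a "nothing found" page as a failure) could be sketched roughly as follows. Everything here is an assumption for illustration: the mirror URLs, the template syntax, the "no information available" marker, and the injected fetch function, which stands in for real HTTP so the sketch runs offline.

```python
def looks_like_miss(status, body):
    """Treat HTTP errors and 'nothing found' pages as failures."""
    if status != 200:
        return True
    # Hypothetical marker; real sites word their 'not found' pages differently.
    return "no information available" in body.lower()

def fetch_with_fallback(code, templates, fetch):
    """Try each mirror template in order; return (url, body) of the first hit."""
    for template in templates:
        url = template.format(code=code)
        status, body = fetch(url)
        if not looks_like_miss(status, body):
            return url, body
    raise LookupError("no mirror returned data for " + code)

# Stub standing in for real HTTP requests; the hosts are made up.
def fake_fetch(url):
    pages = {
        "http://mirror1.example/pdb/1CRN": (404, ""),
        "http://mirror2.example/get?id=1CRN": (200, "fake PDB file contents"),
    }
    return pages.get(url, (404, ""))

templates = [
    "http://mirror1.example/pdb/{code}",
    "http://mirror2.example/get?id={code}",
]
url, body = fetch_with_fallback("1CRN", templates, fake_fetch)
print(url)
```

Passing the fetch function in as an argument keeps the fallback logic separate from the transport, so the same loop works with any HTTP client.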
>
> > >The DADML system also supports retrieving information in other formats, not
> > >just chemical/x-pdb or chemical/x-cml, but also text/html etc..
> > >I'm not sure if we want to be able to do that sort of thing too, so for
> > > now it only supports reading chemical formats...
> >
> > The attraction of chemical/x-* is that the information contained within
> > each is (relatively?!) consistent and structured. For an arbitrary web site
> > producing HTML the structure could be anything and a separate parser has to
> > be written for each. (For example we have written parsers for 2 of the main
> > sites offering small molecule information and they obviously are completely
> > different.)
>
>DADML does not deal with interpretation of the returned format... the
>cdk.internet.dadml.DADMLReader does a bit... it can read molecules from
>chemical/x-mdl-mol and chemical/x-cml and others... actually, it completely
>disregards the MIME system, and just uses the cdk.io.ReaderFactory and looks
>at the contents of the stream...
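
To make "looks at the contents of the stream" concrete, here is a small Python sketch of format guessing by content rather than by MIME type. It is not the cdk.io.ReaderFactory logic, which is not shown in this thread; the function name, the short format labels, and the heuristics are assumptions chosen for illustration.

```python
def guess_chemical_format(text):
    """Guess a chemical file format from its contents, ignoring any MIME type.

    Hypothetical heuristics, not the rules CDK actually applies.
    """
    lines = text.splitlines()
    stripped = text.lstrip()
    # CML is XML, so expect markup containing a cml or molecule element.
    if stripped.startswith("<") and ("<cml" in stripped or "<molecule" in stripped):
        return "cml"
    # An MDL molfile carries its version tag at the end of the fourth line.
    if len(lines) >= 4 and lines[3].rstrip().endswith(("V2000", "V3000")):
        return "mdl-mol"
    # PDB files are made of fixed-width records with well-known names.
    if any(line.startswith(("HEADER", "ATOM  ", "HETATM", "CRYST1")) for line in lines):
        return "pdb"
    return None

print(guess_chemical_format("<molecule id='m1'></molecule>"))
```

A guesser like this returns None for an arbitrary HTML page, which is the case where a site-specific parser would still be needed.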
>
> > Moreover the structure of the pages changes regularly. For
> > example the *text/html* on the RCSB site will be completely different from
> > that on the EBI site even though the actual PDB file is presumably the same
> > or closely related. It is the consistency of chemical/x-* that makes it
> > useful for machines to parse.
>
>So far DADML has only been used to read clear chemical formats, and display
>HTML as is... without any interpretation step... It would be very nice to
>have a web service at WWMM that accepts a URL or DADML URI
>(dadml://nist-html/cas/50-00-0) and converts the HTML into a CML stream...
>
>Something like: dadml://wwmm-nist-bridge/cas/50-00-0
>
>Egon
>
>- --
>eg...@sc...
>PhD on Molecular Representation in Chemometrics
>Nijmegen University
>http://www.cac.sci.kun.nl/people/egonw/
>GPG: 1024D/D6336BA6
>
>-----BEGIN PGP SIGNATURE-----
>Version: GnuPG v1.0.7 (SunOS)
>
>iD8DBQFAHhm2d9R8I9Yza6YRAnBNAJwICKAnGbYiu0lOSQvQuk/FySQxGACgp8aT
>HR1eqfmcCDb6D4uCpzE7GD0=
>=Idqz
>-----END PGP SIGNATURE-----

Peter Murray-Rust
Unilever Centre for Molecular Informatics
Chemistry Department, Cambridge University
Lensfield Road, CAMBRIDGE, CB2 1EW, UK
Tel: +44-1223-763069
