From: E.L. W. <eg...@sc...> - 2004-02-02 09:34:59
On Monday 02 February 2004 10:17, Peter Murray-Rust wrote:
> At 22:23 01/02/2004 +0100, Egon Willighagen wrote:
> >dadml://pdb/?1CRN
> >
> >or dadml://any/pdbid?1CRN
> >
> >The second will try any mirror that can return information based on the
> >pdbid...
>
> Presumably someone enters these mirrors and keeps their addresses and
> templates up to date.

Yes. The nice thing about the DADML system is that the maintenance can be
done by website developers, much like the real domain name server system...

> Is there a cascade - if mirror 1 fails does mirror2
> get called? And what is returned - the actual file?
>
> If so we have something like:
>
> User -> PDBCode -> server
> server -> munged URL (format1)-> mirror1 -> success/error
> success -> PDB file -> user
> failure
> server -> munged URL (format2)-> mirror2 -> success/error
> and so on
>
> is this the model?

Yes, more or less. An HTTP 404 is easily detected, but the system can also
detect things like a returned webpage which states that no information is
available...

> >The DADML system also supports retrieving information in other formats, not
> >just chemical/x-pdb or chemical/x-cml, but also text/html etc..
> >I'm not sure if we want to be able to do that sort of thing too, so for
> > now it only supports reading chemical formats...
>
> The attraction of chemical/x-* is that the information contained within
> each is (relatively?!) consistent and structured. For an arbitrary web site
> producing HTML the structure could be anything and a separate parser has to
> be written for each. (For example we have written parsers for 2 of the main
> sites offering small molecule information and they obviously are completely
> different.

DADML does not deal with interpretation of the returned format... the
cdk.internet.dadml.DADMLReader does a bit... it can read molecules from
chemical/x-mdl-mol and chemical/x-cml and others...
actually, it completely
disregards the MIME system, and just uses the cdk.io.ReaderFactory and looks
at the contents of the stream...

> Moreover the structure of the pages changes regularly. For
> example the *text/html* on the RCSB site will be completely different from
> that on the EBI site even though the actual PDB file is presumably the same
> or closely related. It is the consistency of chemical/x-* that makes it
> useful for machines to parse.

So far the DADML has only been used to read clear chemical formats, and display
HTML as is... without any interpretation step... It would be very nice to
have a web service at WWMM that accepts a URL or DADML URI
(dadml://nist-html/cas/50-00-0) and converts the HTML into a CML stream...

Something like:

dadml://wwmm-nist-bridge/cas/50-00-0

Egon

-- 
eg...@sc...
PhD on Molecular Representation in Chemometrics
Nijmegen University
http://www.cac.sci.kun.nl/people/egonw/
GPG: 1024D/D6336BA6
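
The mirror cascade discussed in this message (try mirror 1, fall back to mirror 2, and treat both an HTTP 404 and a page that merely says nothing was found as a failure) can be sketched roughly as follows. This is a minimal Python sketch and not DADML or CDK code: the names `resolve` and `looks_empty`, the `{id}` template syntax, the injected `fetch` callable, and the marker phrase are all assumptions made for illustration.

```python
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical type: a fetcher takes a URL and returns (HTTP status, body text).
# Injecting it keeps the sketch independent of any particular HTTP library.
Fetcher = Callable[[str], Tuple[int, str]]


def looks_empty(body: str, markers: Iterable[str]) -> bool:
    """Return True if the body is blank or contains a 'nothing found' phrase."""
    lowered = body.lower()
    return not lowered.strip() or any(m.lower() in lowered for m in markers)


def resolve(identifier: str,
            templates: Iterable[str],
            fetch: Fetcher,
            markers: Iterable[str] = ("no information available",)) -> Optional[str]:
    """Try each mirror template in order; return the first usable body, else None."""
    markers = tuple(markers)
    for template in templates:
        # Assumed template syntax: "{id}" marks where the identifier goes.
        url = template.format(id=identifier)
        try:
            status, body = fetch(url)
        except OSError:
            continue  # mirror unreachable: fall through to the next one
        if status != 200:
            continue  # e.g. HTTP 404
        if looks_empty(body, markers):
            continue  # a 200 page that only states that nothing was found
        return body
    return None
```

A caller would pass an ordered list of mirror templates plus a `fetch` function wrapping whatever HTTP client is in use. Interpreting the returned body (PDB, CML, MDL molfile) is deliberately left out, matching the point above that DADML itself does not interpret the returned format.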