From: Peter Murray-R. <pm...@ca...> - 2004-02-02 09:19:11
At 22:23 01/02/2004 +0100, Egon Willighagen wrote:
>On Sunday 01 February 2004 17:28, Egon Willighagen wrote:
> > dadml://nist/cas?50-00-0
> >
> > This would allow things in Java like:
> >
> > URI uri = new URI("dadml://nist/cas?50-00-0");
> > String protocol = uri.getScheme();
> > String service = uri.getAuthority();
> > String index = uri.getPath();
> > String query = uri.getQuery();
>
>Ok, CDK's cdk/internet/DADMLReader now accepts things like:
>
>dadml://any/CAS-NUMBER?50-00-0
>
>It's not really fine-tuned to the syntax above, but a nice start.
>
>Note the 'any', which indicates that the URI should be resolved to the first
>database that could contain information...
>
>Later this week it will be possible to use services like pdb... so things like
>
>dadml://pdb/?1CRN
>
>or dadml://any/pdbid?1CRN
>
>The second will try any mirror that can return information based on the
>pdbid...

Presumably someone enters these mirrors and keeps their addresses and templates up to date. Is there a cascade, i.e. if mirror1 fails, does mirror2 get called? And what is returned: the actual file? If so, we have something like:

User -> PDBCode -> server
server -> munged URL (format1) -> mirror1 -> success/error
    success -> PDB file -> user
    failure
        server -> munged URL (format2) -> mirror2 -> success/error
        and so on

Is this the model?

>The DADML system also supports retrieving information in other formats, not
>just chemical/x-pdb or chemical/x-cml, but also text/html etc.
>I'm not sure if we want to be able to do that sort of thing too, so for now
>it only supports reading chemical formats...

The attraction of chemical/x-* is that the information contained within each is (relatively?!) consistent and structured. For an arbitrary web site producing HTML, the structure could be anything, and a separate parser has to be written for each. (For example, we have written parsers for two of the main sites offering small-molecule information, and they are, obviously, completely different.)
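A minimal Java sketch of the cascade model asked about above, under loudly stated assumptions: the `DadmlCascade` class, the `Mirror` URL templates with an `{id}` placeholder, the `*.example` mirror addresses and the `fetch` function are all hypothetical and are not the CDK DADMLReader API. It splits a dadml URI with `java.net.URI` as in Egon's snippet (note that `getPath()` keeps the leading slash, so the index comes back as `/pdbid`), then tries each mirror in turn, falling through to the next one on failure, and accepts only a chemical/x-* response, since those are the formats a machine can parse consistently. Requires Java 9+ for `List.of`.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch of the mirror cascade; not the CDK DADMLReader API.
public class DadmlCascade {

    // A mirror: a name plus a URL template where "{id}" stands for the query (assumed format).
    static final class Mirror {
        final String name;
        final String urlTemplate;

        Mirror(String name, String urlTemplate) {
            this.name = name;
            this.urlTemplate = urlTemplate;
        }

        String urlFor(String id) {
            return urlTemplate.replace("{id}", id);
        }
    }

    // A fetched document: its MIME type and its content.
    static final class Response {
        final String mimeType;
        final String body;

        Response(String mimeType, String body) {
            this.mimeType = mimeType;
            this.body = body;
        }
    }

    // Splits a dadml URI into scheme, service (authority), index (path) and query.
    static String[] parse(String dadml) throws URISyntaxException {
        URI uri = new URI(dadml);
        // getPath() keeps the leading slash, e.g. "/cas" for dadml://nist/cas?50-00-0
        return new String[] { uri.getScheme(), uri.getAuthority(), uri.getPath(), uri.getQuery() };
    }

    // Tries each mirror in order; a null response or a non-chemical/x-* type counts as failure.
    static Optional<Response> resolve(String id, List<Mirror> mirrors,
                                      Function<String, Response> fetch, List<String> log) {
        for (Mirror m : mirrors) {
            Response r = fetch.apply(m.urlFor(id));
            if (r != null && r.mimeType.startsWith("chemical/x-")) {
                log.add(m.name + ": success");
                return Optional.of(r);
            }
            log.add(m.name + ": failure"); // cascade to the next mirror
        }
        return Optional.empty(); // every mirror failed
    }

    public static void main(String[] args) throws URISyntaxException {
        String[] parts = parse("dadml://any/pdbid?1CRN");
        System.out.println(String.join(" | ", parts)); // dadml | any | /pdbid | 1CRN

        List<Mirror> mirrors = List.of(
                new Mirror("mirror1", "http://mirror1.example/pdb?code={id}"),
                new Mirror("mirror2", "http://mirror2.example/{id}.pdb"));

        // Stand-in for a real HTTP fetch: mirror1 is "down", mirror2 returns a PDB file.
        Function<String, Response> fetch = url ->
                url.startsWith("http://mirror2.example/") ? new Response("chemical/x-pdb", "HEADER ...") : null;

        List<String> log = new ArrayList<>();
        Optional<Response> result = resolve(parts[3], mirrors, fetch, log);
        System.out.println(log); // [mirror1: failure, mirror2: success]
        System.out.println(result.map(r -> r.mimeType).orElse("none")); // chemical/x-pdb
    }
}
```

Keeping the mirror list ordered makes the cascade explicit: whoever maintains the mirrors also decides which one is tried first, and a text/html answer is treated the same as a failed mirror.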
Moreover, the structure of the pages changes regularly. For example, the *text/html* on the RCSB site will be completely different from that on the EBI site, even though the actual PDB file is presumably the same or closely related. It is the consistency of chemical/x-* that makes it useful for machines to parse.

P.

Peter Murray-Rust
Unilever Centre for Molecular Informatics
Chemistry Department, Cambridge University
Lensfield Road, CAMBRIDGE, CB2 1EW, UK
Tel: +44-1223-763069