From: Rajarshi G. <rx...@ps...> - 2005-02-08 18:16:54
|
Hi,

I'm attempting to get descriptor classes from the qsar-descriptors.xml file using the DictionaryDatabase functions.

After digging a little into DictionaryHandler.java I see that in the function startElement(), any <entry> element is added to the dictionary. The problem that I'm seeing is that this places the BibTeX entries as Entry objects into the returned dictionary.

Is there any way to skip BibTeX entries?

Secondly, the startElement() function only appears to add the 'id' and 'term' tags to the current entry. In the case of the descriptor dictionaries, the information from the metadataList and metadata tags is not added. Adding these tags to an entry will require that Entry be modified to include fields for metadata - is it OK to add it or will that impact other uses of Entry?

Thirdly, in addEntry() of Dictionary.java the line that adds the entry to the array of entries reads:

    entries.put(entry.getID().toLowerCase(), entry);

In the case of the descriptor XML file, the ID is supposed to match the value of the specification in the descriptor routines themselves (is that correct?). So converting to lower case would not let me link the entries from the XML file to specification information from the descriptor classes. Is it OK to get rid of the toLowerCase() or will that impact other uses of the dictionary?

Of course, if there are reasons why the Dictionaries and associated classes are designed that way, then ignore the above :)

-------------------------------------------------------------------
Rajarshi Guha <rx...@ps...> <http://jijo.cjb.net>
GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04 06F7 1BB9 E634 9B87 56EE
-------------------------------------------------------------------
So the Zen master asked the hot-dog vendor, "Can you make me one with
everything?"
- TauZero on Slashdot |
From: Rajarshi G. <rx...@ps...> - 2005-02-08 19:57:26
|
On Tue, 2005-02-08 at 13:16 -0500, Rajarshi Guha wrote:
> Hi,
> I'm attempting to get descriptor classes from the qsar-descriptors.xml
> file using the DictionaryDatabase functions.
>
> After digging a little into DictionaryHandler.java I see that in the
> function startElement(), any <entry> element is added to the dictionary.
> The problem that I'm seeing is that this places the BibTeX entries as
> Entry objects into the returned dictionary.
>
> Is there any way to skip BibTeX entries?

OK - I solved this problem by looking at the fully qualified name of the entry.

Is it a good idea to skip BibTeX information in a descriptor dictionary?

For what I'm doing now, it seems the answer is yes. But in the long run will we want to access the BibTeX information from the dictionaries? I think yes, otherwise why would it be there :)

-------------------------------------------------------------------
Rajarshi Guha <rx...@ps...> <http://jijo.cjb.net>
GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04 06F7 1BB9 E634 9B87 56EE
-------------------------------------------------------------------
Finally I am becoming stupider no more
- Paul Erdos' epitaph |
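The fix described above (filtering on the qualified name) can be sketched roughly as follows. This is a minimal illustration and not the actual CDK DictionaryHandler: the class name EntrySkippingHandler and the helper collectIds are hypothetical, and the sketch assumes a SAX parser that is not namespace-aware (the JAXP default), so that qName carries the raw tag name including any prefix.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler for illustration; not the CDK DictionaryHandler. */
class EntrySkippingHandler extends DefaultHandler {
    /** IDs of the dictionary entries that were accepted. */
    final List<String> ids = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // With a non-namespace-aware parser, <bibtex:entry> arrives with
        // qName "bibtex:entry", so an exact match on "entry" skips it.
        if ("entry".equals(qName)) {
            ids.add(atts.getValue("id"));
        }
    }

    /** Parses the given XML string and returns the accepted entry IDs. */
    static List<String> collectIds(String xml) {
        try {
            EntrySkippingHandler handler = new EntrySkippingHandler();
            // SAXParserFactory.newInstance() is not namespace-aware by default.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.ids;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse dictionary", e);
        }
    }
}
```

If the parser were namespace-aware instead, comparing the namespace URI (the uri argument) against the BibTeX namespace would be the more robust test, since prefixes are only a convention of the individual file.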
From: Egon W. <e.w...@sc...> - 2005-02-09 10:07:44
|
On Tuesday 08 February 2005 08:57 pm, Rajarshi Guha wrote: > For what I'm doing now, it seems the answer is yes. But in the long run > will we want to access the BibTeX information from the dictionaries? I > think yes, otherwise why would it be there :) In the long run, we'll use CMLDOM for this. But rumours go that even dictRef like things are going to be replaced by RDF... So, skipping is fine for now, unless we really need it right now. Egon |
From: Peter Murray-R. <pm...@ca...> - 2005-02-09 19:07:23
|
At 11:06 09/02/2005 +0100, Egon Willighagen wrote:
>On Tuesday 08 February 2005 08:57 pm, Rajarshi Guha wrote:
> > For what I'm doing now, it seems the answer is yes. But in the long run
> > will we want to access the BibTeX information from the dictionaries? I
> > think yes, otherwise why would it be there :)
>In the long run, we'll use CMLDOM for this. But rumours go that even dictRef
>like things are going to be replaced by RDF...

I think we'll see that other disciplines (like bioscience) make rapid progress with ontologies and we should narrow where appropriate. But for now continue to use dictRef with the semantics:

    <blurf xmlns:foo="http://www.zzz.org/dict/foo.xml">
      <xyzzy dictRef="foo:bar" >abc</xyzzy>
    </blurf>

The namespace prefix foo must now have a namespace URI/URL (or Java 1.5 will fail to parse). This URL should be an absolute or relative address. "bar" should point to the entry with id="bar".

The main problem is the addressing. I am still struggling with this:
- if you use relative URLs and relocate the file, the addresses break
- if you use absolute ones you rely on a web presence or a duplicate of the local file system

XML/SGML addresses this sort of thing with a catalog. It isn't perfect and is non-trivial to implement. So we have to have a local dictionary catalog of some sort. I think we need to use the same $user.home mechanism as for .jmol, .jchempaint, .jumbo where each directory has a config.properties file. In whatever case we have to have:
- the user maintains their dictionaries
- we provide a dictionary installer
- the dictionaries are on the web

It is here where I think RDF will help as the browsers become RDF-aware.

P.

>So, skipping is fine for now, unless we really need it right now.
>
>Egon

Peter Murray-Rust
Unilever Centre for Molecular Informatics
Chemistry Department, Cambridge University
Lensfield Road, CAMBRIDGE, CB2 1EW, UK
Tel: +44-1223-763069 |
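The dictRef semantics described in this message (prefix resolves to a namespace URI, the part after the colon names the entry id) can be sketched with a namespace-aware DOM parse. This is only an illustration under stated assumptions: the class DictRefResolver and its resolve method are hypothetical names, not part of CDK or CML tooling, and the sketch does not attempt to fetch the dictionary the URI points at.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

/** Hypothetical helper for illustration; not a CDK or CML class. */
class DictRefResolver {
    /**
     * Finds the first element with the given tag name and splits its dictRef
     * attribute into { namespace URI of the prefix, entry id }.
     */
    static String[] resolve(String xml, String tagName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for prefix lookup
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element element = (Element) doc.getElementsByTagName(tagName).item(0);
            String dictRef = element.getAttribute("dictRef");
            int colon = dictRef.indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("dictRef has no prefix: " + dictRef);
            }
            String prefix = dictRef.substring(0, colon);
            String entryId = dictRef.substring(colon + 1);
            // Walks up the ancestors to find the xmlns:prefix declaration in scope.
            String uri = element.lookupNamespaceURI(prefix);
            return new String[] { uri, entryId };
        } catch (IllegalArgumentException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalStateException("Could not resolve dictRef", e);
        }
    }
}
```

For the example document in the message, resolve(xml, "xyzzy") yields the URI bound to foo together with the id "bar". Mapping that URI to a local file is the catalog problem the message goes on to describe, and is deliberately left out here.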
From: Egon W. <e.w...@sc...> - 2005-02-10 07:47:29
|
On Wednesday 09 February 2005 08:06 pm, Peter Murray-Rust wrote:
> At 11:06 09/02/2005 +0100, Egon Willighagen wrote:
> >On Tuesday 08 February 2005 08:57 pm, Rajarshi Guha wrote:
> > > For what I'm doing now, it seems the answer is yes. But in the long run
> > > will we want to access the BibTeX information from the dictionaries? I
> > > think yes, otherwise why would it be there :)
> >
> >In the long run, we'll use CMLDOM for this. But rumours go that even
> > dictRef like things are going to be replaced by RDF...
>
> I think we'll see that other disciplines (like bioscience) make rapid
> progress with ontologies and we should narrow where appropriate.

Is there an overview of ontologies people are working on?

> But for
> now continue to use dictRef with the semantics:
> <blurf xmlns:foo="http://www.zzz.org/dict/foo.xml">
> <xyzzy dictRef="foo:bar" >abc</xyzzy>
> </blurf>
>
> The namespace foo must now have a namespace URI/URL (or Java 1.5 will fail
> to parse). This URL should be an absolute or relative address. "bar" should
> point to the entry with id="bar".

Rajarshi, did you read the QSAR dictionary into CDK with Java 1.5? Did you have problems with the default JAXP XML parser of 1.5?

> The main problem is the addressing. I am still struggling with this:
> - if you use relative URLs and relocate the file, the addresses break
> - if you use absolute ones you rely on a web presence or a duplicate of the
> local file system

Maybe this is of help. I use jEdit as XML editor, which uses XML Schema Instance (or so, 'xsi'):

    <dictionary
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.xml-cml.org/schema/cml2/core cmlAll.xsd http://www.w3.org/1998/Math/MathML mathml2/mathml2.xsd http://bibtexml.sf.net/ bibtexml.xsd"
      xmlns="http://www.xml-cml.org/schema/cml2/core"
      xmlns:bibtex="http://bibtexml.sf.net/"
      xmlns:cvs="https://www.cvshome.org/"
      xmlns:dc="http://dublincore.org/"
      xmlns:qsar-descriptors="http://qsar.sourceforge.net/dicts/qsar-descriptors"
      xmlns:qsar-descriptors-metadata="http://qsar.sourceforge.net/dicts/qsar-descriptors-metadata"
      id="qsar-descriptors"
      title="QSAR.sf.net Descriptor Dictionary">

where xsi:schemaLocation gives hints to XML editors about where schemas can be found. Also note the declarations of the namespaces used in dictRef and metadata@name, e.g. qsar-descriptors-metadata, cvs, dc and bibtex.

> XML./SGML addresses this sort of thing with a catalog. It isn't perfect and
> is non-trivial to implement. So we have to have a local dictionary catalog
> of some sort. I think we need to use the same $user.home mechanism as for
> .jmol. .jchempaint, .jumbo where each directory has a config.properties
> file. In whatever case we have to have:
> - the user maintains their dictionaries
> - we provide a dictionary installer
> - the dictionaries are on the web

Indeed. In JChemPaint, STMML based dictionaries can manually be copied into the $HOME/.jchempaint/dicts/ directory...

Yes, an installer would be valuable. It would detect .jumbo, .jmol, .jchempaint directory structures, download a dictionary index from, say, the WWMM server, and propose to install the dictionary for the available programs.

> It is here where I think RDF will help as the browsers become RDF-aware

Egon |
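The detection step of the proposed installer could look something like the sketch below. It is an assumption-laden illustration: the class DictionaryDirs is hypothetical, and only the $HOME/.jchempaint/dicts/ location is mentioned in the thread; using a dicts subdirectory for .jmol and .jumbo as well is a guess made here for symmetry.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical installer helper for illustration only. */
class DictionaryDirs {
    /** Per-program configuration directories named in the thread. */
    static final String[] APP_DIRS = { ".jumbo", ".jmol", ".jchempaint" };

    /**
     * Returns the dictionary directory for every program whose configuration
     * directory exists under the given home directory.
     * Assumption: each program keeps dictionaries in a "dicts" subdirectory
     * (only confirmed for .jchempaint in the thread).
     */
    static List<File> findDictDirs(File home) {
        List<File> found = new ArrayList<>();
        for (String app : APP_DIRS) {
            File appDir = new File(home, app);
            if (appDir.isDirectory()) {
                found.add(new File(appDir, "dicts"));
            }
        }
        return found;
    }
}
```

A caller would typically pass new File(System.getProperty("user.home")) and then offer to copy the downloaded dictionaries into each returned directory.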
From: Rajarshi G. <rx...@ps...> - 2005-02-10 15:01:23
|
On Thu, 2005-02-10 at 08:45 +0100, Egon Willighagen wrote:
> > The namespace foo must now have a namespace URI/URL (or Java 1.5 will fail
> > to parse). This URL should be an absolute or relative address. "bar" should
> > point to the entry with id="bar".
>
> Rajarshi, did you read the QSAR dictionary into CDK with Java 1.5? Did you
> have problems with the default JAXP XML parser of 1.5?

I used Java 1.5 but I had to include the gnujaxp.jar file from the CDK distribution for it to run. So I assume that I wasn't using the default JAXP XML parser from 1.5. Is there a way to check which one is being used?

-------------------------------------------------------------------
Rajarshi Guha <rx...@ps...> <http://jijo.cjb.net>
GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04 06F7 1BB9 E634 9B87 56EE
-------------------------------------------------------------------
The most important statistic for car manufacturers is autocorrelation. |
From: Egon W. <e.w...@sc...> - 2005-02-10 15:17:07
|
On Thursday 10 February 2005 04:01 pm, Rajarshi Guha wrote:
> On Thu, 2005-02-10 at 08:45 +0100, Egon Willighagen wrote:
> > > The namespace foo must now have a namespace URI/URL (or Java 1.5 will
> > > fail to parse). This URL should be an absolute or relative address.
> > > "bar" should point to the entry with id="bar".
> >
> > Rajarshi, did you read the QSAR dictionary into CDK with Java 1.5? Did
> > you have problems with the default JAXP XML parser of 1.5?
>
> I used Java 1.5 but I had to include the gnujaxp.jar file from the CDK
> distribution for it to run. So I assume that I wasn't using the default
> JAXP XML parser from 1.5. Is there a way to check which one is being
> used?

Run your program with -Dcdk.debugging=true, and look through the cdk.log. JAXP has a higher preference in CDK than gnujaxp, so I guess it takes that one...

Egon |
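Independently of the CDK log, the parser that JAXP itself would hand out can be inspected from plain Java. The sketch below uses only standard JAXP calls; the class name ParserCheck is hypothetical. Note the limitation: this reports what SAXParserFactory selects on the current classpath, not which parser a particular CDK class decides to instantiate, so the cdk.log remains the authority for that.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

/** Hypothetical diagnostic helper for illustration only. */
class ParserCheck {
    /** Returns the class name of the XMLReader that JAXP provides by default. */
    static String jaxpReaderClassName() {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            return reader.getClass().getName();
        } catch (Exception e) {
            throw new IllegalStateException("No JAXP SAX parser available", e);
        }
    }

    public static void main(String[] args) {
        // The printed name depends on the JDK and on any parser jars on the classpath.
        System.out.println("JAXP XMLReader: " + jaxpReaderClassName());
    }
}
```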
From: Rajarshi G. <rx...@ps...> - 2005-02-10 15:23:46
|
On Thu, 2005-02-10 at 16:15 +0100, Egon Willighagen wrote:
> On Thursday 10 February 2005 04:01 pm, Rajarshi Guha wrote:
> > On Thu, 2005-02-10 at 08:45 +0100, Egon Willighagen wrote:
> > > > The namespace foo must now have a namespace URI/URL (or Java 1.5 will
> > > > fail to parse). This URL should be an absolute or relative address.
> > > > "bar" should point to the entry with id="bar".
> > >
> > > Rajarshi, did you read the QSAR dictionary into CDK with Java 1.5? Did
> > > you have problems with the default JAXP XML parser of 1.5?
> >
> > I used Java 1.5 but I had to include the gnujaxp.jar file from the CDK
> > distribution for it to run. So I assume that I wasn't using the default
> > JAXP XML parser from 1.5. Is there a way to check which one is being
> > used?
>
> Run your program with -Dcdk.debugging=true, and look through the cdk.log.
> JAXP has a higher preference in CDK than gnujaxp, so I guess it takes that
> one...

OK did that.

Without placing gnujaxp.jar in my classpath my program fails with the following message in the debug output:

    org.openscience.cdk.io.ReaderFactory INFO: Detected format: HyperChem HIN
    org.openscience.cdk.dict.DictionaryDatabase INFO: Reading dictionary from org/openscience/cdk/dict/data/chemical.xml
    Exception in thread "main" java.lang.NoClassDefFoundError: gnu/xml/aelfred2/XmlReader
        at org.openscience.cdk.dict.Dictionary.unmarshal(Dictionary.java:70)
        at org.openscience.cdk.dict.DictionaryDatabase.readDictionary(DictionaryDatabase.java:79)
        at org.openscience.cdk.dict.DictionaryDatabase.<init>(DictionaryDatabase.java:65)
        at org.openscience.cdk.qsar.DescriptorEngine.<init>(DescriptorEngine.java:145)
        at desc.main(desc.java:18)

This is understandable, but if JAXP has a higher preference why does it require the gnujaxp.jar file?

However, after including the jar file I see the following lines in the log file:

    org.openscience.cdk.config.AtomTypeFactory INFO: Reading config file from org/openscience/cdk/config/data/structgen_atomtypes.xml
    org.openscience.cdk.config.AtomTypeFactory DEBUG: configFile must be a stream
    org.openscience.cdk.config.atomtypes.AtomTypeReader INFO: Using JAXP/SAX XML parser.

-------------------------------------------------------------------
Rajarshi Guha <rx...@ps...> <http://jijo.cjb.net>
GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04 06F7 1BB9 E634 9B87 56EE
-------------------------------------------------------------------
All great ideas are controversial, or have been at one time. |
From: Egon W. <e.w...@sc...> - 2005-02-10 15:28:31
|
On Thursday 10 February 2005 04:23 pm, Rajarshi Guha wrote:
> OK did that.
>
> Without placing gnujaxp.jar in my classpath my program fails with the
> message in the debug output:
>
> org.openscience.cdk.io.ReaderFactory INFO: Detected format: HyperChem HIN
> org.openscience.cdk.dict.DictionaryDatabase INFO: Reading dictionary
> from org/openscience/cdk/dict/data/chemical.xml
> Exception in thread "main" java.lang.NoClassDefFoundError:
> gnu/xml/aelfred2/XmlReader
>     at org.openscience.cdk.dict.Dictionary.unmarshal(Dictionary.java:70)
>     at org.openscience.cdk.dict.DictionaryDatabase.readDictionary(DictionaryDatabase.java:79)
>     at org.openscience.cdk.dict.DictionaryDatabase.<init>(DictionaryDatabase.java:65)
>     at org.openscience.cdk.qsar.DescriptorEngine.<init>(DescriptorEngine.java:145)
>     at desc.main(desc.java:18)
>
> This is understandable, but if JAXP has a higher preference why does it
> require the gnujaxp.jar file?

Yes, because I got confused... The dict stuff indeed requires GNUJAXP... Sorry...

> However after including the jar file I see the following lines in the
> log file:
>
> org.openscience.cdk.config.AtomTypeFactory INFO: Reading config file
> from org/openscience/cdk/config/data/structgen_atomtypes.xml
> org.openscience.cdk.config.AtomTypeFactory DEBUG: configFile must be a
> stream
> org.openscience.cdk.config.atomtypes.AtomTypeReader INFO: Using JAXP/SAX
> XML parser.

Correct. The cdk.config stuff and the cdk.io XML stuff try to instantiate JAXP/SAX before GNUJAXP...

Sorry for the misinformation.

Egon |
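The NoClassDefFoundError above is what happens when code refers to a parser class directly and the jar is missing. A preference order with a fallback avoids that by loading candidates reflectively. The sketch below is a generic illustration of that pattern, not the actual logic in cdk.config, cdk.io or cdk.dict (which the thread does not show); ReaderChooser and firstAvailable are hypothetical names, and the only class name taken from the thread is gnu.xml.aelfred2.XmlReader from the stack trace.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

/** Hypothetical helper illustrating a parser preference order with fallback. */
class ReaderChooser {
    /**
     * Tries each class name in order and returns the first one that can be
     * loaded and instantiated as an XMLReader; falls back to the JAXP default.
     */
    static XMLReader firstAvailable(String... classNames) {
        for (String name : classNames) {
            try {
                return (XMLReader) Class.forName(name).getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException | LinkageError | ClassCastException e) {
                // Class missing from the classpath, not instantiable, or not
                // an XMLReader: move on to the next candidate.
            }
        }
        try {
            return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new IllegalStateException("No XML parser available", e);
        }
    }
}
```

Calling firstAvailable("gnu.xml.aelfred2.XmlReader") would use Aelfred2 when gnujaxp.jar is present and quietly fall back to the JDK's parser when it is not, instead of failing at class-loading time.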
From: Rajarshi G. <rx...@ps...> - 2005-02-10 15:26:57
|
On Thu, 2005-02-10 at 16:15 +0100, Egon Willighagen wrote:
> On Thursday 10 February 2005 04:01 pm, Rajarshi Guha wrote:
> > On Thu, 2005-02-10 at 08:45 +0100, Egon Willighagen wrote:
> > > > The namespace foo must now have a namespace URI/URL (or Java 1.5 will
> > > > fail to parse). This URL should be an absolute or relative address.
> > > > "bar" should point to the entry with id="bar".
> > >
> > > Rajarshi, did you read the QSAR dictionary into CDK with Java 1.5? Did
> > > you have problems with the default JAXP XML parser of 1.5?
> >
> > I used Java 1.5 but I had to include the gnujaxp.jar file from the CDK
> > distribution for it to run. So I assume that I wasn't using the default
> > JAXP XML parser from 1.5. Is there a way to check which one is being
> > used?
>
> Run your program with -Dcdk.debugging=true, and look through the cdk.log.
> JAXP has a higher preference in CDK than gnujaxp, so I guess it takes that
> one...

Forgot some lines in the previous mail. From the log file I see these messages from readDictionary():

    org.openscience.cdk.dict.DictionaryDatabase INFO: Reading dictionary from org/openscience/cdk/dict/data/chemical.xml
    org.openscience.cdk.dict.Dictionary DEBUG: Using Aelfred2 XML parser.
    org.openscience.cdk.dict.Dictionary DEBUG: Deactivated validation
    org.openscience.cdk.dict.DictionaryDatabase DEBUG: Read dictionary: chemical
    org.openscience.cdk.dict.DictionaryDatabase INFO: Reading dictionary from org/openscience/cdk/dict/data/elements.xml
    org.openscience.cdk.dict.Dictionary DEBUG: Using Aelfred2 XML parser.
    org.openscience.cdk.dict.Dictionary DEBUG: Deactivated validation
    org.openscience.cdk.dict.DictionaryDatabase DEBUG: Read dictionary: elements
    org.openscience.cdk.dict.DictionaryDatabase INFO: Reading dictionary from org/openscience/cdk/dict/data/qsar-descriptors.xml
    org.openscience.cdk.dict.Dictionary DEBUG: Using Aelfred2 XML parser.
    org.openscience.cdk.dict.Dictionary DEBUG: Deactivated validation
    org.openscience.cdk.dict.DictionaryDatabase DEBUG: Read dictionary: qsar-descriptors

So it looks like it's using the gnujaxp parser rather than the native 1.5 JAXP. But as I mentioned, if I don't include the gnujaxp jar the program will fail.

I can send you the log file if it helps.

-------------------------------------------------------------------
Rajarshi Guha <rx...@ps...> <http://jijo.cjb.net>
GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04 06F7 1BB9 E634 9B87 56EE
-------------------------------------------------------------------
Eureka!
-- Archimedes |
From: Rajarshi G. <rx...@ps...> - 2005-02-09 01:29:18
|
I've committed the changes to the dictionary routines so that now it is possible to use DescriptorEngine to select specific types of descriptors based on their classification in the dictionary. Some details below.

On Tue, 2005-02-08 at 13:16 -0500, Rajarshi Guha wrote:
> Hi,
> I'm attempting to get descriptor classes from the qsar-descriptors.xml
> file using the DictionaryDatabase functions.
>
> After digging a little into DictionaryHandler.java I see that in the
> function startElement(), any <entry> element is added to the dictionary.
> The problem that I'm seeing is that this places the BibTeX entries as
> Entry objects into the returned dictionary.
>
> Is there any way to skip BibTeX entries?

Done. DictionaryHandler is modified to act on specific nodes (entry but not bibtex:entry, metadataList and metadata).

> Secondly, the startElement() function only appears to add the 'id' and
> 'term' tags to the current entry. In the case of the descriptor
> dictionaries, the information from the metadataList and metadata tags
> are not added. Adding these tags to an entry will require that Entry be
> modified to include fields for meta data - is it OK to add it or will
> that impact other uses of Entry?

Entry was modified so that it now has a field for descriptor metadata. So if a dictionary does have metadata related to descriptors (qsar-descriptors:) it gets stored.

> Thirdly, in addEntry() of Dictionary.java the line that adds the entry
> to the array of entries reads:
>
> entries.put(entry.getID().toLowerCase(), entry);

Silly question - don't need to fiddle with lower/upper case.

My main concern is whether I have employed the proper technique while parsing the XML document in DictionaryHandler. It doesn't break any of the JUnit tests - but it might not be optimal.

-------------------------------------------------------------------
Rajarshi Guha <rx...@ps...> <http://jijo.cjb.net>
GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04 06F7 1BB9 E634 9B87 56EE
-------------------------------------------------------------------
Does Ramanujan know Polish?
-- E.B. Ross |
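The thread does not say why the toLowerCase() call turned out to be harmless. One possible reading, offered here purely as an assumption, is that lookups normalise the key the same way as inserts, so the case of the ID in the XML file or in the descriptor specification never matters. The sketch below illustrates that idea with a hypothetical SimpleDictionary class; it is not the CDK Dictionary or Entry API.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical stand-in for a dictionary keyed by case-insensitive IDs. */
class SimpleDictionary {
    private final Map<String, String> entries = new HashMap<>();

    /** Stores the term under the lower-cased ID, as in the quoted addEntry() line. */
    void addEntry(String id, String term) {
        entries.put(key(id), term);
    }

    /** Looks up a term; the ID is lower-cased the same way before the lookup. */
    String getTerm(String id) {
        return entries.get(key(id));
    }

    /** Reports whether an entry exists for the ID, ignoring case. */
    boolean hasEntry(String id) {
        return entries.containsKey(key(id));
    }

    // Locale.ROOT keeps the normalisation independent of the user's locale.
    private static String key(String id) {
        return id.toLowerCase(Locale.ROOT);
    }
}
```

With this arrangement an entry added as "MolecularWeight" is found again under "molecularweight" or "MOLECULARWEIGHT", so linking a dictionary entry to a descriptor specification works as long as both sides go through the same lookup method.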
From: Egon W. <e.w...@sc...> - 2005-02-09 10:06:39
|
On Wednesday 09 February 2005 02:29 am, Rajarshi Guha wrote:
> My main concern is that I have employed the proper technique while
> parsing the XML document in DictionaryHandler. It doesn't break any of
> the JUnit tests - but it might not be optimal

I've briefly gone through the commit diff, and it looks excellent.

Egon |