From: SourceForge.net <no...@so...> - 2010-06-16 15:30:44
Bugs item #2960840, was opened at 2010-02-28 20:05
Message generated for change (Comment added) made by rvos
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=1126676&aid=2960840&group_id=248804

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: ui
Group: None
>Status: Closed
Priority: 7
Private: No
Submitted By: William Piel (sfrgpiel)
Assigned to: Rutger Vos (rvos)
Summary: PhyloWS API Issues Need Fixing

Initial Comment:
A couple of things should be fixed with PhyloWS; some are of higher value than others. I would rank items 1 and 4 at level "9," with the other ones lower (but if they are low-hanging easy fixes, it would be great to fix them before the release).

1. Failure with Safari RSS browser: Originally we thought that this failed because RSS results are not delivered with <?xml version="1.0" encoding="utf-8"?> at the top -- but that's not the case.

2. The initial url (e.g. http://purl.org/phylo/treebase/phylows/taxon/find?query=dcterms.title==%22Homo%20sapiens%22&recordSchema=study&format=rss1) uses purl.org but returns a list where the domains change from purl.org to nescent.org -- I'd rather that we not proliferate synonymous URIs that have different domains. Let's keep all "/phylows/" urls with the purl.org domain.

3. Let's add some verbiage to each RSS 1.0 result so that a minimum synopsis of readable info is provided to users of RSS browsers. For studies, provide the citation; for matrices, provide matrix name and data type; for trees, provide tree name, tree title, tree type; for taxa, provide the taxon name.
So, for example, while the current implementation looks like this:

<item rdf:about="http://treebase-dev.nescent.org:6666/treebase-web/phylows/study/TB2:S1925">
  <title>TB2:S1925</title>
  <link>http://treebase-dev.nescent.org:6666/treebase-web/phylows/study/TB2:S1925</link>
  <description>TB2:S1925</description>
</item>

...We need to be more verbose, not only in the <description> so that the user reading the RSS feed has some idea what it is about, but also in the <prism> and <dc> contents, so that machines reading this feed can do something useful with it. In the case of a study, let's return this:

<item rdf:about="http://purl.org/phylo/treebase/phylows/study/TB2:S1925">
  <title><![CDATA[[Study] Phylogenetic study of clavicipitaceous fungi using acetaldehyde dehydrogenase gene sequences]]></title>
  <link>http://purl.org/phylo/treebase/phylows/study/TB2:S1925</link>
  <description><![CDATA[Tanaka, E. and C. Tanaka. 2008. Phylogenetic study of clavicipitaceous fungi using acetaldehyde dehydrogenase gene sequences. Mycoscience, 49(20): 115-125.]]></description>
  <dc:creator><![CDATA[Tanaka, E.; Tanaka, C.]]></dc:creator>
  <dc:date>2007-01-01</dc:date>
  <dc:subject><![CDATA[Phylogenies]]></dc:subject>
  <dc:title><![CDATA[[Study] Phylogenetic study of clavicipitaceous fungi using acetaldehyde dehydrogenase gene sequences]]></dc:title>
  <dc:publisher>Mycoscience</dc:publisher>
  <prism:publicationName>Mycoscience</prism:publicationName>
  <prism:contributor>Tanaka, Eiji</prism:contributor>
  <prism:contributor>Tanaka, Chihiro</prism:contributor>
  <prism:volume>49</prism:volume>
  <prism:pageRange>115-125</prism:pageRange>
  <prism:startingPage>115</prism:startingPage>
  <prism:endingPage>125</prism:endingPage>
  <prism:doi>10.1007/s10267-007-0401-5</prism:doi>
  <dcterms:bibliographicCitation>Tanaka, E. and C. Tanaka. 2008. Phylogenetic study of clavicipitaceous fungi using acetaldehyde dehydrogenase gene sequences.
    Mycoscience, 49(20): 115-125.</dcterms:bibliographicCitation>
  <prism:publicationDate>2007-01-01</prism:publicationDate>
  <prism:section>Study</prism:section>
</item>

4. PhyloWS requests that search on taxa and return a list of trees are so slow as to be unusable. This problem may be fixed by Youjun, seeing as it is also slow in the web GUI interface and Youjun is tackling the problem there. For example:
http://purl.org/phylo/treebase/phylows/taxon/find?query=dcterms.title==%22Homo%20sapiens%22&recordSchema=tree&format=rss1

5. My understanding is that our /phylows/study/find? queries do not support searching on the journal name of an article. It would be great if this were offered because that would let us give journal editors RSS feeds into their own data.

----------------------------------------------------------------------

>Comment By: Rutger Vos (rvos)
Date: 2010-06-16 15:30

Message:
1. Failure with Safari RSS browser: this is a two-part problem, which made it hard to disentangle. One part is that in some cases, non-UTF-8 characters crept in. I've added a static method that filters out these characters from risky strings such as citation titles and abstracts. The other problem seemed to be the previous way in which the client was redirected to the search RSS. This was previously done using a "RedirectView" class, and Safari's feed:// pseudo-protocol didn't like that. This explains why everything seemed fine offline, under some testing scenarios. I changed the implementation to instead return a "ModelAndView" class, and things work now.

2. fixed: the links listed in the RSS are purls.

3. fixed: the returned metadata is now much more extensive, as per the example.

4. not applicable: this isn't a function of the PhyloWS API but of the underlying performance of the core. I am ignoring this item for this ticket.

5. fixed: the predicate for journal names is prism.publicationName, and you can do exact matching ("==") so that you get all matches for "Evolution" (and not also those for "Evolutionary Bioinformatics" etc.)

----------------------------------------------------------------------

Comment By: Rutger Vos (rvos)
Date: 2010-06-16 15:30

Message:
Your bug has been resolved. Thanks for the report.

----------------------------------------------------------------------

Comment By: Hilmar Lapp (hlapp)
Date: 2010-04-27 16:05

Message:
Needs fixing but not threatening TB2 operation. Downgraded to priority 7.

----------------------------------------------------------------------

Comment By: William Piel (sfrgpiel)
Date: 2010-04-27 13:12

Message:
For the record, I'm adding the following correspondence with Rutger:

Hi Rutger,

Regarding the RSS problem, I've been playing around with this validator: http://www.ldodds.com/rss_validator/1.0/validator.html ... with some success, but not 100%. One thing I noticed is that if I download the source with Firefox, the saved file has characters in Latin1 instead of UTF8. Thinking that that might be the problem, I posted both original and converted files for this (http://treebase.nescent.org/treebase-web/phylows/study/find?query=prism.publicationName=Nature&format=rss1) feed here:
http://treebase.peabody.yale.edu/~piel/nature_orig.rdf
http://treebase.peabody.yale.edu/~piel/nature_utf8.rdf

Oddly enough, it doesn't seem to make a difference -- in that Safari happily reads and displays both of these. Also, the validator gives the same result whether using the original URL (http://treebase.nescent.org/treebase-web/phylows/study/find?query=prism.publicationName=Nature&format=rss1) or one of these two (it's not happy about the namespace resolution, but other than that, it's okay). So presumably, Safari's problem must have something to do with the mime type or headers communicating the data if they're coming directly from TreeBASE.
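The find queries quoted in this thread share one URL shape: a CQL-style `query` parameter (predicate, operator, quoted value), an optional `recordSchema`, and a `format`. The Python sketch below assembles such URLs on the purl.org base that item 2 asks for, using `==` for the exact matching on `prism.publicationName` described in the fix for item 5. The helper name `phylows_find_url` is hypothetical, not part of TreeBASE, and the sketch assumes only the parameter names visible in the URLs above; it builds strings and does not contact the server.

```python
from typing import Optional
from urllib.parse import quote, urlencode

# Purl base used for PhyloWS links in this thread (item 2).
PHYLOWS_BASE = "http://purl.org/phylo/treebase/phylows"


def phylows_find_url(resource: str, predicate: str, value: str,
                     exact: bool = True,
                     record_schema: Optional[str] = None,
                     fmt: str = "rss1") -> str:
    """Build a PhyloWS <resource>/find? URL (hypothetical helper).

    exact=True emits "==", the exact-match operator from item 5;
    exact=False emits a single "=", as in some of the example URLs.
    """
    operator = "==" if exact else "="
    params = {"query": f'{predicate}{operator}"{value}"'}
    if record_schema is not None:
        params["recordSchema"] = record_schema
    params["format"] = fmt
    # Keep "=" unescaped so the query reads like the URLs in this thread;
    # double quotes and spaces are percent-encoded as %22 and %20.
    return f"{PHYLOWS_BASE}/{resource}/find?" + urlencode(
        params, safe="=", quote_via=quote)


if __name__ == "__main__":
    print(phylows_find_url("taxon", "dcterms.title", "Homo sapiens",
                           record_schema="study"))
    print(phylows_find_url("study", "prism.publicationName", "Evolution"))
```

The first call reproduces the example URL from item 2 exactly. Passing `quote_via=quote` instead of the default `quote_plus` gives `%20` for spaces, matching the purl examples rather than the `+` form used in the nescent.org URLs.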
Next I wanted to look at the case where Firefox won't display it:
http://treebase.nescent.org/treebase-web/phylows/study/find?query=prism.publicationName=%22Systematic+Biology%22&format=rss1

If I put this string into the validator, I get: "An invalid XML character (Unicode: 0x1a) was found in the element content of the document." So I put the files here, both original and utf8 converted:
http://treebase.peabody.yale.edu/~piel/sysbio_orig.rdf
http://treebase.peabody.yale.edu/~piel/sysbio_utf8.rdf

Same error with the validator, and Firefox won't render either of them, but oddly enough, Safari is happy to render both of them. So I opened the file in TextWrangler, ran the "Zap Gremlins" feature, and saved the file here:
http://treebase.peabody.yale.edu/~piel/sysbio_zap.rdf

With this file, both Firefox and Safari render it okay. However, the validator now has a different error:

Using org.apache.xerces.parsers.SAXParser
Exception net.sf.saxon.trans.XPathException: org.xml.sax.SAXParseException: The markup in the document following the root element must be well-formed.
org.xml.sax.SAXParseException: The markup in the document following the root element must be well-formed.

So... at any rate, it looks like the following is the case:
1. Character coding needs to be in UTF8, even if Safari and Firefox don't have a problem with this -- others might.
2. Safari's problem is in the delivery of the RDF file -- if Apache delivers a static RDF file, Safari has no problem. So it's not the content of the file at issue.
3. Firefox's problem is with hidden control characters.

take care,
Bill

----------------------------------------------------------------------

Comment By: Rutger Vos (rvos)
Date: 2010-03-12 23:47

Message:
1. I think this might be due to the extension(!). I created a mapping so that we give these feeds a .rdf extension and serve them with mime-type application/rss+xml. They now render as expected.
2. we now use purls throughout.
3. this is much harder to do than you'd think. I'll try.
4. this has been fixed already, apparently.
5. we can now use prism.publicationName as a search predicate.
I've committed this progress so far, but it has not been reloaded on the server yet.

----------------------------------------------------------------------

Comment By: Hilmar Lapp (hlapp)
Date: 2010-03-10 18:25

Message:
Downgrading to priority 8 as per conversation with Bill. Item #4 has been fixed already.

----------------------------------------------------------------------

Comment By: Rutger Vos (rvos)
Date: 2010-03-01 16:09

Message:
Ok, I'll work on this.

----------------------------------------------------------------------

Comment By: Rutger Vos (rvos)
Date: 2010-03-01 16:09

Message:
Thanks for reporting this bug. We'll look into it as soon as possible.

----------------------------------------------------------------------
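Both the 2010-06-16 fix for item 1 and Bill's findings point at characters that XML forbids outright. The validator's complaint about 0x1a comes from the XML 1.0 `Char` production: U+001A encodes as a perfectly valid UTF-8 byte, so re-encoding the feed alone cannot fix it, which is consistent with the utf8-converted files failing the same way. The Python sketch below shows one way to write the kind of filter described in that fix; it is not the TreeBASE static method (the thread does not give its name or code), and the helper names `strip_invalid_xml_chars` and `parses_as_xml` are hypothetical. It removes every character outside the ranges XML 1.0 permits and checks the result with the standard library's XML parser.

```python
import re
import xml.etree.ElementTree as ET

# Complement of the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF].
# Anything matched here (e.g. the 0x1a control character) makes a
# document ill-formed regardless of the declared encoding.
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str) -> str:
    """Remove characters XML 1.0 does not allow (hypothetical helper)."""
    return _INVALID_XML_CHARS.sub("", text)


def parses_as_xml(fragment: str) -> bool:
    """Return True if the fragment is well-formed XML."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    title = "Systematic\x1a Biology"  # control character like the one reported
    raw = f"<title>{title}</title>"
    clean = f"<title>{strip_invalid_xml_chars(title)}</title>"
    print(parses_as_xml(raw), parses_as_xml(clean))  # False True
```

Filtering at the point where risky strings (titles, abstracts) enter the feed, as the fix describes, keeps the rest of the serializer unchanged; it does not address the separate Safari delivery issue, which the thread traces to the redirect and mime type rather than to file content.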