From: Steve B. <ste...@va...> - 2017-06-29 17:34:23
We have set up a Blazegraph instance at https://sparql.vanderbilt.edu and have loaded it with data. Those data include multilingual translations of a controlled vocabulary. The problem I'm having occurs when I send the endpoint, via HTTP, a query containing URL-encoded literals that represent non-ASCII UTF-8 characters from other languages. Here is a query that contains a literal with the character "ú" (URL encoded as %C3%BA):

SELECT DISTINCT ?term where {
?term <http://www.w3.org/2000/01/rdf-schema#isDefinedBy> <http://rs.tdwg.org/cv/status/>.
?term <http://www.w3.org/2004/02/skos/core#hiddenLabel> 'Común'.
}

If I paste the query into the Blazegraph SPARQL endpoint GUI box at https://sparql.vanderbilt.edu, I get a single result: the URI <http://rs.tdwg.org/cv/status/extant>, as I should. However, if I URL encode the query and send it to the same Blazegraph endpoint via HTTP GET as:

https://sparql.vanderbilt.edu/sparql?query=SELECT%20DISTINCT%20%3Fterm%20where%20%7B%0A%3Fterm%20%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23isDefinedBy%3E%20%3Chttp%3A%2F%2Frs.tdwg.org%2Fcv%2Fstatus%2F%3E.%0A%3Fterm%20%3Chttp%3A%2F%2Fwww.w3.org%2F2004%2F02%2Fskos%2Fcore%23hiddenLabel%3E%20%27Com%C3%BAn%27.%0A%7D

there are no results. I'm confident that the URL encoding is correct because if the same query is sent via HTTP GET to our old Callimachus-based endpoint, it returns the correct URI:

http://rdf.library.vanderbilt.edu/sparql?query=SELECT%20DISTINCT%20%3Fterm%20where%20%7B%0A%3Fterm%20%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23isDefinedBy%3E%20%3Chttp%3A%2F%2Frs.tdwg.org%2Fcv%2Fstatus%2F%3E.%0A%3Fterm%20%3Chttp%3A%2F%2Fwww.w3.org%2F2004%2F02%2Fskos%2Fcore%23hiddenLabel%3E%20%27Com%C3%BAn%27.%0A%7D

The problem appears to be with the Blazegraph endpoint's handling of the encoded literal characters.
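
A minimal sketch for checking the encoding locally, using only Python's standard urllib.parse. It rebuilds the percent-encoded query string from the query above and shows what the literal would look like if a server decoded the bytes %C3%BA as ISO-8859-1 rather than UTF-8. That the Blazegraph endpoint actually does this is an assumption, not something established here, and the helper names encode_query and decode_as are hypothetical.

```python
from urllib.parse import quote, unquote

# The query from the message, with the line breaks implied by %0A in the URL.
# "\u00fa" is the precomposed character "ú".
QUERY = (
    "SELECT DISTINCT ?term where {\n"
    "?term <http://www.w3.org/2000/01/rdf-schema#isDefinedBy> "
    "<http://rs.tdwg.org/cv/status/>.\n"
    "?term <http://www.w3.org/2004/02/skos/core#hiddenLabel> 'Com\u00fan'.\n"
    "}"
)


def encode_query(query: str) -> str:
    """Percent-encode a query as UTF-8, leaving only unreserved characters as-is."""
    return quote(query, safe="", encoding="utf-8")


def decode_as(encoded: str, charset: str) -> str:
    """Decode a percent-encoded string, interpreting its bytes in the given charset."""
    return unquote(encoded, encoding=charset)


if __name__ == "__main__":
    encoded = encode_query(QUERY)
    # Full GET URL, for comparison with the one pasted in the message.
    print("https://sparql.vanderbilt.edu/sparql?query=" + encoded)
    # Correct decoding: the literal survives the round trip.
    print(decode_as("Com%C3%BAn", "utf-8"))       # Común
    # Assumed failure mode: the same bytes read as ISO-8859-1.
    print(decode_as("Com%C3%BAn", "iso-8859-1"))  # ComÃºn
```

If the endpoint were decoding the query string the second way, it would be searching for the literal 'ComÃºn', which matches nothing in the store; that would be consistent with the empty result, but it remains a hypothesis to test against the server configuration.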

If I change the literal to one that contains only ASCII characters, such as "common" (a hidden label synonym of "Común"), so that the query is:

SELECT DISTINCT ?term where {
?term <http://www.w3.org/2000/01/rdf-schema#isDefinedBy> <http://rs.tdwg.org/cv/status/>.
?term <http://www.w3.org/2004/02/skos/core#hiddenLabel> 'common'.
}

the URL-encoded HTTP GET:

https://sparql.vanderbilt.edu/sparql?query=SELECT%20DISTINCT%20%3Fterm%20where%20%7B%0A%3Fterm%20%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23isDefinedBy%3E%20%3Chttp%3A%2F%2Frs.tdwg.org%2Fcv%2Fstatus%2F%3E.%0A%3Fterm%20%3Chttp%3A%2F%2Fwww.w3.org%2F2004%2F02%2Fskos%2Fcore%23hiddenLabel%3E%20%27common%27.%0A%7D

produces the correct result.

I am at a loss as to where to go next with troubleshooting this. It's possible that there is some incorrect server configuration setting that I don't know about. I suppose it's also possible that it's a bug. This is a rather serious problem for us because we have significant multilingual data in the triplestore and users need to be able to search against it.

Steve Baskauf

--
Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences

postal mail address:
PMB 351634
Nashville, TN 37235-1634, U.S.A.

delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235

office: 2128 Stevenson Center
phone: (615) 343-4582, fax: (615) 322-4942
If you fax, please phone or email so that I will know to look for it.
http://bioimages.vanderbilt.edu
http://vanderbilt.edu/trees