From: Ian B. J. <ij...@w3...> - 2006-07-07 01:47:15
|
Hello,

One of my colleagues reported [1] an issue where UTF-8 characters are escaped like this: "\u00D0" (correct codepoints, but escaped instead of the actual characters).

Richard Cyganiak kindly made a suggestion [2]:

  "Set UNIC_RDF to FALSE again to avoid this."

However, when I set UNIC_RDF to FALSE, the parser seems to fail. Here is the query:

PREFIX : <http://www.w3.org/2000/10/swap/pim/contact#>
PREFIX doc: <http://www.w3.org/2000/10/swap/pim/doc#>
PREFIX mat: <http://www.w3.org/2002/05/matrix/vocab#>
PREFIX org: <http://www.w3.org/2001/04/roadmap/org#>
PREFIX rec: <http://www.w3.org/2001/02pd/rec54#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?doc, ?editor, ?title, ?date, ?versionOf, ?type, ?supersedes
WHERE {?doc rdf:type ?type; dc:title ?title; dc:date ?date; doc:versionOf ?versionOf.
  OPTIONAL {?doc rec:supersedes ?supersedes}
  OPTIONAL {?doc rec:editor [:fullName ?editor ] .}}
ORDER BY DESC(?date)

Here is the RDF source:
http://www.w3.org/2002/01/tr-automation/tr.rdf

I find that if I set UNIC_RDF to TRUE, parsing succeeds but I have the escaping issue. If I set it to FALSE, parsing fails (with a "255" error that I did not examine closely).

I would appreciate any suggestions and hope the above information is sufficient to run the test; please let me know if more information is required.

Thank you,

 _ Ian Jacobs

[1] http://sourceforge.net/mailarchive/message.php?msg_id=15149273
[2] http://sourceforge.net/mailarchive/message.php?msg_id=15149274

--
Ian Jacobs (ij...@w3...) http://www.w3.org/People/Jacobs
Tel: +1 718 260-9447
|
From: Richard C. <ri...@cy...> - 2006-07-10 09:56:52
|
Ian,

The file parses fine here, with UNIC_RDF set to false. I didn't try to run a SPARQL query though, so maybe the problem is not parsing but somewhere in the SPARQL engine. Can you please provide the exact error message, and a code sample that produces the error? Which version of RAP and PHP ("php -v") is this?

Cheers,
Richard
|
From: Richard C. <ri...@cy...> - 2006-07-10 20:50:07
|
Hi Ian,

Thanks for the additional information! I *think* (but haven't actually tried) that a small change in line 1282 of api/sparql/SparqlEngine.php should fix the problem. Could you please try to replace

    $label = htmlentities($varvalue->getLabel());

with

    $label = htmlspecialchars($varvalue->getLabel());

and report back if it works?

Cheers,
Richard

On 10 Jul 2006, at 16:37, Ian B. Jacobs wrote:

> Hello Richard,
>
> Here's more information; I hope it helps. Thanks for your work on this.
>
> _ Ian
>
> [Info provided by my colleague Dom]
> ========================================================
> Version info:
> dom@cumulustier:~$ php5 -v
> PHP 5.1.4-0.1 (cli) (built: Jun 13 2006 21:46:20)
> dom@cumulustier:~$ less /usr/local/lib/php/rdfapi-php/api/RdfAPI.php |grep "@version"
> // @version : $Id: RdfAPI.php,v 1.20 2006/05/15 05:24:35 tgauss Exp $
>
> When I run the SPARQL query on the tr.rdf with UNIC_RDF set to false, I
> get back a bunch of PHP errors à la:
>
> Warning: simplexml_load_string(): Entity: line 1: parser error : Entity 'acirc' not defined
> in /usr/local/lib/php/rdfapi-php/api/sparql/SparqlEngine.php on line 1260
>
> Warning: simplexml_load_string(): iteral>Mark Baker</literal></binding><binding name="title"><literal>XHTMLâ
> in /usr/local/lib/php/rdfapi-php/api/sparql/SparqlEngine.php on line 1260
>
> Warning: simplexml_load_string(): ^
> in /usr/local/lib/php/rdfapi-php/api/sparql/SparqlEngine.php on line 1260
>
> Warning: simplexml_load_string(): Entity: line 1: parser error : Input is not proper UTF-8, indicate encoding !
> Bytes: 0x84 0x26 0x63 0x65
> in /usr/local/lib/php/rdfapi-php/api/sparql/SparqlEngine.php on line 1260
>
> When I experimented to see what went wrong, it looks like the function
> that outputs the results as XML was failing on the intermediary content.
> ====================================
> --
> Ian Jacobs (ij...@w3...) http://www.w3.org/People/Jacobs
> Tel: +1 718 260-9447
|
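The difference between the two calls suggested above can be seen on a single label. The following is a minimal sketch, not code from RAP: `$label` is a stand-in for `$varvalue->getLabel()`, the sample title is an assumption chosen because it reproduces the bytes reported in the warnings (0x84 followed by "&ce"), and the explicit 'ISO-8859-1' argument imitates the default charset that htmlentities() used before PHP 5.4, which is presumably what the PHP 5.1.4 installation in this thread applied.

```php
<?php
// Sketch only. $label stands in for $varvalue->getLabel() (assumption).
// "XHTML(TM)" in UTF-8: the trademark sign U+2122 is the byte sequence E2 84 A2.
$label = "XHTML\xE2\x84\xA2 & more";

// htmlentities() with a Latin-1 charset treats every UTF-8 byte as its own
// character: 0xE2 becomes &acirc;, 0xA2 becomes &cent;, and 0x84 is passed
// through as a raw byte. None of this is valid in an XML document without a DTD.
$viaEntities = htmlentities($label, ENT_COMPAT, 'ISO-8859-1');

// htmlspecialchars() only escapes the markup characters (&, <, >, "),
// so the UTF-8 bytes of the label are left untouched.
$viaSpecial = htmlspecialchars($label, ENT_COMPAT, 'UTF-8');

// Hex dumps make the byte-level difference visible regardless of terminal encoding.
echo bin2hex($viaEntities), "\n";
echo bin2hex($viaSpecial), "\n";
```

The entity names that htmlspecialchars() produces (&amp;, &lt;, &gt;, &quot;) are among the five entities predefined by XML, which is why its output can be embedded in an XML result document, while HTML-only names such as &acirc; cannot.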
From: Ian B. J. <ij...@w3...> - 2006-07-11 13:40:44
|
Hello Richard,

The change seems to do the trick. Thank you!

 _ Ian
|
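For anyone who wants to confirm the fix outside RAP, the failure and the repair can be reproduced with simplexml_load_string() alone. This is a hedged sketch: `parsesAsXml()` is a hypothetical helper that does not exist in RAP, the `<literal>` wrapper only imitates the element name seen in the warnings, and the sample label is an assumption. It requires the SimpleXML extension.

```php
<?php
// Hypothetical helper (not part of RAP): wraps an already-escaped label in a
// minimal <literal> element and reports whether libxml accepts the document.
function parsesAsXml(string $escapedLabel): bool
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string('<literal>' . $escapedLabel . '</literal>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $xml !== false;
}

// Sample label (assumption): UTF-8 text containing U+2122 and an ampersand.
$label = "XHTML\xE2\x84\xA2 1.0 & friends";

// Old behaviour: Latin-1 htmlentities() output carries undefined entities
// and a stray non-UTF-8 byte, so the parse is expected to fail.
$oldOk = parsesAsXml(htmlentities($label, ENT_COMPAT, 'ISO-8859-1'));

// Suggested change: htmlspecialchars() output is well-formed XML character data.
$newOk = parsesAsXml(htmlspecialchars($label, ENT_COMPAT, 'UTF-8'));

var_dump($oldOk, $newOk);
```

Restoring the previous libxml_use_internal_errors() setting keeps the helper from changing error handling for the rest of a script.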