From: Rob V. <rv...@do...> - 2014-02-28 10:09:29
Tom,

I have parsing support working on trunk with the NTriples and NQuads parsers defaulting to the new spec, so they'll read the data in as UTF-8 rather than as ASCII. We're passing all the official RDF 1.1 tests for NTriples and NQuads, but I can't guarantee that no bugs were introduced, though the changes to support the new spec turned out to be relatively minor.

The parser changes have been merged onto default and I'm continuing to work on the ntuples11 branch for the remainder of the implementation work. There's no output support for these formats yet but hopefully I'll get that done later today.

Cheers,

Rob

On 25/02/2014 10:22, "Rob Vesse" <rv...@do...> wrote:

>Tom
>
>Yes, the original NTriples and NQuads specifications only allow ASCII; this
>was by design to make those formats canonical (since with UTF-8 you can
>potentially encode complex characters in multiple ways) and facilitate
>reliable data exchange across systems that didn't necessarily support
>non-ASCII data.
>
>Btw the reader only enforces ASCII encoding if you pass a filename (i.e.
>when it deals with opening the file stream). If you pass in a pre-opened
>StreamReader that is in a different encoding (e.g. UTF-8) it may still
>parse successfully, though exact behaviour is hard to know in advance. It
>will issue a warning about incorrect encoding (via the Warning event) and
>it may error out on some native UTF-8 data since the tokeniser is not
>written to expect native UTF-8.
>
>The RDF 1.1 working group have published proposed recommendations which
>standardise NQuads & NTriples, and part of the standardization is to change
>the encoding to UTF-8, but I haven't had a chance to update dotNetRDF to
>support the updated specs yet.
>
>Since this is a breaking change to spec and current API behaviour the
>existing tokenizers and parsers would need to be modified so that they can
>support either the new/old specification.
>The ideal approach would be similar to how we updated Turtle support: we
>implement the new specifications, the parsers default to the new spec
>mode, and the writers implement the new spec but default to producing the
>old spec as output. This is Postel's law in action if you're wondering
>why this is done.
>
>There are issues filed for these upgrades but I haven't had time to
>implement them yet. I was considering trying to get these into the next
>release anyway and I have some time to start on this at the end of the
>week unless you want to attempt this yourself. See CORE-356
>(http://dotnetrdf.org/tracker/Issues/IssueDetail.aspx?id=356) for NQuads
>and CORE-355 (http://dotnetrdf.org/tracker/Issues/IssueDetail.aspx?id=355)
>for NTriples, which include links to the updated specifications; see the
>comments for the most up to date spec links.
>
>Hope this clarifies things,
>
>Cheers,
>
>Rob
>
>On 25/02/2014 10:06, "Tomasz Pluskiewicz" <tom...@gm...>
>wrote:
>
>>Hi Rob
>>
>>A colleague of mine has just discovered that the NQuadsParser reads
>>files with ASCII encoding while all others use UTF-8.
>>
>>I understand that this is as described in the specification but why is
>>that exactly?
>>
>>And what do you think about adding an option to the parsers so that
>>alternative encodings can be used for reading dataset files?
>>
>>Cheers,
>>Tom
>>
>>_______________________________________________
>>dotNetRDF-develop mailing list
>>dot...@li...
>>https://lists.sourceforge.net/lists/listinfo/dotnetrdf-develop
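
An aside on the point in the thread that UTF-8 lets the same character be encoded in more than one way, which is why the original ASCII-only formats were considered canonical. The following is a minimal Python sketch, not dotNetRDF code; the example triple, the variable names and the helper `is_ascii_only` are hypothetical and exist only for illustration. It shows the same visible text yielding two different UTF-8 byte sequences depending on Unicode normalisation, while an ASCII-only line has to fall back to an escape sequence.

```python
import unicodedata

# "café" written two ways: a precomposed "é" (NFC) versus
# a plain "e" followed by a combining acute accent (NFD).
composed = unicodedata.normalize("NFC", "caf\u00e9")
decomposed = unicodedata.normalize("NFD", "caf\u00e9")

utf8_composed = composed.encode("utf-8")
utf8_decomposed = decomposed.encode("utf-8")

# Same visible text, but the byte sequences are not equal.
bytes_differ = utf8_composed != utf8_decomposed


def is_ascii_only(data: bytes) -> bool:
    """Return True if every byte in data decodes as 7-bit ASCII (hypothetical helper)."""
    try:
        data.decode("ascii")
        return True
    except UnicodeDecodeError:
        return False


# Illustrative ASCII-only line: the non-ASCII character is written as an
# escape sequence rather than as raw UTF-8 bytes.
escaped_line = b'<http://example.org/s> <http://example.org/p> "caf\\u00E9" .'

print(utf8_composed, utf8_decomposed, bytes_differ)
print(is_ascii_only(escaped_line), is_ascii_only(utf8_composed))
```

A byte-for-byte comparison of two UTF-8 files can therefore report a difference even when the text looks identical, which is the kind of ambiguity an ASCII-plus-escapes format avoids.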