From: Tomasz P. <tom...@gm...> - 2014-02-28 10:27:13
Good news. Thanks!

And this reminds me. We have found a bug in the TriG parser. Will write
an email to the other list.

Tom

On Fri, Feb 28, 2014 at 11:08 AM, Rob Vesse <rv...@do...> wrote:
> Tom
>
> I have parsing support working on trunk with the NTriples and NQuads
> parsers defaulting to the new spec so they'll read the data in as UTF-8
> rather than as ASCII. We're passing all the official RDF 1.1 tests for
> NTriples and NQuads but I can't guarantee there aren't any bugs introduced
> though the changes to support the new spec turned out to be relatively
> minor.
>
> The parser changes have been merged onto default and I'm continuing to
> work on the ntuples11 branch for the remainder of the implementation work.
> There's no output support for these formats yet but hopefully I'll get
> that done later today.
>
> Cheers,
>
> Rob
>
> On 25/02/2014 10:22, "Rob Vesse" <rv...@do...> wrote:
>
>>Tom
>>
>>Yes the original NTriples and NQuads specifications only allow ASCII, this
>>was by design to make those formats canonical (since with UTF-8 you can
>>potentially encode complex characters in multiple ways) and facilitate
>>reliable data exchange across systems that didn't necessarily support
>>non-ASCII data.
>>
>>Btw the reader only enforces ASCII encoding if you pass a filename (I.e.
>>when it deals with opening the file stream), if you pass in a pre-opened
>>StreamReader that is in a different encoding (I.e. UTF-8) it may still
>>parse successfully though exact behaviour is hard to know in advance. It
>>will issue a warning about incorrect encoding (via the Warning event) and
>>it may error out on some native UTF-8 data since the tokeniser is not
>>written to expect native UTF-8.
>>
>>The RDF 1.1 working group have published proposed recommendations which
>>standardise NQuads & NTriples and part of the standardization is to change
>>the encoding to UTF-8 but I haven't had chance to update dotNetRDF to
>>support the updated specs yet.
>>
>>Since this is a breaking change to spec and current API behaviour the
>>existing tokenizers and parsers would need to be modified so that they can
>>support either the new/old specification. An approach similar to how we
>>updated Turtle support where we implement the new specifications and the
>>parsers default to the new spec mode and the writers implement the new
>>spec but default to producing the old spec as output would be ideal. This
>>is Postel's law in action if you're wondering why this is done.
>>
>>There are issues filed for these upgrades but I haven't had time to
>>implement them yet, I was considering trying to get these into the next
>>release anyway and I have some time to start on this at the end of the
>>week unless you want to attempt this yourself. See CORE-356
>>(http://dotnetrdf.org/tracker/Issues/IssueDetail.aspx?id=356) for NQuads
>>and CORE-355 (http://dotnetrdf.org/tracker/Issues/IssueDetail.aspx?id=355)
>>for NTriples which include links to the updated specifications, see the
>>comments for the most up to date spec links.
>>
>>Hope this clarifies things,
>>
>>Cheers,
>>
>>Rob
>>
>>On 25/02/2014 10:06, "Tomasz Pluskiewicz" <tom...@gm...>
>>wrote:
>>
>>>Hi Rob
>>>
>>>A colleague of mine has just discovered that the NQuadsParser reads
>>>file with ASCII encoding while all other use UTF-8.
>>>
>>>I understand that this is as described in the specification but why is
>>>that exactly?
>>>
>>>And what do you think about adding a option to the parsers so that
>>>alternative encodings can be used for reading dataset files?
>>>
>>>Cheers,
>>>Tom
>>>
>>>_______________________________________________
>>>dotNetRDF-develop mailing list
>>>dot...@li...
>>>https://lists.sourceforge.net/lists/listinfo/dotnetrdf-develop
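
Note on the encoding point in Rob's 25/02/2014 message: one way to read the remark that UTF-8 lets you "encode complex characters in multiple ways" is in terms of Unicode normalization forms, where the same visible character can be a single precomposed code point or a base letter plus a combining mark. That reading is an assumption, not something the thread states. The Python sketch below is not dotNetRDF code; `ntriples_escape` is a hypothetical helper that covers only the \uXXXX and \UXXXXXXXX escapes for non-ASCII code points and leaves out the other escapes the original NTriples grammar defines. It shows the two forms producing different UTF-8 bytes, and how ASCII-only escapes spell out which code points are actually present.

```python
import unicodedata


def ntriples_escape(text: str) -> str:
    # Hypothetical helper: replace every non-ASCII code point with an
    # ASCII escape (\uXXXX for the BMP, \UXXXXXXXX above it).
    # Escapes for quotes, backslashes and control characters are omitted.
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append("\\u%04X" % cp)
        else:
            out.append("\\U%08X" % cp)
    return "".join(out)


# The same visible word in two Unicode normalization forms.
composed = unicodedata.normalize("NFC", "caf\u00e9")    # 'e with acute' as one code point
decomposed = unicodedata.normalize("NFD", "caf\u00e9")  # 'e' followed by a combining accent

# Native UTF-8: the two forms are different byte sequences.
print(composed.encode("utf-8"))
print(decomposed.encode("utf-8"))

# ASCII-only output: the difference is explicit in the text itself.
print(ntriples_escape(composed))    # caf\u00E9
print(ntriples_escape(decomposed))  # cafe\u0301
```

Both strings render as the same word, but a byte-level comparison treats them as different; with ASCII escapes the difference is at least visible in the file rather than hidden in the encoding.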