
#70 Invalid N-Triple Output

Serialization
open-accepted
Bug (92)
5
2012-03-18
2011-07-04
Alex
No

Hi,

The extraction framework generates invalid N-Triples output files. It escapes non-ASCII characters in literals but fails to do so for IRIs, so any IRI that contains non-ASCII characters produces an invalid .nt file. This is fine for Virtuoso, which doesn't seem to check the validity of N-Triples files during import, but other triple stores such as AllegroGraph, and RDF frameworks and tools such as Raptor and Python's RDFLib, check rigorously and fail when non-ASCII characters are present in N-Triples files.
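
For reference, here is a minimal sketch of the escaping that IRIs would need, assuming the \uXXXX / \UXXXXXXXX escape form of the original ASCII-only N-Triples grammar. The function names and the sample IRI are hypothetical and are not taken from the extraction framework, and some parsers may still reject escaped IRIs:

```python
def escape_non_ascii(text: str) -> str:
    """Replace every non-ASCII code point with a \\uXXXX or \\UXXXXXXXX escape.

    Assumption: this mirrors the escape syntax of the original N-Triples
    grammar. It does not handle quotes, backslashes or control characters,
    which literals additionally require.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                 # plain ASCII passes through unchanged
        elif cp <= 0xFFFF:
            out.append("\\u%04X" % cp)     # Basic Multilingual Plane: 4 hex digits
        else:
            out.append("\\U%08X" % cp)     # supplementary planes: 8 hex digits
    return "".join(out)


def iri_term(iri: str) -> str:
    """Wrap an IRI in angle brackets with non-ASCII characters escaped."""
    return "<" + escape_non_ascii(iri) + ">"


# Hypothetical sample triple with a non-ASCII character in the subject IRI.
line = "%s %s %s ." % (
    iri_term("http://dbpedia.org/resource/Zürich"),
    iri_term("http://www.w3.org/2000/01/rdf-schema#label"),
    '"' + escape_non_ascii("Zürich") + '"',
)
print(line)
```

The resulting line contains only ASCII characters, which is what strict N-Triples parsers of that grammar expect.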

Looking through the code, namely org.dbpedia.extraction.destinations.Quad, I've seen that you're aware of the problem. Could you provide an estimate for a bug fix, or some tips on how I could patch it myself?

Kind Regards,
Alexandru

Discussion

  • Alex

    Alex - 2011-07-06

    OK, I did it myself, but it doesn't help: most tools still spew syntax errors. I solved the problem by changing the output to the TriX format, which supports UTF-8 natively.

     
  • Alex

    Alex - 2011-07-06
    • status: open --> closed
     
  • Christopher Sahnwaldt

    • milestone: --> 2702415
    • assigned_to: nobody --> jcsahnwaldt
    • labels: 973128 --> Bug
    • status: closed --> open-accepted
     
  • Christopher Sahnwaldt

    Probably fixed. I'll check.

     
  • Christopher Sahnwaldt

    • milestone: 2702415 --> Serialization
     