Hi,
The extraction framework generates invalid N-Triples output files. It escapes non-ASCII characters in literals but not in IRIs, so any IRI containing non-ASCII characters produces an invalid .nt file. Virtuoso doesn't seem to validate N-Triples files during import, so it accepts them. Other triple stores such as AllegroGraph, and RDF frameworks and tools such as Raptor and Python's RDFLib, check rigorously and fail when non-ASCII characters are present in N-Triples files.
Looking through the code, namely org.dbpedia.extraction.destinations.Quad, I've seen that you're aware of the problem. Could you give an estimate for a bug fix, or some tips on how I could patch it myself?
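For reference, the kind of patch I have in mind is to apply the same escaping to IRIs that is already applied to literals: write every non-ASCII code point as \uXXXX (or \UXXXXXXXX above U+FFFF). Below is a minimal Python sketch of that idea. It is not the framework's code (Quad is Scala), and the helper names escape_nt_iri and nt_iri_triple are hypothetical.

```python
def escape_nt_iri(iri: str) -> str:
    """Escape non-ASCII code points in an IRI using N-Triples \\u / \\U escapes.

    Hypothetical helper for illustration; not part of the DBpedia framework.
    """
    out = []
    for ch in iri:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                # plain ASCII is written as-is
        elif cp <= 0xFFFF:
            out.append("\\u%04X" % cp)    # Basic Multilingual Plane: \uXXXX
        else:
            out.append("\\U%08X" % cp)    # supplementary planes: \UXXXXXXXX
    return "".join(out)


def nt_iri_triple(s: str, p: str, o: str) -> str:
    """Format a triple whose subject, predicate and object are all IRIs (hypothetical)."""
    return "<%s> <%s> <%s> ." % (escape_nt_iri(s), escape_nt_iri(p), escape_nt_iri(o))


print(escape_nt_iri("http://dbpedia.org/resource/Z\u00fcrich"))
# → http://dbpedia.org/resource/Z\u00FCrich
```

The output file then stays pure ASCII; whether a given parser decodes escapes inside IRIs is a separate question.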
Kind Regards,
Alexandru
OK, I did it myself, but it doesn't help: most tools still report syntax errors. I solved the problem by changing the output to the TriX format, which supports UTF-8 natively.
Probably fixed. I'll check.