
#30 DBPedia 3.5 N-Triples parse report

Milestone: Dump extraction
Status: open-accepted
Labels: Bug
Priority: 5
Updated: 2012-03-17
Created: 2010-04-15
Private: No

See http://www.openjena.org/~afs/DBPedia35-parse-log-2010-04-15.txt

Most of the IRI errors come from external_links_en.nt.bz2.

The report also includes IRI warnings (things that are not recommended practice in IRIs).

It also includes bad-lexical-form errors, for example dates such as 31 February.

Discussion

  • Andy Seaborne

    Andy Seaborne - 2010-04-15

    DBPedia 3.5 parse report

     
  • M.Kiesel

    M.Kiesel - 2010-05-01

    Not completely fixed in DBpedia 3.5.1:
    $ bin/tdbloader --loc=/tmp/dataset /tmp/infobox_properties_en.nt
    [...]
    com.hp.hpl.jena.shared.JenaException: com.hp.hpl.jena.riot.RiotException:
    [line: 573784, col: 87] Bad IRI: http:www.co.coos.or.us :
    <http:www.co.coos.or.us> Code: 57/REQUIRED_COMPONENT_MISSING in HOST: A
    component that is required by the scheme is missing.
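
The error above is raised because the IRI has the `http` scheme but no `//host` part. A minimal sketch of a pre-check that would catch this class of IRI before loading, assuming Python's standard `urllib.parse` is an acceptable stand-in for the IRI checker Jena uses (the function name `missing_host` is hypothetical and not part of the DBpedia framework or Jena):

```python
from urllib.parse import urlsplit


def missing_host(iri: str) -> bool:
    """Return True for an http(s) IRI that has no host component.

    Hypothetical helper for illustration only; it covers just the
    REQUIRED_COMPONENT_MISSING case reported above, not full IRI validation.
    """
    parts = urlsplit(iri)
    return parts.scheme in ("http", "https") and not parts.netloc


# The IRI from the log lacks "//", so the host ends up in the path.
print(missing_host("http:www.co.coos.or.us"))    # True
print(missing_host("http://www.co.coos.or.us"))  # False
```

A check like this only flags the problem; whether to drop the triple or repair the IRI is a separate decision.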

     
  • Max Jakob

    Max Jakob - 2011-06-09

    From version 3.7 on, the framework also escapes URIs in object position.

     
  • Max Jakob

    Max Jakob - 2011-06-09
    • status: open --> closed-fixed
     
  • Christopher Sahnwaldt

    • milestone: --> Dump extraction
    • assigned_to: nobody --> jcsahnwaldt
    • labels: --> Bug
    • status: closed-fixed --> open-accepted
     
  • Christopher Sahnwaldt

    Re-opened because I'd like to look into it again. There are different problems on different levels. Escaping URIs (which escaping? URI encoding? N-Triples encoding?) is just one of them. The main problem is probably Wikipedia data quality.
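
The two kinds of escaping mentioned in the last comment produce different output for the same input. The following sketch contrasts them, assuming URI encoding means UTF-8 percent-encoding and N-Triples encoding means the `\uXXXX` / `\UXXXXXXXX` escapes of the ASCII-only N-Triples format. Both function names and the example IRI are hypothetical; this is not the DBpedia framework's code.

```python
from urllib.parse import quote

# Characters left untouched by percent-encoding: the URI reserved set plus
# "%" so that already-encoded sequences are not encoded a second time.
_SAFE = ":/?#[]@!$&'()*+,;=%"


def percent_encode(iri: str) -> str:
    """URI encoding: non-ASCII characters become %XX bytes of their UTF-8 form."""
    return quote(iri, safe=_SAFE)


def ntriples_escape(iri: str) -> str:
    """N-Triples encoding: characters outside printable ASCII become \\u or \\U escapes."""
    out = []
    for ch in iri:
        cp = ord(ch)
        if 0x20 <= cp <= 0x7E:
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04X}")
        else:
            out.append(f"\\U{cp:08X}")
    return "".join(out)


iri = "http://example.org/Caf\u00e9"
print(percent_encode(iri))   # http://example.org/Caf%C3%A9
print(ntriples_escape(iri))  # http://example.org/Caf\u00E9
```

Percent-encoding changes the IRI itself, while the N-Triples escape only changes how the same IRI is written in the file. Neither repairs an IRI that is structurally invalid in the source data.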

     