See http://www.openjena.org/~afs/DBPedia35-parse-log-2010-04-15.txt
external_links_en.nt.bz2 accounts for the majority of the IRI errors.
This report also includes IRI warnings (things that are not recommended practice in IRIs).
This report also includes bad lexical form errors, for example dates like 31 February.
DBPedia 3.5 parse report
Not fixed (completely) in DBpedia 3.5.1:
$ bin/tdbloader --loc=/tmp/dataset /tmp/infobox_properties_en.nt
[...]
com.hp.hpl.jena.shared.JenaException: com.hp.hpl.jena.riot.RiotException:
[line: 573784, col: 87] Bad IRI: http:www.co.coos.or.us :
<http:www.co.coos.or.us> Code: 57/REQUIRED_COMPONENT_MISSING in HOST: A
component that is required by the scheme is missing.
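For anyone who wants to find this class of error before running the loader, here is a rough sketch of a pre-check in Python. It is not part of Jena or the DBpedia framework; the function names, the regular expression and the list of schemes are my own assumptions, and it only catches the "scheme without host" case shown above, not the full set of IRI checks the parser applies.

```python
import re
from urllib.parse import urlsplit

# Naive pattern for <...> IRI references on an N-Triples line (assumption:
# good enough for a scan; it will also match <...> inside string literals).
IRI_REF = re.compile(r"<([^<>\s]*)>")

# Schemes assumed here to require a host component.
SCHEMES_REQUIRING_HOST = {"http", "https", "ftp"}


def missing_host(iri):
    """Return True if the IRI uses a host-based scheme but has no host."""
    try:
        parts = urlsplit(iri)
    except ValueError:
        # urlsplit rejects some malformed authorities; treat as a different error.
        return False
    return parts.scheme.lower() in SCHEMES_REQUIRING_HOST and not parts.hostname


def find_missing_hosts(lines):
    """Yield (line number, column, IRI) for each IRI that lacks a host."""
    for line_no, line in enumerate(lines, start=1):
        for match in IRI_REF.finditer(line):
            if missing_host(match.group(1)):
                yield line_no, match.start() + 1, match.group(1)


if __name__ == "__main__":
    import sys

    for line_no, col, iri in find_missing_hosts(sys.stdin):
        print(f"[line: {line_no}, col: {col}] missing host: {iri}")
```

It reads N-Triples on stdin, so a dump can be piped through it, e.g. via bzcat for the .bz2 files.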
The framework now also escapes URIs in object position (as of version 3.7).
Re-opened because I'd like to look into it again. There are different problems on different levels. Escaping URIs (which escaping? URI encoding? N-Triples encoding?) is just one of them. The main problem is probably Wikipedia data quality.
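To make the "which escaping?" question concrete, here is a small Python illustration of the two levels. It is only a sketch for discussion, not what the extraction framework actually does; the function names and the example.org URI are made up, and the N-Triples part follows the backslash escapes of the original N-Triples syntax as I understand it.

```python
from urllib.parse import quote

# Short escapes for the common control characters.
SHORT_ESCAPES = {"\\": "\\\\", '"': '\\"', "\t": "\\t", "\n": "\\n", "\r": "\\r"}


def percent_encode(text):
    """URI level: non-ASCII characters become %XX escapes of their UTF-8 bytes."""
    # Assumption: ':' and '/' are kept as-is so a whole URI can be passed in.
    return quote(text, safe=":/")


def ntriples_escape(text):
    """Serialization level: non-ASCII characters become \\uXXXX or \\UXXXXXXXX."""
    out = []
    for ch in text:
        cp = ord(ch)
        if ch in SHORT_ESCAPES:
            out.append(SHORT_ESCAPES[ch])
        elif 0x20 <= cp <= 0x7E:
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append("\\u%04X" % cp)
        else:
            out.append("\\U%08X" % cp)
    return "".join(out)


if __name__ == "__main__":
    uri = "http://example.org/resource/Köln"
    print(percent_encode(uri))   # http://example.org/resource/K%C3%B6ln
    print(ntriples_escape(uri))  # http://example.org/resource/K\u00F6ln
```

The first form changes the URI itself; the second only changes how the same characters are written in the file. Neither fixes a URI that is structurally wrong in the source, such as the missing-host case above.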