#1 CSV files with complex values not parsed properly

open
nobody
None
5
2009-12-24
2009-12-24
No

I have a CSV file generated from a dump of CKAN data. Some of the fields in the dump contain embedded newlines, commas, and other characters that are significant in CSV syntax. XLWrap does not appear to parse such fields correctly.
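To illustrate the kind of field involved, here is a minimal Python sketch. The sample data is made up for illustration and is not taken from the CKAN dump, and it assumes the dump uses conventional double-quoted CSV fields. It contrasts a naive line-and-comma split with a quote-aware parser; it does not claim to show how XLWrap reads the file internally.

```python
import csv
import io

# Hypothetical sample: a header row plus one record whose second field
# contains both a comma and a newline inside double quotes.
SAMPLE = 'name,title\r\npkg-1,"Line one, with comma\nline two"\r\n'


def naive_rows(text):
    """Split on line breaks and commas, ignoring quoting (wrong for such data)."""
    return [line.split(",") for line in text.splitlines()]


def quoted_rows(text):
    """Parse with the csv module, which honours double-quoted fields."""
    # newline="" leaves line endings untouched so the csv module can
    # handle line breaks inside quoted fields itself.
    return list(csv.reader(io.StringIO(text, newline="")))


naive = naive_rows(SAMPLE)
proper = quoted_rows(SAMPLE)
print("naive split:", len(naive), "rows")
print("csv module: ", len(proper), "rows")
```

A parser that ignores quoting sees three rows here instead of two, so the values it reports for column A would drift out of step with the real records.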

To reproduce, download the CKAN CSV dump from http://ckan.net/dump/, then apply the following transform:

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dc: <http://purl.org/dc/terms/> .
@prefix xl: <http://purl.org/NET/xlwrap#> .
@prefix ex: <http://example.com/vocab#> .

{ # default graph in TriG file
    [] a xl:Mapping ;
        xl:template [
            xl:fileName "file:src/main/data/hmg.ckan.net-20091204.csv" ;
            xl:templateGraph ex:ckan ;
            xl:transform [
                a xl:RowShift ;
                xl:breakCondition "ALLEMPTY(A2)"
            ]
        ] .
}

ex:ckan {
    [ xl:uri "'http://ckan.org/package/rdf/' & URLENCODE(A2)"^^xl:Expr ] ;
        dc:title "B2"^^xl:Expr
}

The generated subjects do not all match the column A values (A2, A3, and so on) from the CSV dump.

A workaround is to load the CSV into OpenOffice, which does parse it correctly, and then re-export the spreadsheet as an .ods file.
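A scripted alternative may also be worth trying. This is an untested sketch, not a confirmed fix: it assumes the problem is triggered by line breaks inside quoted fields, and the function name flatten_csv, the file paths, and the UTF-8 encoding are placeholders. It rewrites the dump with Python's csv module so that every record ends up on a single physical line; fields containing commas remain quoted.

```python
import csv


def flatten_csv(src_path, dst_path, encoding="utf-8"):
    """Copy a CSV file, replacing line breaks inside fields with spaces.

    Returns the number of records written. The encoding is an assumption;
    adjust it to match the actual dump.
    """
    with open(src_path, newline="", encoding=encoding) as src, \
            open(dst_path, "w", newline="", encoding=encoding) as dst:
        writer = csv.writer(dst, quoting=csv.QUOTE_MINIMAL)
        count = 0
        for row in csv.reader(src):
            # splitlines() breaks on \n, \r\n and \r; joining with a space
            # keeps the text but puts the whole record on one line.
            writer.writerow([" ".join(field.splitlines()) for field in row])
            count += 1
    return count
```

For example, flatten_csv("hmg.ckan.net-20091204.csv", "hmg.ckan.net-flat.csv") would produce a flattened copy that the mapping's xl:fileName could point to instead. Whether XLWrap then handles the remaining quoted commas correctly would still need to be checked.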
