I would like to use Andrew Welch's CSV-to-XML stylesheet in a project designed for libraries. Some libraries receive ebook metadata in ONIX for bulk purchases. However, many publishers, perhaps most, supply the metadata in spreadsheets.

I use Stylus Studio Enterprise 2010 for most of my work, but Andrew's stylesheet would not run in it, even after I added the standalone attribute to the output element and supplied the path to the source file in the stylesheet's parameter.

The transformation could not get past the initial read of the source file; the error message was something to the effect that something was not allowed in the prolog.

So I downloaded Kernow using the Java Web Start option. I was pretty excited when I got XML output, but then discovered that the output had been rearranged so that the root element's start and end tags were jumbled within the file, leaving it corrupted beyond reasonable repair.

Looking again at the page http://andrewjwelch.com/code/xslt/csv/csv-to-xml_v2.html, I noticed this sentence:

"If this transform fails for a valid CSV file, let me know!"

So I examined the source CSV file and found cell content that could be causing the problems. For example, some cells contain reviews of an ebook: a quotation enclosed in quotation marks, followed by content outside the quotation marks, such as a (non-ASCII) punctuation character and the source of the quote, e.g., a publication name and perhaps an author.
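For illustration, here is a minimal sketch using Python's standard csv module (not Andrew's stylesheet) of how RFC 4180, the closest thing CSV has to a standard, expects such a cell to be written, and what happens when it is not. The ISBN, review text and attribution are hypothetical sample data, and the em dash stands in for the non-ASCII punctuation.

```python
import csv
import io

# Hypothetical review cell: a quotation followed by an attribution.
cell = '"A gripping read" \u2014 Example Review'

# Well-formed CSV (RFC 4180): enclose the whole field in double quotes
# and double every embedded double quote.
buf = io.StringIO()
csv.writer(buf, lineterminator="\r\n").writerow(["9780000000000", cell])
good_line = buf.getvalue()
assert good_line == '9780000000000,"""A gripping read"" \u2014 Example Review"\r\n'

# Reading the well-formed line back returns the cell unchanged.
good_row = next(csv.reader(io.StringIO(good_line, newline="")))
assert good_row[1] == cell

# Malformed CSV: the field opens with a quote, but text follows the closing quote.
bad_line = '9780000000000,"A gripping read" \u2014 Example Review\r\n'

# A strict parser rejects the malformed line outright...
try:
    next(csv.reader(io.StringIO(bad_line, newline=""), strict=True))
    strict_rejected = False
except csv.Error:
    strict_rejected = True
assert strict_rejected

# ...while a lenient parser accepts it but silently drops the quotation marks.
lenient_row = next(csv.reader(io.StringIO(bad_line, newline="")))
assert lenient_row[1] == "A gripping read \u2014 Example Review"
```

The point of the sketch is that a quotation inside a cell is perfectly valid CSV, but only when the whole cell is quoted and the inner quotes are doubled; a cell written as in `bad_line` is ambiguous, and different parsers may reject it or read it differently.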

I'm not looking for a solution so much as a better understanding of what valid CSV is. I'm impressed with Kernow; it's a very nice, easy-to-use application. It would be easy for me to do the transformation with a Stylus Studio utility, but I had hoped to create a workflow for libraries that would not require it. A library application that is freely available to libraries offers a very useful XSLT tool and includes the latest version of Saxon, along with a default processor and .NET options.

In particular, I would like to know whether valid CSV includes the notion of a regular cell size, or perhaps a fixed cell length. More generally, what approaches might one take if the CSV source is as untidy as I expect most publishers' files will be? Is this truly a dead end for me?
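As far as RFC 4180 goes, there is no notion of a regular cell size or length; what it does say is that each line should contain the same number of fields throughout the file. One rough way to triage untidy files before transforming them is a pre-check like the sketch below. It assumes that rule, treats the first record as the header, and uses a hypothetical helper named `check_csv` built on Python's standard csv module in strict mode; it reports records whose field count differs from the header and stops at the first quoting error.

```python
import csv
import io


def check_csv(text, delimiter=","):
    """Return (line_number, problem) tuples; an empty list means no problems found.

    Hypothetical helper: line numbers are physical lines as counted by csv.reader.
    """
    problems = []
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter, strict=True)
    expected = None
    while True:
        try:
            row = next(reader)
        except StopIteration:
            break
        except csv.Error as exc:
            problems.append((reader.line_num, f"quoting error: {exc}"))
            break  # stop here: records after a quoting error may be misaligned
        if expected is None:
            expected = len(row)  # the header record fixes the expected field count
        elif len(row) != expected:
            problems.append((reader.line_num, f"{len(row)} fields, expected {expected}"))
    return problems


# Hypothetical sample: one well-formed row, one unquoted comma, one stray quote.
sample = (
    "isbn,title,review\r\n"
    '9780000000001,Book One,"""Superb"" \u2014 Example Times"\r\n'
    '9780000000002,Book Two,"Fine", said a reviewer\r\n'
    '9780000000003,Book Three,"Moving" \u2014 Example Post\r\n'
)
for line_no, problem in check_csv(sample):
    print(f"line {line_no}: {problem}")
# line 3: 4 fields, expected 3
# line 4: quoting error: ',' expected after '"'
```

A report like this does not fix anything, but it would tell a library which rows in a publisher's spreadsheet export need attention before the file is handed to an XSLT transform.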


Dana Pearson