Lochrann
XML2CSV-Generic-Converter is a Java program (Java 1.6 or higher please) dedicated to XML to CSV conversion.

Its main goal is to perform intuitive, effective and automated conversion, just as spreadsheets do (or should do) when they import native XML, with special care for performance so that workable output can still be produced even from XML monsters (100 MB or more).

I've been working on it since February 2014, basically in my spare time, and picked up a few XML files at random to test it until all known issues were fixed.

The executable Jar I've uploaded is a regular jar-with-dependencies Maven assembly.

The source code is available both unfolded in a local SVN project (in the Code section), and bundled in a spare file named XML2CSVGenericConverter_V1.0.0-src.jar (in the Files section).

The initial V1.0.0 upload was performed on April 1st 2014; since then I've committed several changes to the V1.0.0 files, but I didn't increment the version number because I saw the subsequent months as some kind of 'pre-prod' time dedicated to V1.0.0 fine-tuning.
Maven gurus would certainly have expected an intermediate V1.0.0-SNAPSHOT version followed by an actual V1.0.0 but I didn't want to have to rename files and break Internet search engines' indexes.

A definitive & comprehensive XML2CSV-Generic-Converter V1.0.0 has now been frozen with all the core features. Apart from documentation refinements the cumulative updates made since Spring 2014 concerned significant items - namely:

  • addition of surrounding double-quotes in CSV field contents when needed (typically when a field contains the CSV field separator);
  • full support for element attributes: not only leaf elements' attributes but also those of intermediate XML elements, plus attribute support in filter files;
  • additional option to perform namespace-aware parsing of XML files;
  • improvement in extensive optimization variant 2 in order to efficiently pack one leaf element's attributes on the same line as the element content;
  • extensive optimization variant 3 added, which magnifies variant 2's packing capabilities;
  • unleashed optimization option added in order to reach utmost compact CSV generation at will (through control over the way the root tag is handled);
  • support for 'mixture' elements which contain both plain data and other elements (that is, which are both leaf and intermediate).
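
To illustrate the first item in the list above, here is a minimal sketch of the usual CSV quoting rule. It is not taken from the converter's source: the class name `CsvFieldQuoting` and the method `quoteIfNeeded` are hypothetical, and the handling of embedded double quotes and line breaks follows the common RFC 4180 convention, which is an assumption about what "when needed" covers beyond the field separator.

```java
public class CsvFieldQuoting {

    // Hypothetical helper (not the converter's own code): wraps a field in
    // double quotes only when its content would otherwise break the CSV row.
    public static String quoteIfNeeded(String field, char separator) {
        boolean needsQuotes = field.indexOf(separator) >= 0
                || field.indexOf('"') >= 0      // assumption: RFC 4180 style
                || field.indexOf('\n') >= 0     // assumption: RFC 4180 style
                || field.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return field;
        }
        // Embedded double quotes are doubled, then the field is surrounded.
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(quoteIfNeeded("plain", ';'));        // plain
        System.out.println(quoteIfNeeded("a;b", ';'));          // "a;b"
        System.out.println(quoteIfNeeded("say \"hi\"", ';'));   // "say ""hi"""
    }
}
```

Fields that contain none of the special characters are left untouched, so the output stays as compact as possible.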

The actual XML to CSV conversion shipped with version V1.0.0 won't change from now on but the accompanying documentation might from time to time, in order to flesh out or update certain aspects.

Three late updates were performed in 2015. The changes didn't affect XML to CSV conversion but fixed 3 unnoticed (?) minor bugs so that I could confidently move on to the next step:

  • slight discrepancy in data buffer displays in debug mode;
  • wrong minimum cardinality computed (1 instead of 0) during the XML template file analysis for a new XML element popping up in the very last occurrence of a repeated block (XML to CSV conversion remains unaffected because minimum cardinality plays no part in the conversion process);
  • same issue with attributes.
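
The cardinality rule behind the last two fixes can be sketched as follows. This is an illustration only, not the program's actual analysis code: the class `MinCardinality`, its method and the representation of a repeated block as a list of child element names are all hypothetical. The point it shows is that an element first seen in a later occurrence of a block must get a minimum cardinality of 0, because earlier occurrences did not contain it.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MinCardinality {

    // Hypothetical sketch: each inner list holds the child element names
    // found in one occurrence of the same repeated block.
    public static Map<String, Integer> minCardinality(List<List<String>> blocks) {
        Map<String, Integer> min = new LinkedHashMap<String, Integer>();
        int blocksSeen = 0;
        for (List<String> block : blocks) {
            // Count the occurrences of each element in this block.
            Map<String, Integer> counts = new HashMap<String, Integer>();
            for (String name : block) {
                Integer c = counts.get(name);
                counts.put(name, c == null ? 1 : c + 1);
            }
            // Lower the minimum of already known elements (absent means 0).
            for (Map.Entry<String, Integer> e : min.entrySet()) {
                Integer c = counts.get(e.getKey());
                int here = (c == null) ? 0 : c;
                if (here < e.getValue()) {
                    e.setValue(here);
                }
            }
            // An element that is new in a later block was missing before,
            // so its minimum is 0 even if this is the very last block.
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                if (!min.containsKey(e.getKey())) {
                    min.put(e.getKey(), blocksSeen == 0 ? e.getValue() : 0);
                }
            }
            blocksSeen++;
        }
        return min;
    }

    public static void main(String[] args) {
        List<List<String>> blocks = Arrays.asList(
                Arrays.asList("a", "b"),
                Arrays.asList("a", "b"),
                Arrays.asList("a", "b", "c")); // "c" pops up in the last block
        System.out.println(minCardinality(blocks));
    }
}
```

The same reasoning would apply to attributes by counting attribute names instead of element names.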

A late additional bug fix was also performed on February 29th, 2016 because data handler resets between XML files went a bit too far when several files were processed (the warding/unleashed/attribute extraction flags were reset at the same time, with odd side effects).

Version V1.1.0, if I ever get the chance to write it, will add useful auxiliary features - namely:

  1. the possibility to generate an XML schema after the XML template file analysis and/or the possibility to bypass the XML template file analysis phase if an XML schema is provided;
  2. for Java programmers, the possibility to convert data read from an InputStream[] (File[] only is supported at present).

In an even more remote future, fun, exotic (but most certainly less useful) features might be added too - for instance:

  1. the computation of a dependency factor during the XML template file analysis making it possible to have the program choose automatically the best conversion options (optimization flavor, regular or unleashed optimization, attribute extraction, and so on);
  2. a nice & cute graphical user interface.

Slán go fóill,
Lóchrann

Wednesday, September 21 2016

Project Members:


lochrann

