Stingray -- Schema-Based File Reader

HTML Documentation: http://stingrayreader.sourceforge.net/index.html


Spreadsheet format files are the lingua franca of data processing. CSV, Tab, XLS, XLSX and ODS files are used widely.

Python's csv module and the XLRD project (http://www.lexicon.net/sjmachin/xlrd.htm) help us handle spreadsheet files. The ZipFile and XML modules help us parse almost everything else. By themselves, however, these modules aren't a very complete solution.

In particular, there's a lot of fumbling around trying to handle the schema for a spreadsheet.
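
As a rough illustration of that fumbling, here is a minimal sketch that uses only the standard csv module, not Stingray itself. The header row has to be peeled off by hand to serve as the schema, and every value still arrives as a string. The sample data, the column names, and the read_with_header_schema function are all made up for this example.

```python
import csv
import io

# Hypothetical sample data; the column names are illustrative only.
SAMPLE = "name,qty,price\nwidget,3,1.50\ngadget,10,0.25\n"

def read_with_header_schema(source):
    """Treat the first row as the schema and yield each later row as a dict."""
    reader = csv.reader(source)
    header = next(reader)  # the "schema" is nothing more than the first row
    for row in reader:
        yield dict(zip(header, row))

rows = list(read_with_header_schema(io.StringIO(SAMPLE)))

# Every cell is a string, so each use needs its own explicit conversion.
total = sum(int(r["qty"]) * float(r["price"]) for r in rows)
```

Even in this small case the schema handling and the type conversions are scattered through the application code, which is the kind of repetition a schema-based reader is meant to remove.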

The Stingray Schema-Based File Reader offers several features to help process files in spreadsheet formats.

  1. It wraps csv, xlrd, plus several other parsers into a single, unified "workbook" structure. Applications can work with any of the common physical formats in a completely uniform way.
  2. It extends the workbook to include fixed format files (with no delimiters) and even COBOL files in EBCDIC.
  3. It provides a uniform way to load and use schema information. This can be header rows in the individual sheets of a workbook, or it can be separate schema information. It can also involve complex header parsing for those spreadsheets where someone had to create fancy column titles that include merged cells and other complications.
  4. It provides a suite of data conversions that cover the most common cases.
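
To make the idea of separate schema information and data conversions concrete, here is a small sketch for a fixed format record with no delimiters. It is not Stingray's API. The SCHEMA layout, the parse_fixed function, and the sample record are assumptions invented for illustration.

```python
from decimal import Decimal

# Hypothetical external schema: (column name, start offset, width, conversion).
SCHEMA = [
    ("name", 0, 8, str.strip),
    ("qty", 8, 4, int),
    ("price", 12, 6, Decimal),
]

def parse_fixed(line, schema=SCHEMA):
    """Slice one fixed-width line by the schema and convert each field."""
    return {
        name: convert(line[start:start + width])
        for name, start, width, convert in schema
    }

# An 18-character record: 8 for name, 4 for qty, 6 for price.
record = parse_fixed("widget  0003001.50")
```

Keeping the layout and the conversions in one schema object means the application code can refer to columns by name, whatever the physical format happens to be.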

Additionally, Stingray provides some guidance on how to structure file-processing applications so that they are testable and composable.

Stingray 4.3 requires Python 3.3.

It depends on one other project to read legacy .xls files.

Because this is a literate programming example, a complete build from scratch needs two additional tools.

Since Stingray is a Literate Programming project, the documentation is also the source. And vice-versa.