Kopio - 2008-09-18

This is a brief introduction to XML-Spreadsheet's format, and is probably all that is required to be able to use XML-Spreadsheets fully.  The purpose of XML-Spreadsheet is to store as completely as possible the structure and content information of original spreadsheet file in XML format, so that the vast availability of XML tools, processors, utilities, systems, discussions and knowledge can be more readily applied to the content.

An skeleton example of XML-Spreadsheet would serve to illustrate clearly the overall structure.  XML-Spreadsheet takes the following format:

<Sheet>
  <Information>
    <Filename>UBL-ApplicationResponse-2.0.xls</Filename>
    <GeneratedBy>UBLish v2.0.0.kpo</GeneratedBy>
    <GeneratedOn>20080918 013020</GeneratedOn>
  </Information>
<Row rowIndex="1">
  <Column rowIndex="1" columnIndex="1" columnLabel="UBL Name">UBL Name</Column>
  <Column rowIndex="1" columnIndex="2" columnLabel="Dictionary Entry Name">Dictionary Entry Name</Column>
  ......
</Row>
<Row rowIndex="2"> ... </Row>
<Row rowIndex="3"> ... </Row>
  ... ...
<Row rowIndex="18"> ... </Row>
</Sheet>

No namespace is used.  This is just simply to void the need to discuss, create, maintain and transport namespace values and structures which are not immediately useful here.

The root node is <Sheet>.  It contains 1 leading <Information> element, followed by as many number of <Row> elements as there are used rows found in the original spreadsheet.  Each <Row> element, in turn, contains as many <Column> elements as there are used columns found in the original spreadsheet.

So far so good.  Once the number of used columns has been determined by UBLish at the beginning, UBLish will faithfully render the exact same number of <Column> elements in all <Row> elements.  This is to preserve and convey the matrix-like manner of spreadsheets.

Each <Row> element carries a rowIndex attribute.  This carries the row index integer that corresponds to the Row number found in the original spreadsheet.  This facilitates easy debugging, comparison and information exchanges.

Similarly, each <Column> element carries both a rowIndex and a columnIndex attribute.  The rowIndex attribute always contains the same value as the <Row> parent's rowIndex value, and serves to self-document each <Column> element even in a very long XML-Spreadsheet document.  The columnIndex attribute identifies the column number (starting from 1) in the original spreadsheet from which the content of <Column> element was extracted.  By looking up the rowIndex and columnIndex information, a user can easily cross-reference the value found with the original spreadsheet without the need for external utility, special viewers, processing programs, etc.

A special columnLabel attribute is found in each <Column> element.  This attribute's value is initially identified from the spreadsheet's "header" row, which is typically Row 1.  UBLish can be programmed to look at another row, or a combination of 2 rows to form a virtual "header" row that identifies in human-readable form the column header texts.  In any case, such a row of labels would then be extracted and replicated across each <Column> element by storing them into each columnLabel attribute.  So, for instance, if we see "UBL Name" as the row 1 value of the spreadsheet, then all <Column> elements' columnLabel would have value "UBL Name", as in:

    <Column rowIndex="..." columnIndex="1" columnLabel="UBL Name">......</Column>

This little extra redundancy is meant more for readability and convenience than a necessity.  When we look at, say, Row 952, for instance, it becomes extremely useful to see that the column label is right next to the value we are reading in that column and row without having to move all the way back to Row 1.

The hard limits are as follows.  If more than 5,000 rows are used, then only up to 5,000 rows are processed and generated.  It can happen that the spreadsheets can hint at a use of a vast number of columns (such as 700), when in visual fact only about 25 columns are used.  To limit wasted processing times, UBLish hard limits column counts to 52 (from A up to AZ) as we really seldom attempt to use more columns than that without finding it visually unweildy or without choosing instead to create two small spreadsheets.  Those knowing how to twiddle UBLish with a text editor can find these hard limits (by searching for 5000 and 52) and changing them if you desire.

Send me comments if you need more information.

Chin Chee-Kai