From: <prn...@us...> - 2009-12-30 15:07:05
|
Revision: 6680 http://octave.svn.sourceforge.net/octave/?rev=6680&view=rev Author: prnienhuis Date: 2009-12-30 15:06:54 +0000 (Wed, 30 Dec 2009) Log Message: ----------- Revised and improved, now also based on odftoolkit & xercesImpl. Only reading from .ods supported, adding write support is not yet feasible Modified Paths: -------------- trunk/octave-forge/main/io/inst/calccelladdress.m trunk/octave-forge/main/io/inst/ods2oct.m trunk/octave-forge/main/io/inst/odsclose.m trunk/octave-forge/main/io/inst/odsopen.m trunk/octave-forge/main/io/inst/odsread.m trunk/octave-forge/main/io/inst/parsecell.m Modified: trunk/octave-forge/main/io/inst/calccelladdress.m =================================================================== --- trunk/octave-forge/main/io/inst/calccelladdress.m 2009-12-30 15:04:15 UTC (rev 6679) +++ trunk/octave-forge/main/io/inst/calccelladdress.m 2009-12-30 15:06:54 UTC (rev 6680) @@ -17,20 +17,22 @@ ## calccelladdress - compute spreadsheet style cell address from ## row & column index. ## -## Max column index currently set to 1024 (ODS) +## Max column index currently set to 1378 (ODS has max 1024) ## Author: Philip Nienhuis <prnienhuis at users.sf.net> ## Created: 2009-12-12 +## Last updated: 2009-12-27 function [ celladdress ] = calccelladdress (trow, lcol, row, column) + if (nargin < 4) error ("calccelladdress: not enough arguments.") endif colnr = lcol + column - 1; - if (colnr> 1024) error ("Column nr >1024"); endif + if (colnr> 1378) error ("Column nr > 1378"); endif str = char (rem ((colnr-1), 26) + 'A'); if (colnr > 26 && colnr < 703) tmp = char (floor ((colnr - 27) / 26) + 'A'); str = [tmp str]; - elseif (colnr > 702) + elseif (colnr > 702 && colnr < 1379) tmp = char (floor ((colnr - 703) / 26) + 'A'); str = ['A' tmp str]; endif Modified: trunk/octave-forge/main/io/inst/ods2oct.m =================================================================== --- trunk/octave-forge/main/io/inst/ods2oct.m 2009-12-30 15:04:15 UTC (rev 6679) +++ trunk/octave-forge/main/io/inst/ods2oct.m 2009-12-30 15:06:54 UTC (rev 6680) @@ -14,6 +14,380 @@ ## along with Octave; see the file COPYING. If not, see ## <http://www.gnu.org/licenses/>. +## -*- texinfo -*- +## @deftypefn {Function File} [ @var{rawarr}, @var{ods}, @var{rstatus} ] = ods2oct (@var{ods}) +## @deftypefnx {Function File} [ @var{rawarr}, @var{ods}, @var{rstatus} ] = ods2oct (@var{ods}, @var{wsh}) +## @deftypefnx {Function File} [ @var{rawarr}, @var{ods}, @var{rstatus} ] = ods2oct (@var{ods}, @var{wsh}, @var{range}) +## +## Read data contained within range @var{range} from worksheet @var{wsh} +## in an OpenOffice.org spreadsheet file pointed to in struct @var{ods}. +## +## ods2oct is a mere wrapper for interface-dependent scripts (e.g., +## ods2jotk2oct and ods2jod2oct) that do the actual reading. +## +## @var{wsh} is either numerical or text, in the latter case it is +## case-sensitive and it may be max. 31 characters long. +## Note that in case of a numerical @var{wsh} this number refers to the +## position in the worksheet stack, counted from the left in a Calc +## window. The default is numerical 1, i.e. the leftmost worksheet +## in the ODS file. +## +## @var{range} is expected to be a regular spreadsheet range format, +## or "" (empty string, indicating all data in a worksheet). +## +## If only the first argument is specified, ods2oct will try to read +## all contents from the first = leftmost (or the only) worksheet (as +## if a range of @'' (empty string) was specified). +## +## If only two arguments are specified, ods2oct assumes the second +## argument to be @var{wsh}. In that case ods2oct will try to read +## all data contained in that worksheet. +## +## Return argument @var{rawarr} contains the raw spreadsheet cell data. +## Optional return argument @var{ods} contains the pointer struct. Field +## @var{ods}.limits contains the outermost column and row numbers of the +## actually read cell range. +## @var{rstatus} will be set to 1 if the requested data have been read +## successfully, 0 otherwise. +## Use parsecell() to separate numeric and text values from @var{rawarr}. +## +## @var{ods} is supposed to have been created earlier by odsopen in the +## same octave session. It is only referred to, not changed. +## +## Erroneous data and empty cells turn up empty in @var{rawarr}. +## Date/time values in OpenOffice.org are returned as numerical values +## with base 1-1-000 (same as octave). But beware that Excel spreadsheets +## rewritten by OpenOffice.org into .ods format may have numerical date +## cells with base 01-01-1900 (same as MS-Excel). +## +## Be aware that ods2oct trims @var{rawarr} from empty outer rows & columns, +## so any returned cell array may turn out to be smaller than requested +## in @var{range}. +## +## When reading from merged cells, all array elements NOT corresponding +## to the leftmost or upper OpenOffice.org cell will be treated as if the +## "corresponding" cells are empty. +## +## Examples: +## +## @example +## A = ods2oct (ods1, '2nd_sheet', 'C3:ABS40000'); +## (which returns the numeric contents in range C3:ABS40000 in worksheet +## '2nd_sheet' from a spreadsheet file pointed to in pointer struct ods1, +## into numeric array A) +## @end example +## +## @example +## [An, ods2, status] = ods2oct (ods2, 'Third_sheet'); +## @end example +## +## @seealso odsopen, odsclose, parsecell, odsread, odsfinfo +## +## @end deftypefn + +## Author: Philip Nienhuis +## Created: 2009-12-13 +## Latest update: 2009-12-30 + +function [ rawarr, ods ] = ods2oct (ods, wsh=1, datrange=[]) + + if (strcmp (ods.xtype, 'OTK')) + # Read xls file tru Java & ODF toolkit + [rawarr, ods, rstatus] = ods2jotk2oct (ods, wsh, datrange); + + elseif (strcmp (ods.xtype, 'JOD')) + [rawarr, ods, rstatus] = ods2jod2oct (ods, wsh, datrange); + +# elseif ---- < Other interfaces here > + + else + error (sprintf ("ods2oct: unknown OpenOffice.org .ods interface - %s.", xls.xtype)); + + endif + +endfunction + +#===================================================================== + +## Copyright (C) 2009 Philip Nienhuis <prnienhuis _at- users.sf.net> +## +## This program is free software; you can redistribute it and/or modify +## it under the terms of the GNU General Public License as published by +## the Free Software Foundation; either version 2 of the License, or +## (at your option) any later version. +## +## This program is distributed in the hope that it will be useful, +## but WITHOUT ANY WARRANTY; without even the implied warranty of +## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +## GNU General Public License for more details. +## +## You should have received a copy of the GNU General Public License +## along with Octave; see the file COPYING. If not, see +## <http://www.gnu.org/licenses/>. + +## odf2jotk2oct - read ODS spreadsheet data using Java & odftoolkit +## You need proper java-for-octave & odfdom.jar + xercesImpl.jar +## in your javaclasspath. + +## Author: Philip Nenhuis <pr.nienhuis at users.sf.net> +## Created: 2009-12-24 +## Last update: 2009-12-30 + +function [ rawarr, ods, status ] = ods2jotk2oct (ods, wsh, crange = []) + + # Parts after user gfterry in + # http://www.oooforum.org/forum/viewtopic.phtml?t=69060 + + # Get contents and table stuff from the workbook + odfcont = ods.workbook; # Use a local copy just to be sure. octave + # makes physical copies only when needed (?) + xpath = ods.app.getXPath; + + # AFAICS ODS spreadsheets have the following hierarchy: + # <table:table> - table nodes, the actual worksheets; + # <table:table-row> - row nodes, the rows in a worksheet; + # <table:table-cell> - cell nodes, the cells in a row; + # Styles (formatting) are defined in a section "settings" outside the + # contents proper but are referenced in the nodes. + + # Create an instance of type NODESET for use in subsequent statement + NODESET = java_get ('javax.xml.xpath.XPathConstants', 'NODESET'); + # Parse sheets ("tables") from ODS file + sheets = xpath.evaluate ("//table:table", odfcont, NODESET); + nr_of_sheets = sheets.getLength (); + + # Check user input & find sheet pointer (1-based), using ugly hacks + if (~isnumeric (wsh)) + # Search in sheet names, match sheet name to sheet number + ii = 0; + while (++ii <= nr_of_sheets && ischar (wsh)) + # Look in first part of the sheet nodeset + tmp1 = char (sheets.item(ii-1))(1:150); + ist = index (tmp1, 'table:name=') + 12; + ien = index (tmp1(ist:end), '" table') - 2 + ist; + if (strcmp (tmp1(ist:ien), wsh)) + # Convert local copy of wsh into a number (pointer) + wsh = ii + i; + endif + endwhile + if (ischar (wsh)) error (sprintf ("No worksheet '%s' found in file %s", wsh, ods.filename)); endif + elseif (wsh > nr_of_sheets || wsh < 1) + # We already have a numeric sheet pointer. If it's not in range: + error (sprintf ("Worksheet no. %d out of range (1 - %d)", wsh, nr_of_sheets)); + endif + + # Get table-rows in sheet no. wsh. Sheet count = 1-based (!) + str = sprintf ("//table:table[%d]/table:table-row", wsh); + sh = xpath.evaluate (str, odfcont, NODESET); + nr_of_rows = sh.getLength(); + # Create storage for data content. We can't know max row length yet so expect the worst + rawarr = cell (nr_of_rows, 1024); + + # Either parse (given cell range) or prepare (unknown range) help variables + if (isempty (crange)) + trow = 1; # Top row + brow = nr_of_rows; # Bottom row (ODS max = 65535, Xpath's guess = better) + nrows = brow; # Number of rows to be read + lcol = 1; # Leftmost column of range + rcol = 1024; # Rightmost columns (1024 on ODS) + ncols = rcol; # Number of columns to be read + else + [dummy, nrows, ncols, trow, lcol] = parse_sp_range (crange); + brow = min (trow + nrows - 1, nr_of_rows); + # Check ODS column limits + if (lcol > 1024 || trow > 65536) error ("ods2oct: invalid range; max 1024 columns & 65536 rows."); endif + # Truncate range silently if needed + rcol = min (lcol + ncols - 1, 1024); + ncols = min (ncols, 1024 - lcol + 1); + nrows = min (nrows, 65536 - trow + 1); + endif + + # Read from worksheet row by row. Row numbers are 0-based + rightmcol = 0; # Used to find actual rightmost column + ii = trow - 1; # Spreadsheet row counter + rowcnt = ii; # xpath row counter (multiple rows may be condensed in one!) + while (++ii <= brow) + row = sh.item(rowcnt++); + nr_of_cells = min (row.getLength (), rcol); + rightmcol = max (rightmcol, nr_of_cells); # Keep track of max row length + + # Read column (cell, "table-cell" in ODS speak) by column + jj = lcol; r_cols = 0; + while (jj + r_cols <= 1024 && jj <= rcol) + tcell = char (row.getCellAt(jj-1)); + cellcont = getcellcont (tcell); # Parse cell contents, func = below + if (~isempty (cellcont.cvalue)) + + # Get data from cell + switch cellcont.ctype + case {'float', 'currency', 'percentage'} + rawarr(ii, jj) = str2double (cellcont.cvalue); + case 'date' + # Dates are returned as octave datenums, i.e. 0-0-0000 based + str = cellcont.cvalue; + yr = str2num (str(1:4)); + mo = str2num (str(6:7)); + dy = str2num (str(9:10)); + if (index (str, 'T')) + hh = str2num (str(12:13)); + mm = str2num (str(15:16)); + ss = str2num (str(18:19)); + rawarr(ii, jj) = datenum (yr, mo, dy, hh, mm, ss); + else + rawarr(ii, jj) = datenum (yr, mo, dy); + endif + case 'time' + str = cellcont.cvalue; + if (index (str, 'PT')) + hh = str2num (str(3:4)); + mm = str2num (str(6:7)); + ss = str2num (str(9:10)); + rawarr(ii, jj) = datenum (0, 0, 0, hh, mm, ss); + endif + case 'boolean' + if (strcmp (cellcont.cvalue, 'true')) + rawarr(ii, jj) = true; + else + rawarr(ii, jj) = false; + endif + case 'string' + rawarr(ii, jj) = cellcont.cvalue; + otherwise + # Nothing + endswitch + endif + ++jj; # Next cell + r_cols = cellcont.cols; # Number of repeated columns. + endwhile + + # Check for repeated rows (i.e. condensed in one table-row) + mm = index (char(row), 'number-rows-repeated'); + if (mm) + str = char (row) (1:82); + ist = mm + 22; ien = index (str(ist:end), '"') + ist - 2; + extrarows = str2num (str(ist:ien)); + if (extrarows > 0 && ii + extrarows < 65535) + # Expand table-row + nr_of_rows = nr_of_rows + extrarows - 1; + ii = ii + extrarows - 1; + nrows = min (65536, nrows + extrarows - 1); + brow = min (trow + nrows - 1, nr_of_rows); + # Increase return argument size if needed + tmp = cell (extrarows, 1024); + rawarr = [rawarr; tmp]; + endif + endif + + endwhile + + # Pre-crop rawarr from right (max was 1024) and bottom + rawarr = rawarr (1:nr_of_rows, 1:rightmcol); + + # Crop rawarr from all empty outer rows & columns just like Excel does + # & keep track of limits + emptr = cellfun('isempty', rawarr); + irowt = 1; + while (all (emptr(irowt, :))), irowt++; endwhile + irowb = nr_of_rows; + while (all (emptr(irowb, :))), irowb--; endwhile + icoll = 1; + while (all (emptr(:, icoll))), icoll++; endwhile + icolr = rightmcol; + while (all (emptr(:, icolr))), icolr--; endwhile + # Crop textarray + rawarr = rawarr(irowt:irowb, icoll:icolr); + status = 1; + + ods.limits = [lcol+icoll-1, lcol+icolr-1; trow+irowt-1, trow+irowb-1]; + +endfunction + + +## Copyright (C) 2009 by Philip Nienhuis <prnienhuis at users.sf.net> +## +## Parse some data from ODS spreadsheet cells. +## Input = character string <table:table-cell>...</table:table-cell> +## +## Author: Philip Nenhuis <pr.nienhuis at users.sf.net> +## Created: 2009-12-24 +## Last update: 2009-12-27 + +function cellcont = getcellcont (tcell) + + cellcont = struct ("ctype", '', "cvalue", '', "cols", 0); + + # Check for repeated columns (often empty columns, viz. to right of data) + ii = index (tcell, 'number-columns-repeated'); + if (ii) + ist = ii+25; ien = index(tcell(ist:end), '"') + ist - 2; + cellcont.cols = str2num (tcell(ist:ien)); + endif + + # Get cell value type: float / boolean / string / date/.... Skip errors. + ii = index (tcell, 'value-type='); + if (ii && ~index (tcell, 'text:p>Err:') && ~index (tcell, 'text:p>#DIV')) + # Then cell contains useful data. + ist = ii + 12; ien = index (tcell(ist:end), '"') + ist - 2; + cellcont.ctype = tcell(ist:ien); + + if (strcmp (cellcont.ctype, 'string')) + # Get string value from between <text:p|r> </text:p|r> tags + ii = index (tcell, '<text'); + if (ii) + ist = ii + 8; ien = index (tcell(ist:end), '</text') + ist - 2; + cellcont.cvalue = tcell(ist:ien); + endif + + elseif (strcmp (cellcont.ctype, 'date')) + # Get date string + ii = index (tcell, 'date-value='); + if (ii) + ist = ii + 12; ien = index (tcell(ist:end), '"') + ist - 2; + # Return string for the time being + cellcont.cvalue = tcell(ist:ien); + endif + + elseif (strcmp (cellcont.ctype, 'time')) + # Get time string + ii = index (tcell, 'time-value='); + if (ii) + ist = ii + 12; ien = index (tcell(ist:end), '"') + ist - 2; + # Return string for the time being (no pun intended) + cellcont.cvalue = tcell(ist:ien); + endif + + else + # Get cell value as string (float, currency, percentage) + ii = index (tcell, 'value='); + if (ii) + ist = ii + 7; ien = index (tcell(ist:end), '"') + ist - 2; + cellcont.cvalue = tcell(ist:ien); + endif + + endif + endif + +endfunction + + +#=========================================================================== + +## Copyright (C) 2009 Philip Nienhuis <pr.nienhuis at users.sf.net> +## +## This program is free software; you can redistribute it and/or modify +## it under the terms of the GNU General Public License as published by +## the Free Software Foundation; either version 2 of the License, or +## (at your option) any later version. +## +## This program is distributed in the hope that it will be useful, +## but WITHOUT ANY WARRANTY; without even the implied warranty of +## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +## GNU General Public License for more details. +## +## You should have received a copy of the GNU General Public License +## along with Octave; see the file COPYING. If not, see +## <http://www.gnu.org/licenses/>. + ## ods2oct - get data out of an ODS spreadsheet into octave. ## Watch out, no error checks, and spreadsheet formula error results ## are conveyed as 0 (zero). @@ -21,11 +395,22 @@ ## Author: Philip Nienhuis ## Created: 2009-12-13 -function [ rawarr, ods ] = ods2oct (ods, wsh, range) +function [ rawarr, ods, rstatus] = ods2jod2oct (ods, wsh, crange) + persistent months; + months = {"JAN", "FEB", "MAR", "APR", "MAY", "JUN", "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"}; + + if (isempty (crange)) error ("Empty cell range not allowed w. jOpenDocument."); endif + + if (isnumeric(wsh)) wsh = max (wsh - 1, 1); endif # Sheet COUNT starts at 0! sh = ods.workbook.getSheet (wsh); - [dummy, nrows, ncols, toprow, lcol] = parse_sp_range (range); + [dummy, nrows, ncols, toprow, lcol] = parse_sp_range (crange); + if (lcol > 1024 || toprow > 65536) error ("ods2oct: invalid range; max 1024 columns & 65536 rows."); endif + # Truncate range silently if needed + rcol = min (lcol + ncols - 1, 1024); + ncols = min (ncols, 1024 - lcol + 1); + nrows = min (nrows, 65536 - toprow + 1); # Placeholder for data rawarr = cell (nrows, ncols); @@ -41,16 +426,28 @@ if (~isempty (val)) if (ischar (val)) # Text string - rawarr (ii, jj) = val; + rawarr(ii, jj) = val; elseif (isnumeric (val)) # Boolean - if (val) rawarr (ii, jj) = true; else; rawarr (ii, jj) = false; endif + if (val) rawarr(ii, jj) = true; else; rawarr(ii, jj) = false; endif else try val = sh.getCellAt (celladdress).getValue ().doubleValue (); - rawarr (ii, jj) = val; + rawarr(ii, jj) = val; catch - # Probably empty Cell + val = char (val); + if (isempty (val)) + # Probably empty Cell + else + # Maybe date / time value. Dirty hack to get values: + mo = strmatch (toupper (val(5:7)), months); + dd = str2num (val(9:10)); + yy = str2num (val(25:end)); + hh = str2num (val(12:13)); + mm = str2num (val(15:16)); + ss = str2num (val(18:19)); + rawarr(ii, jj) = datenum (yy, mo, dd, hh, mm,ss); + endif end_try_catch endif endif @@ -58,5 +455,7 @@ endfor ods.limits = [ lcol, lcol+ncols-1; toprow, toprow+nrows-1 ]; + + rstatus = ~isempty (rawarr); endfunction Modified: trunk/octave-forge/main/io/inst/odsclose.m =================================================================== --- trunk/octave-forge/main/io/inst/odsclose.m 2009-12-30 15:04:15 UTC (rev 6679) +++ trunk/octave-forge/main/io/inst/odsclose.m 2009-12-30 15:06:54 UTC (rev 6680) @@ -14,19 +14,19 @@ ## along with Octave; see the file COPYING. If not, see ## <http://www.gnu.org/licenses/>. -## odsclose - close an ods spreadsheet file & if needed write out to disk -## Basic version - no checks yet +## odsclose - close an ods spreadsheet file +## usage: ods = odsclose (ods) ## Author: Philip Nienhuis ## Created: 2009-12-13 +## Last update: 2009-12-27 function [ ods ] = odsclose (ods) - if (ods.changed > 0) - file = java_new ('java.io.File', ods.filename); - ods.workbook.saveAs (file); - endif + # If needed warn that dangling spreadsheet pointers may be left + if (nargout < 1) warning ("return argument missing - ods invocation not reset."); endif + # Until proper ODS write support has been implemented, not much need be done ods = []; endfunction Modified: trunk/octave-forge/main/io/inst/odsopen.m =================================================================== --- trunk/octave-forge/main/io/inst/odsopen.m 2009-12-30 15:04:15 UTC (rev 6679) +++ trunk/octave-forge/main/io/inst/odsopen.m 2009-12-30 15:06:54 UTC (rev 6680) @@ -14,17 +14,252 @@ ## along with Octave; see the file COPYING. If not, see ## <http://www.gnu.org/licenses/>. -## odsopen - open ODS spreadsheet -## Basic version, no checks yet +## -*- texinfo -*- +## @deftypefn {Function File} @var{ods} = odsopen (@var{filename}) +## @deftypefnx {Function File} @var{ods} = odsopen (@var{filename}, @var{readwrite}) +## @deftypefnx {Function File} @var{ods} = odsopen (@var{filename}, @var{readwrite}, @var{reqintf}) +## Get a pointer to an OpenOffice.org spreadsheet in the form of return +## argument @var{ods}. +## +## Calling odsopen without specifying a return argument is fairly useless! +## +## To make this function work at all, you need the Java package > 1.2.5 plus +## either ODFtoolkit > 3.5 & xercesImpl, or jOpenDocument installed on your +## computer + proper javaclasspath set. These interfaces are referred to as +## OTK and JOD, resp., and are preferred in that order by default (depending +## on their presence). +## +## @var{filename} must be a valid .ods OpenOffice.org file name. If @var{filename} +## does not contain any directory path, the file is saved in the current +## directory. +## +## @var{readwrite} is currently ignored until ODS writesupport is implemented. +## +## Optional input argument @var{reqintf} can be used to override the ODS +## interface automatically selected by odsopen. Currently implemented interfaces +## are 'OTK' (Java / ODFtoolkit) or 'JOD' (Java / jOpenDocument). +## +## Examples: +## +## @example +## ods = odsopen ('test1.ods'); +## (get a pointer for reading from spreadsheet test1.ods) +## +## ods = odsopen ('test2.ods', [], 'JOD'); +## (as above, indicate test2.ods will be read from; in this case using +## the jOpenDocument interface is requested) +## @end example +## +## @seealso odsclose, odsread, ods2oct, odsfinfo +## +## @end deftypefn ## Author: Philip Nienhuis ## Created: 2009-12-13 +## Last update: 2009-12-30 -function [ ods ] = odsopen (filename, rw=0) +function [ ods ] = odsopen (filename, rw=0, reqinterface = []) - file = java_new ('java.io.File', filename); - wb = java_invoke ('org.jopendocument.dom.spreadsheet.SpreadSheet', 'createFromFile', file); + persistent odsinterfaces; persistent chkintf; + if (isempty (chkintf)) + odsinterfaces = struct ( "OTK", [], "JOD", [] ); + chkintf = 1; + endif + + if (nargout < 1) usage ("ODS = odsopen (ODSfile, [Rw]). But no return argument specified!"); endif - ods = struct ("xtype", 'JOD', "app", file, "filename", filename, "workbook", wb, "changed", 0, "limits", []); + if (~isempty (reqinterface)) + # Try to invoke requested interface for this call. Check if it + # is supported anyway by emptying the corresponding var. + if (strcmp (tolower (reqinterface), tolower ('OTK'))) + printf ("Java/ODFtoolkit interface requested... "); + odsinterfaces.OTK = []; odsinterfaces.JOD = 0; + elseif (strcmp (tolower (reqinterface), tolower ('JOD'))) + printf ("Java/jOpenDocument interface requested... "); + odsinterfaces.OTK = 0; odsinterfaces.JOD = []; + else + usage (sprintf ("Unknown .ods interface \"%s\" requested. Only OTK or JOD supported", reqinterface)); + endif + odsinterfaces = getodsinterfaces (odsinterfaces); + # Well, is the requested interface supported on the system? + if (~odsinterfaces.(toupper (reqinterface))) + # No it aint + error (" ...but that's not supported!"); + endif + endif + + # Var xwrite is really used to avoid creating files when wanting to read, or + # not finding not-yet-existing files when wanting to write. + +# if (xwrite) xwrite = 1; endif # Be sure it's either 0 or 1 initially + # Check if ODS file exists + # Write statements temporarily disabled! + fid = fopen (filename, 'rb'); + if (fid < 0) +# if (~xwrite) + err_str = sprintf ("File %s not found\n", filename); + error (err_str) +# else +# printf ("Creating file %s\n", filename); +# xwrite = 2; +# endif + else + # close file anyway to avoid Java errors + fclose (fid); + endif + +# Check for the various ODS interfaces. No problem if they've already +# been checked, getodsinterfaces (far below) just returns immediately then. + + odsinterfaces = getodsinterfaces (odsinterfaces); + +# Supported interfaces determined; now check ODS file type. + + chk1 = strcmp (tolower (filename(end-3:end)), '.ods'); + if (~chk1) + error ("Currently ods2oct can only read reliably from .ods files") + endif + + ods = struct ("xtype", [], "app", [], "filename", [], "workbook", [], "changed", 0, "limits", []); + + if (odsinterfaces.OTK) + # Parts after user gfterry in + # http://www.oooforum.org/forum/viewtopic.phtml?t=69060 + wb = java_invoke ('org.odftoolkit.odfdom.doc.OdfDocument', 'loadDocument', filename); + ods.xtype = 'OTK'; + ods.app = wb; + ods.filename = filename; + ods.workbook = wb.getContentDom(); # Reads the entire spreadsheet + + elseif (odsinterfaces.JOD) + file = java_new ('java.io.File', filename); + wb = java_invoke ('org.jopendocument.dom.spreadsheet.SpreadSheet', 'createFromFile', file); + ods.xtype = 'JOD'; + ods.app = 'file'; + ods.filename = filename; + ods.workbook = wb; + +# elseif +# <other interfaces here> + + else + warning ("No support for OpenOffice.org .ods I/O"); + + endif + + if (~isempty (reqinterface)) + # Reset found interfaces for re-testing in the next call. Add interfaces if needed. + chkintf = []; + endif + endfunction + + +## Copyright (C) 2009 Philip Nienhuis <prnienhuis at users.sf.net> +## +## This program is free software; you can redistribute it and/or modify +## it under the terms of the GNU General Public License as published by +## the Free Software Foundation; either version 2 of the License, or +## (at your option) any later version. +## +## This program is distributed in the hope that it will be useful, +## but WITHOUT ANY WARRANTY; without even the implied warranty of +## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +## GNU General Public License for more details. +## +## You should have received a copy of the GNU General Public License +## along with Octave; see the file COPYING. If not, see +## <http://www.gnu.org/licenses/>. + +## -*- texinfo -*- +## @deftypefn {Function File} @var{odsinterfaces} = getodsinterfaces (@var{odsinterfaces}) +## Get supported OpenOffice.org .ods file read/write interfaces from +## the system. +## Each interface for which the corresponding field is set to empty +## will be checked. So by manipulating the fields of input argument +## @var{odsinterfaces} it is possible to specify which +## interface(s) should be checked. +## +## Currently implemented interfaces comprise: +## - Java & ODFtoolkit (www.apache.org) +## - Java & jOpenDocument (www.jopendocument.org) +## +## Examples: +## +## @example +## odsinterfaces = getodsinterfaces (odsinterfaces); +## @end example + +## Author: Philip Nienhuis +## Created: 2009-12-27 +## Last updated 2009-12-27 + +function [odsinterfaces] = getodsinterfaces (odsinterfaces) + + if (isempty (odsinterfaces.OTK) && isempty (odsinterfaces.JOD)) + chk1 = 1; + printf ("Supported interfaces: "); + else + chk1= 0; + endif + + # Try Java & ODF toolkit + if (isempty (odsinterfaces.OTK)) + odsinterfaces.OTK = 0; + try + tmp1 = javaclasspath; + # If we get here, at least Java works. Now check for proper entries + # in class path. Under *nix the classpath must first be split up + if (isunix) tmp1 = strsplit (char(tmp1), ":"); endif + jpchk = 0; entries = {"rt.jar", "odfdom.jar", "xercesImpl.jar"}; + for ii=1:size (tmp1, 2) + tmp2 = strsplit (char (tmp1(1, ii)), "\\/"); + for jj=1:size (entries, 2) + if (strmatch (entries{1, jj}, tmp2{size (tmp2, 2)})), ++jpchk; endif + endfor + endfor + if (jpchk > 2) + odsinterfaces.OTK = 1; + printf (" Java/ODFtoolkit (OTK) OK. "); + chk1 = 1; + else + warning ("\n Java support OK but not all required classes (.jar) in classpath"); + endif + catch + # ODFtoolkit support nonexistent + end_try_catch + endif + + # Try Java & jOpenDocument + if (isempty (odsinterfaces.JOD)) + odsinterfaces.JOD = 0; + try + tmp1 = javaclasspath; + # If we get here, at least Java works. Now check for proper entries + # in class path. Under unix the classpath must first be split up + if (isunix) tmp1 = strsplit (char(tmp1), ":"); endif + jpchk = 0; entries = {"rt.jar", "jOpenDocument"}; + for ii=1:size (tmp1, 2) + tmp2 = strsplit (char (tmp1(1, ii)), "\\/"); + for jj=1:size (entries, 2) + if (strmatch (entries{1, jj}, tmp2{size (tmp2, 2)})), ++jpchk; endif + endfor + endfor + if (jpchk > 1) + odsinterfaces.JOD = 1; + printf (" Java/jOpenDocument (JOD) OK. "); + chk1 = 1; + else + warning ("\nJava support OK but required classes (.jar) not all in classpath"); + endif + catch + # No jOpenDocument support + end_try_catch + endif + + # ---- Other interfaces here, similar to the ones above + + if (chk1) printf ("\n"); endif + +endfunction \ No newline at end of file Modified: trunk/octave-forge/main/io/inst/odsread.m =================================================================== --- trunk/octave-forge/main/io/inst/odsread.m 2009-12-30 15:04:15 UTC (rev 6679) +++ trunk/octave-forge/main/io/inst/odsread.m 2009-12-30 15:06:54 UTC (rev 6680) @@ -15,47 +15,104 @@ ## <http://www.gnu.org/licenses/>. ## -*- texinfo -*- -## @deftypefn {Function File} @var{rawarr} = odsread (@var{filename}, @var{wsh}, @var{range}) -## . +## @deftypefn {Function File} [@var{numarr}, @var{txtarr}, @var{rawarr}, @var{limits}] = odsread (@var{filename}) +## @deftypefnx {Function File} [@var{numarr}, @var{txtarr}, @var{rawarr}, @var{limits}] = odsread (@var{filename}, @var{wsh}) +## @deftypefnx {Function File} [@var{numarr}, @var{txtarr}, @var{rawarr}, @var{limits}] = odsread (@var{filename}, @var{wsh}, @var{range}) +## @deftypefnx {Function File} [@var{numarr}, @var{txtarr}, @var{rawarr}, @var{limits}] = odsread (@var{filename}, @var{wsh}, @var{range}, @var{reqintf}) ## -## Proof-of-concept function for reading data from an ODS spreadsheet. -## It works but there are no error checks at all; erroneous function -## results and empty cells are not ignored but returned as 0. +## Read data contained in range @var{range} from worksheet @var{wsh} +## in OpenOffice.org Calc spreadsheet file @var{filename}. ## -## You need to have jOpenDocument-1.2b2.jar in your javaclasspath (get it at -## http://www.jopendocument.org/) and the octave-forge java package installed. +## You need the octave-forge java package (> 1.2.5) and one or both of +## jopendocument.jar or preferrably: (odfdom.jar & xercesImpl.jar) in +## your javaclasspath. ## -## @var{filename} must be a valid ODS spreadsheet file name. If @var{filename} -## does not contain any directory path, the file is saved in the current -## directory. +## Return argument @var{numarr} contains the numeric data, optional +## return arguments @var{txtarr} and @var{rawarr} contain text strings +## and the raw spreadsheet cell data, respectively, and @var{limits} is +##a struct containing the data origins of the various returned arrays. ## -## @var{wsh} is the name or index number of a worksheet in the spreadsheet file. +## If @var{filename} does not contain any directory, the file is +## assumed to be in the current directory. ## -## @var{range} must be a valid spreadsheet range and cannot be empty. +## @var{wsh} is either numerical or text, in the latter case it is +## case-sensitive and it should conformtoOpenOffice.org Calc sheet +## name requirements. +## Note that in case of a numerical @var{wsh} this number refers to the +## position in the worksheet stack, counted from the left in a Calc +## window. The default is numerical 1, i.e. the leftmost worksheet +## in the Calc file. ## -## Return args @var{numarr} and @var{txtarr} contain numeric & text data, -## resp.; @var{rawarr} contains the raw data and lim the actual cell ranges -## from where the data originated. +## @var{range} is expected to be a regular spreadsheet range format, +## or "" (empty string, indicating all data in a worksheet). ## -## Example: +## If only the first argument is specified, odsread will try to read +## all contents from the first = leftmost (or the only) worksheet (as +## if a range of @'' (empty string) was specified). +## +## If only two arguments are specified, odsread assumes the second +## argument to be @var{wsh} and to refer to a worksheet. In that case +## odsread tries to read all data contained in that worksheet. ## +## The optional last argument @var{reqintf} can be used to override +## the automatic selection by odsread of one interface out of the +## supported ones: Java/ODFtoolkit or Java/jOpenDocument. +## +## Erroneous data and empty cells are set to NaN in @var{numarr} and +## turn up empty in @var{txtarr} and @var{rawarr}. Date/time values +## in date/time formatted cells are returned as numerical values in +## @var{obj} with base 1-1-000. Note that OpenOfice.org and MS-Excel +## have different date base values (1/1/0000 & 1/1/1900, resp.) and +## internal representation so MS-Excel spreadsheets rewritten into +## .ods format by OpenOffice.org Calc may have different date base +## values. +## +## @var{numarr} and @var{txtarr} are trimmed from empty outer rows +## and columns, so any returned array may turn out to be smaller than +## requested in @var{range}. +## +## When reading from merged cells, all array elements NOT corresponding +## to the leftmost or upper Calc cell will be treated as if the +## "corresponding" Calc cells are empty. +## +## odsread is just a wrapper for a collection of scripts that find out +## the interface to be used and do the actual reading. For each call +## to odsread the interface must be started and the Calc file read into +## memory. When reading multiple ranges (in optionally multiple worksheets) +## a significant speed boost can be obtained by invoking those scripts +## directly (odsopen / ods2oct [/ parsecell] / ... / odsclose). +## +## Examples: +## ## @example -## [ n, t, r, l ] = odsread ('test1.ods', '2ndSh', 'A1:AF250'); +## A = odsread ('test4.ods', '2nd_sheet', 'C3:AB40'); +## (which returns the numeric contents in range C3:AB40 in worksheet +## '2nd_sheet' from file test4.ods into numeric array A) ## @end example ## +## @example +## [An, Tn, Ra, limits] = odsread ('Sales2009.ods', 'Third_sheet'); +## (which returns all data in worksheet 'Third_sheet' in file test4.ods +## into array An, the text data into array Tn, the raw cell data into +## cell array Ra and the ranges from where the actual data came in limits) +## @end example +## +## @seealso odsopen, ods2oct, odsclose, odsfinfo, parsecell +## ## @end deftypefn ## Author: Philip Nienhuis <prnienhuis at users.sf.net> ## Created: 2009-12-12 +## Last update: 2009-12-29 -function [ numarr, txtarr, rawarr, lim ] = odsread (filename, wsh, range) +function [ numarr, txtarr, rawarr, lim ] = odsread (filename, wsh=1, datrange=[], reqintf=[]) - ods = odsopen (filename, 0); + ods = odsopen (filename, 0, reqintf); - [rawarr, ods] = ods2oct (ods, wsh, range); + [rawarr, ods] = ods2oct (ods, wsh, datrange); [numarr, txtarr, lim] = parsecell (rawarr, ods.limits); - odsclose (ods); + ods = odsclose (ods); endfunction Modified: trunk/octave-forge/main/io/inst/parsecell.m =================================================================== --- trunk/octave-forge/main/io/inst/parsecell.m 2009-12-30 15:04:15 UTC (rev 6679) +++ trunk/octave-forge/main/io/inst/parsecell.m 2009-12-30 15:06:54 UTC (rev 6680) @@ -16,16 +16,25 @@ ## -*- texinfo -*- ## @deftypefn {Function File} [ @var{numarr}, @var{txtarr}, @var{lim} ] = parsecell (@var{rawarr}) +## @deftypefnx {Function File} [ @var{numarr}, @var{txtarr}, @var{lim} ] = parsecell (@var{rawarr}, @var{limits}) ## ## Divide a heterogeneous 2D cell array into a 2D numeric array and a ## 2D cell array containing only strings. Both returned arrays are ## trimmed from empty outer rows and columns. +## This function is particularly useful for parsing cell arrays returned +## by functions reading spreadsheets (e.g., xlsread, odsread). ## ## Optional return argument @var{lim} contains two field with the outer ## column and row numbers of @var{numarr} and @var{txtarr} in the -## original @var{rawarr}. +## original array @var{rawarr}. +## If optional input argument @var{limits} contained the spreadsheet +## data limits returned in the spreadsheet file pointer struct +## (field xls.limits or ods.limits), optional return argument @var{lim} +## contains the real spreadsheet row & column numbers enclosing the +## origins of the numerical and text data returned in @var{numarr} +## and @var{txtarr}. ## -## Example: +## Examples: ## ## @example ## [An, Tn] = parsecell (Rn); @@ -33,12 +42,19 @@ ## text data into array Tn) ## @end example ## -## @seealso xlsread, xls2oct +## @example +## [An, Tn, lims] = parsecell (Rn, xls.limits); +## (which returns the numeric contents of Rn into array An and the +## text data into array Tn.) +## @end example ## +## @seealso xlsread, odsread, xls2oct, ods2oct +## ## @end deftypefn ## Author: Philip Nienhuis ## Created: 2009-12-13 +## Last updated: 2009-12-29 function [ numarr, txtarr, lim ] = parsecell (rawarr, rawlimits=[]) This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |