Cannot parse Excel 95 file
For the attached file: ND87A01_bad.xls
>>> import pyExcelerator
>>> pyExcelerator.parse_xls('ND87A01_bad.xls')
[]
>>>
NOTE: the file opens in gnumeric, openoffice 2.1, and Mac Excel, but openoffice 2.0.2 leaves the 'Data' sheet blank.
pyExcelerator-0.6.3a
Kubuntu Dapper 6.06
Python 2.4.3
Excel 95 file
Logged In: YES
user_id=480138
Originator: NO
The file is somewhat ugly; here's what xlrd (version 0.6.1a5) has to say:
command-prompt>\python25\python \python25\scripts\runxlrd.py ov *.xls
=== File: ND87A01_bad.xls ===
WARNING *** file size (12983) not 512 + multiple of sector size (512)
WARNING *** OLE2 inconsistency: SSCS size is 0 but SSAT size is non-zero
*** Open failed: <class 'xlrd.biffh.XLRDError'>: No CODEPAGE record, encoding_override not used: can't determine encoding
When an encoding override is used, it works:
command-prompt>\python25\python \python25\scripts\runxlrd.py -e ascii ov *.xls
=== File: ND87A01_bad.xls ===
WARNING *** file size (12983) not 512 + multiple of sector size (512)
WARNING *** OLE2 inconsistency: SSCS size is 0 but SSAT size is non-zero
Open took 0.02 seconds
BIFF version: 5; datemode: 0
codepage: None (encoding: ascii); countries: (0, 0)
last saved by: u''
nsheets: 3; sheet names: [u'Sheet1', u'Labels', u'Data']
Pickleable: 1; Use mmap: 1; Formatting: 0
FORMATs: 8, FONTs: 6, XFs: 21
Load time: 0.00 seconds (stage 1) 0.01 seconds (stage 2)
sheet 0: name = u'Sheet1'; nrows = 2; ncols = 2
sheet 1: name = u'Labels'; nrows = 16; ncols = 5
sheet 2: name = u'Data'; nrows = 28; ncols = 16
HTH with the pyExcelerator debugging ...
Logged In: YES
user_id=480138
Originator: NO
It returns an empty list because of a bug in CompDoc.py, line 272:
if next_in_chain - last_chunk_finish <= 1:
The comparison should be == instead of <=, or better, the line should be changed to the more understandable
if next_in_chain == last_chunk_finish + 1:
The sector allocation table in this file is valid but rather ragged. The bug causes the getstream method to miss large chunks of data, so parse_xls never sees a valid BOF (beginning of file) record.
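To illustrate why <= misbehaves on a ragged chain, here is a minimal sketch. It is not the actual CompDoc.py code: the function contiguous_runs, its buggy flag, and the sample chain are made up for illustration, and only the two comparisons are taken from the line discussed above. It groups a sector chain into (start, finish) runs of consecutive sectors. When the chain jumps backwards, the difference is negative, so the <= 1 test wrongly treats the jump as a continuation of the current run.

```python
def contiguous_runs(chain, buggy=False):
    """Group a sector chain into (start, finish) runs of consecutive sectors.

    Hypothetical helper for illustration only; not pyExcelerator's API.
    """
    runs = []
    start = last_chunk_finish = chain[0]
    for next_in_chain in chain[1:]:
        if buggy:
            # Comparison as found on the reported line: a backward jump
            # gives a negative difference, which also passes this test.
            contiguous = next_in_chain - last_chunk_finish <= 1
        else:
            # Proposed fix: only the immediately following sector extends the run.
            contiguous = next_in_chain == last_chunk_finish + 1
        if contiguous:
            last_chunk_finish = next_in_chain
        else:
            runs.append((start, last_chunk_finish))
            start = last_chunk_finish = next_in_chain
    runs.append((start, last_chunk_finish))
    return runs


# A made-up "ragged" chain: sectors 5, 6, then a jump back to 2, 3, then 9.
ragged_chain = [5, 6, 2, 3, 9]
fixed = contiguous_runs(ragged_chain)
buggy = contiguous_runs(ragged_chain, buggy=True)
print(fixed)  # [(5, 6), (2, 3), (9, 9)]
print(buggy)  # [(5, 3), (9, 9)]
```

With the fixed comparison each run matches the sectors actually in the chain. With the buggy one, the first run comes out as (5, 3), which no longer describes sectors 5, 6, 2, 3, so data goes missing when the runs are read back.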
Even when that is fixed, the OP will also need to call parse_xls(filename, encoding='ascii'), because the file has no CODEPAGE record.