Cannot parse Excel 95 file

Status: Alpha

Brought to you by: rvk

#15 Cannot parse Excel 95 file

Status: open

Owner: nobody

Labels: None

Priority: 5

Updated: 2007-04-18

Created: 2007-04-18

Creator: Anonymous

Private: No

For the attached file: ND87A01_bad.xls

>>> import pyExcelerator
>>> pyExcelerator.parse_xls('ND87A01_bad.xls')
[]
>>>

NOTE: the file opens in Gnumeric, OpenOffice 2.1, and Mac Excel, but OpenOffice 2.0.2 leaves the 'Data' sheet blank.

pyExcelerator-0.6.3a
Kubuntu Dapper 6.06
Python 2.4.3

Discussion

Nobody/Anonymous - 2007-04-18

Excel 95 file

ND87A01_bad.xls

John Machin - 2007-05-20

The file is somewhat ugly; here's what xlrd (version 0.6.1a5) has to say:

command-prompt>\python25\python \python25\scripts\runxlrd.py ov *.xls

=== File: ND87A01_bad.xls ===
WARNING *** file size (12983) not 512 + multiple of sector size (512)
WARNING *** OLE2 inconsistency: SSCS size is 0 but SSAT size is non-zero
*** Open failed: <class 'xlrd.biffh.XLRDError'>: No CODEPAGE record, encoding_override not used: can't determine encoding

When an encoding override is used, it works:
command-prompt>\python25\python \python25\scripts\runxlrd.py -e ascii ov *.xls

=== File: ND87A01_bad.xls ===
WARNING *** file size (12983) not 512 + multiple of sector size (512)
WARNING *** OLE2 inconsistency: SSCS size is 0 but SSAT size is non-zero
Open took 0.02 seconds
BIFF version: 5; datemode: 0
codepage: None (encoding: ascii); countries: (0, 0)
last saved by: u''
nsheets: 3; sheet names: [u'Sheet1', u'Labels', u'Data']
Pickleable: 1; Use mmap: 1; Formatting: 0
FORMATs: 8, FONTs: 6, XFs: 21
Load time: 0.00 seconds (stage 1) 0.01 seconds (stage 2)
sheet 0: name = u'Sheet1'; nrows = 2; ncols = 2
sheet 1: name = u'Labels'; nrows = 16; ncols = 5
sheet 2: name = u'Data'; nrows = 28; ncols = 16
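
The first WARNING in both runs can be checked by hand. Reading the message literally, and assuming a 512-byte header followed by whole 512-byte sectors, the sketch below computes how many bytes are left over after the last complete sector. The function name is hypothetical and this is not xlrd's own code.

```python
HEADER_SIZE = 512  # header size, taken from the "512 + multiple of sector size" wording


def leftover_bytes(file_size, sector_size=512):
    """Return (whole_sectors, leftover) for the bytes that follow the header."""
    return divmod(file_size - HEADER_SIZE, sector_size)


whole_sectors, leftover = leftover_bytes(12983)
print(whole_sectors, leftover)  # 24 183
```

A non-zero leftover (183 bytes here) is what the warning reports: the file ends partway through a sector.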

HTH with the pyExcelerator debugging ...

John Machin - 2007-06-02

parse_xls returns an empty list because of a bug in CompDoc.py, line 272:
if next_in_chain - last_chunk_finish <= 1:
The comparison should be == rather than <=. A more understandable equivalent is:
if next_in_chain == last_chunk_finish + 1:

The sector allocation table in this file is valid but rather ragged. The bug causes the getstream method to miss large chunks of data, so parse_xls never sees a valid BOF (beginning of file) record.
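
To illustrate why the comparison matters, here is a minimal sketch of coalescing a sector chain into contiguous chunks. It is not the actual CompDoc.py code: the helper names (coalesce_chain, sectors_covered) and the sample chain are made up for illustration, and only the two comparisons come from the report above.

```python
def is_contiguous_buggy(next_in_chain, last_chunk_finish):
    # Comparison as reported at CompDoc.py line 272
    return next_in_chain - last_chunk_finish <= 1


def is_contiguous_fixed(next_in_chain, last_chunk_finish):
    # Proposed fix: the next sector must directly follow the last one
    return next_in_chain == last_chunk_finish + 1


def coalesce_chain(chain, is_contiguous):
    """Group a sector chain into (start, finish) chunks (hypothetical helper)."""
    chunks = []
    start = finish = chain[0]
    for next_in_chain in chain[1:]:
        if is_contiguous(next_in_chain, finish):
            finish = next_in_chain  # extend the current chunk
        else:
            chunks.append((start, finish))  # close it and start a new one
            start = finish = next_in_chain
    chunks.append((start, finish))
    return chunks


def sectors_covered(chunks):
    """List every sector that would be read from the given chunks."""
    covered = []
    for start, finish in chunks:
        covered.extend(range(start, finish + 1))
    return covered


# A "ragged" chain: valid, but it jumps backwards from sector 7 to sector 2.
ragged_chain = [5, 6, 7, 2, 3, 9]

buggy_chunks = coalesce_chain(ragged_chain, is_contiguous_buggy)
fixed_chunks = coalesce_chain(ragged_chain, is_contiguous_fixed)

print(sectors_covered(buggy_chunks))  # [9]
print(sectors_covered(fixed_chunks))  # [5, 6, 7, 2, 3, 9]
```

In this sketch the backwards jump gives a negative difference, which the <= test wrongly accepts as contiguous, so the chunk boundaries are corrupted and most sectors are never read. The == test splits the chain at every jump and covers all of it.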

Even with that fixed, the OP will also need to call parse_xls(filename, encoding='ascii'), because the file has no CODEPAGE record.
