
#15 Cannot parse Excel 95 file

open
nobody
None
5
2007-04-18
2007-04-18
Anonymous
No

For the attached file: ND87A01_bad.xls

>>> import pyExcelerator
>>> pyExcelerator.parse_xls('ND87A01_bad.xls')
[]
>>>

NOTE: the file opens in Gnumeric, OpenOffice 2.1, and Mac Excel, but OpenOffice 2.0.2 leaves the 'Data' sheet blank.

pyExcelerator-0.6.3a
Kubuntu Dapper 6.06
Python 2.4.3

Discussion

  • Nobody/Anonymous

    Excel 95 file

     
  • John Machin

    John Machin - 2007-05-20

    The file is somewhat ugly; here's what xlrd (version 0.6.1a5) has to say:

    command-prompt>\python25\python \python25\scripts\runxlrd.py ov *.xls

    === File: ND87A01_bad.xls ===
    WARNING *** file size (12983) not 512 + multiple of sector size (512)
    WARNING *** OLE2 inconsistency: SSCS size is 0 but SSAT size is non-zero
    *** Open failed: <class 'xlrd.biffh.XLRDError'>: No CODEPAGE record, encoding_override not used: can't determine encoding
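
    The first warning is plain arithmetic. An OLE2 compound document is a 512-byte header followed by whole sectors, so with the usual 512-byte sectors the file size minus 512 should be an exact multiple of 512. Below is a minimal Python 3 sketch of that check. It assumes 512-byte sectors, and the helper name is hypothetical, not part of xlrd.

```python
OLE2_HEADER_SIZE = 512  # fixed-size compound document header
SECTOR_SIZE = 512       # assumed: the common 512-byte sector size


def ole2_size_remainder(file_size: int) -> int:
    """Bytes left over after the header and the last whole sector.

    A well-formed file with 512-byte sectors gives 0.
    """
    return (file_size - OLE2_HEADER_SIZE) % SECTOR_SIZE


# ND87A01_bad.xls is 12983 bytes: 12983 - 512 = 12471 = 24 * 512 + 183
print(ole2_size_remainder(12983))  # 183
```

    A non-zero remainder means there are trailing bytes after the last full sector. As the output shows, xlrd reports this as a warning and carries on.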

    When an encoding override is used, it works:

    command-prompt>\python25\python \python25\scripts\runxlrd.py -e ascii ov *.xls

    === File: ND87A01_bad.xls ===
    WARNING *** file size (12983) not 512 + multiple of sector size (512)
    WARNING *** OLE2 inconsistency: SSCS size is 0 but SSAT size is non-zero
    Open took 0.02 seconds
    BIFF version: 5; datemode: 0
    codepage: None (encoding: ascii); countries: (0, 0)
    last saved by: u''
    nsheets: 3; sheet names: [u'Sheet1', u'Labels', u'Data']
    Pickleable: 1; Use mmap: 1; Formatting: 0
    FORMATs: 8, FONTs: 6, XFs: 21
    Load time: 0.00 seconds (stage 1) 0.01 seconds (stage 2)
    sheet 0: name = u'Sheet1'; nrows = 2; ncols = 2
    sheet 1: name = u'Labels'; nrows = 16; ncols = 5
    sheet 2: name = u'Data'; nrows = 28; ncols = 16
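
    Both runs hinge on one decision. With no CODEPAGE record, there is nothing to derive the text encoding from, so an override has to be supplied. The Python 3 sketch below models that decision under stated assumptions. The function name and the small table of special codepages are illustrative, not xlrd's actual code. Every other codepage number is simply mapped to Python's "cpNNNN" codec name.

```python
from typing import Optional

# Assumed subset of BIFF codepage numbers whose Python codec name is
# not simply "cp" + number.
SPECIAL_CODEPAGES = {
    1200: "utf_16_le",   # Unicode (UTF-16LE)
    32768: "mac_roman",  # 0x8000, Apple Roman
}


def choose_encoding(codepage: Optional[int], override: Optional[str] = None) -> str:
    """Pick a text encoding for an old BIFF workbook (hypothetical helper)."""
    if override is not None:
        # In this sketch an explicit override always wins (cf. runxlrd.py -e ascii).
        return override
    if codepage is None:
        # No CODEPAGE record and no override: nothing to go on.
        raise ValueError("No CODEPAGE record and no override: can't determine encoding")
    return SPECIAL_CODEPAGES.get(codepage, "cp%d" % codepage)


print(choose_encoding(None, "ascii"))  # ascii
print(choose_encoding(1252))           # cp1252
```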

    HTH with the pyExcelerator debugging ...

     
  • John Machin

    John Machin - 2007-06-02

    The reason it returns an empty list is a bug in CompDoc.py, line 272:

        if next_in_chain - last_chunk_finish <= 1:

    The comparison should use == instead of <=, or, more understandably, be written as:

        if next_in_chain == last_chunk_finish + 1:

    The sector allocation table in this file is valid but rather ragged. The bug causes the getstream method to miss large chunks of data, so parse_xls never sees a valid BOF (beginning of file) record.
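
    To see why <= goes wrong, here is a simplified Python 3 model of how a sector chain is turned into contiguous runs. It is not the actual CompDoc.py code. Only next_in_chain and last_chunk_finish come from the report. sector_runs, read_runs, the buggy flag, and the sample chain are hypothetical. With ==, a run grows only when the next sector immediately follows the previous one. With <=, a backward jump in the chain (a negative difference) also passes the test. The run's end then moves backwards, and the bytes it should cover are silently dropped.

```python
from typing import List, Tuple

SECTOR_SIZE = 512  # assumed 512-byte sectors


def sector_runs(chain: List[int], buggy: bool = False) -> List[Tuple[int, int]]:
    """Group a sector chain into (first, last) runs of consecutive sectors."""
    runs: List[Tuple[int, int]] = []
    if not chain:
        return runs
    chunk_start = last_chunk_finish = chain[0]
    for next_in_chain in chain[1:]:
        if buggy:
            # Original test: also true when the chain jumps backwards.
            contiguous = next_in_chain - last_chunk_finish <= 1
        else:
            # Corrected test: true only for the immediately following sector.
            contiguous = next_in_chain == last_chunk_finish + 1
        if contiguous:
            last_chunk_finish = next_in_chain
        else:
            runs.append((chunk_start, last_chunk_finish))
            chunk_start = last_chunk_finish = next_in_chain
    runs.append((chunk_start, last_chunk_finish))
    return runs


def read_runs(data: bytes, runs: List[Tuple[int, int]]) -> bytes:
    """Concatenate each run's bytes; sector N starts after the 512-byte header."""
    return b"".join(
        data[(first + 1) * SECTOR_SIZE:(last + 2) * SECTOR_SIZE] for first, last in runs
    )


# A fake file: a 512-byte header, then 12 sectors each filled with its own number.
data = bytes(512) + b"".join(bytes([n]) * SECTOR_SIZE for n in range(12))
chain = [5, 6, 7, 2, 3, 10]  # a "ragged" but valid chain

print(sector_runs(chain))              # [(5, 7), (2, 3), (10, 10)]
print(sector_runs(chain, buggy=True))  # [(5, 3), (10, 10)]
print(len(read_runs(data, sector_runs(chain))))              # 3072
print(len(read_runs(data, sector_runs(chain, buggy=True))))  # 512
```

    With the buggy test, the run (5, 3) slices to nothing, so only one of the six sectors is read. That is the kind of loss that can leave parse_xls without a valid BOF record at the start of the stream.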

    Even with that fixed, the OP will also need to call parse_xls(filename, encoding='ascii'), because the file has no CODEPAGE record.
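
    You can check whether a workbook has a CODEPAGE record, and whether its stream starts with a BOF record at all, by walking the BIFF records. Each record begins with a 2-byte little-endian record ID and a 2-byte body length. In BIFF5/BIFF8, BOF is 0x0809 and its first body word is the version (0x0500 for BIFF5). CODEPAGE is 0x0042 and EOF is 0x000A. The Python 3 scanner below is a hypothetical helper, not pyExcelerator's parser, and it runs on a hand-built stream rather than the attached file. A result of None for the codepage is the case where an explicit encoding such as encoding='ascii' has to be passed.

```python
import struct
from typing import Optional, Tuple

BOF, CODEPAGE, EOF = 0x0809, 0x0042, 0x000A  # BIFF5/BIFF8 record IDs


def scan_globals(stream: bytes) -> Tuple[int, Optional[int]]:
    """Return (BOF version word, codepage or None) from a workbook-globals stream."""
    pos = 0
    bof_version: Optional[int] = None
    codepage: Optional[int] = None
    while pos + 4 <= len(stream):
        rec_id, length = struct.unpack_from("<HH", stream, pos)
        body = stream[pos + 4:pos + 4 + length]
        pos += 4 + length
        if bof_version is None:
            # The very first record must be a BOF; otherwise give up.
            if rec_id != BOF:
                raise ValueError("stream does not start with a BOF record")
            bof_version = struct.unpack_from("<H", body)[0]
        elif rec_id == CODEPAGE:
            codepage = struct.unpack_from("<H", body)[0]
        elif rec_id == EOF:
            break
    if bof_version is None:
        raise ValueError("no BOF record found")
    return bof_version, codepage


def record(rec_id: int, body: bytes = b"") -> bytes:
    """Build one BIFF record: ID, length, body (hypothetical test helper)."""
    return struct.pack("<HH", rec_id, len(body)) + body


# BIFF5 workbook-globals BOF: version 0x0500, type 0x0005; build/year left as 0.
bof = record(BOF, struct.pack("<HHHH", 0x0500, 0x0005, 0, 0))
with_cp = bof + record(CODEPAGE, struct.pack("<H", 1252)) + record(EOF)
without_cp = bof + record(EOF)

print(scan_globals(with_cp))     # (1280, 1252)
print(scan_globals(without_cp))  # (1280, None)
```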

     
