
#15 Dbf constructor alters good headerLength value when opening existing table.

-
open
nobody
None
2
2014-09-19
2014-09-18
GSM
No

Some DBF files that I came across have padding (a couple of bytes) after the header, i.e. after the terminator character, so the headerLength read from the header differs from the value returned by _calcHeaderLength.
When reading the DBF header, dbfpy uses the internal method _header.addField in order to avoid changing header.headerLength. However, the Dbf constructor then calls header.setMemoFile, which in turn calls header._calcHeaderLength and overwrites the correct header field. This renders the table unreadable in cases such as the one described in the first paragraph.
In my opinion, the self._calcHeaderLength() call should be moved inside the preceding if statement's block or, better, executed only if self.signature actually changed.
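The "recompute only when the signature changes" proposal can be sketched as follows. This is a minimal illustration built on a hypothetical, simplified header class: `DbfHeaderSketch`, its constructor, and the extra `signature` parameter of `setMemoFile` are assumptions, and dbfpy's real header class differs. The only point it demonstrates is that a `headerLength` read from an existing file survives `setMemoFile` unless the table type really changes.

```python
class DbfHeaderSketch:
    """Hypothetical, simplified stand-in for dbfpy's header object.

    Only the attributes relevant to this issue are modelled; the real
    dbfpy class has a different constructor and more state.
    """

    def __init__(self, signature, headerLength, fieldCount):
        self.signature = signature
        self.headerLength = headerLength  # value read from bytes 8-9 of the file
        self.fieldCount = fieldCount
        self.memoFile = None

    def _calcHeaderLength(self):
        # 32-byte table header + one 32-byte descriptor per field + 0x0D terminator.
        # Any padding a writer placed after the terminator is not accounted for.
        self.headerLength = 32 + 32 * self.fieldCount + 1

    def setMemoFile(self, memoFile, signature):
        # `signature` is a hypothetical parameter: the table signature that
        # goes with this memo setting (e.g. 0xF5 for FoxPro with memo).
        self.memoFile = memoFile
        if signature != self.signature:
            # A real change of table type may change the layout, so recompute.
            self.signature = signature
            self._calcHeaderLength()
        # Otherwise headerLength keeps the value read from the file.


# Header as read from a 3-field file with one padding byte after 0x0D.
hdr = DbfHeaderSketch(signature=0xF5, headerLength=130, fieldCount=3)
hdr.setMemoFile("example.fpt", 0xF5)
print(hdr.headerLength)  # 130: signature unchanged, value from the file kept
hdr.setMemoFile(None, 0x03)
print(hdr.headerLength)  # 129: signature changed, length recomputed
```

For three fields the computed length is 32 + 3 * 32 + 1 = 129. A file that stores 130 therefore keeps its real data offset until the signature actually changes.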

Discussion

  • Aleksandr Smyshliaev

    Thank you very much for exploring your issue!

    Could you please explain: is it that dbfpy does not conform to the specifications referred to in the module docs (namely, http://www.clicketyclick.dk/databases/xbase/format/), or that you have a file that does not conform to these specifications?

    Can you provide an example of such a file?

    Can you confirm that the file was produced by one of the "canonical" xBase environments, such as FoxPro or XBase?

    Can you point to any official specifications describing the structure of your example file?

     
  • GSM

    GSM - 2014-09-18

    These files were generated using Visual FoxPro (signature 0xF5; these use MEMO fields, and I will soon file a separate bug report about the handling of memo fields for version 0xF5) and the Microsoft ODBC dBase driver (signature 0x03). They were provided by the application's developers as empty table templates (it is 15-year-old invoicing software). As far as I can tell from the official specification, they do comply with it. A database driver should use the headerLength field to determine where the raw data starts, because padding like this may be present, perhaps for byte-alignment purposes.
    This also raises the question of whether records should be read according to the recordLength header field rather than the calculated length, since records might also contain padding on some occasions.
    Attached is a file that was originally generated by VFP, truncated at the 5th data row to reduce its size. You can clearly see one extra byte after the terminator character, just before the first record.

     

    Last edit: GSM 2014-09-18
  • Aleksandr Smyshliaev

    You didn't provide the FPT file required for opening the DBF in Visual FoxPro.

    However, I do not believe that your example_truncated.dbf was generated by FoxPro. (For reference, the attachment contains a DBF with the same structure that indeed was generated by Visual FoxPro 9.0.)

    And no, your file does not conform to the Microsoft specs: http://msdn.microsoft.com/en-US/library/st4a0s68%28v=vs.80%29.aspx

    I confirm that your file has a weird value in the 8th and the 9th octets of the header, and I intend to think about the matter when I have more time. However, I do believe that your file originates from some non-standard tool, and I suggest that you open this file with FoxPro and run COPY TO ... TYPE FOX2X.

    If you do not have access to the Visual FoxPro programming environment, you could try changing the 8th byte of the file from 2 to 1 and removing the zero byte at offset 0x701 (1793). Please contact me personally if you need help with that.

     
  • GSM

    GSM - 2014-09-19

    Right, I forgot to attach the FPT file. As for the origins of the file, I was told it was generated with VFP by the original developers of this app. Maybe it was later modified by their proprietary software (they had their own DB-fixing scripts).
    I cannot modify the original table file to fix it, as it is used 24/7 in a production environment.
    Aside from that, the bug I reported is about dbfpy modifying a perfectly good offset. If that is the intended behaviour, it might be good to add a sanity check to the dbfpy constructor that stops processing if the real offset does not equal the one from the header field. Currently, if a file is unusual (like mine), the user is presented with a cryptic exception that does not say where the real problem is.
    In my application I simply modified the dbfpy code to use the real offset. I think this should be the default behaviour, making dbfpy more robust and more tolerant of mistakes made by others. DBF is pretty much a dead format, used mainly by older programs, and we cannot force developers to fix their apps 15 years later. Doing so would not sacrifice full compliance with the official standard.
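GSM argues for two things in this thread: locating records through the header's own headerLength and recordLength fields, and failing with a clear message when the header is inconsistent. Both can be sketched independently of dbfpy. The sketch assumes the dBASE/FoxPro layout from the specifications cited above: record count in bytes 4-7, header length in bytes 8-9, record length in bytes 10-11 (little-endian), and 32-byte field descriptors from offset 32 up to the 0x0D terminator. `inspect_header` and `iter_records` are hypothetical helpers, not dbfpy API.

```python
import struct

TERMINATOR = 0x0D  # ends the list of field descriptors


def inspect_header(data: bytes):
    """Return (headerLength, recordLength, recordCount, fieldCount, padding).

    padding = stored headerLength minus 32 + 32 * fieldCount + 1.
    """
    if len(data) < 33:
        raise ValueError("file too short to contain a DBF header")
    record_count, header_length, record_length = struct.unpack_from("<IHH", data, 4)
    pos = 32
    while pos < len(data) and data[pos] != TERMINATOR:
        pos += 32  # skip one 32-byte field descriptor
    if pos >= len(data):
        raise ValueError("header terminator 0x0D not found")
    field_count = (pos - 32) // 32
    computed = pos + 1  # same as 32 + 32 * field_count + 1
    return header_length, record_length, record_count, field_count, header_length - computed


def iter_records(data: bytes):
    """Yield raw records located via the header's own offsets, not computed ones."""
    header_length, record_length, record_count, _, padding = inspect_header(data)
    if padding < 0:
        # Sanity check: fail early with a message naming the actual problem.
        raise ValueError(
            f"headerLength {header_length} is {-padding} byte(s) shorter than "
            "the field descriptors; the header is inconsistent")
    for i in range(record_count):
        start = header_length + i * record_length
        record = data[start:start + record_length]
        if len(record) < record_length:
            break  # file truncated before the declared record count
        yield record


# Usage: a synthetic one-field table with one padding byte after 0x0D.
header = bytearray(32)
header[0] = 0x03
struct.pack_into("<IHH", header, 4, 2, 66, 6)  # 2 records, headerLength 66, recordLength 6
field = bytearray(32)
field[0:4] = b"NAME"
field[11] = ord("C")
field[16] = 5  # field length
sample = bytes(header) + bytes(field) + b"\x0d\x00" + b" ALICE" + b" BOB  "
print(inspect_header(sample))      # (66, 6, 2, 1, 1)
print(list(iter_records(sample)))  # [b' ALICE', b' BOB  ']
```

A positive padding is reported but not treated as an error. Visual FoxPro tables (signature 0x30) legitimately reserve a 263-byte backlink area after the terminator, so positive padding does not by itself mean the file is broken. Only a headerLength smaller than the descriptors is rejected.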

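Aleksandr suggests repairing the file by hand: rewrite the header-length bytes and delete the zero padding byte after the terminator. Since the production file must not be touched, that repair could be applied to a copy. The sketch below generalizes it under stated assumptions. The padding consists only of zero bytes between the 0x0D terminator and the offset stored in bytes 8-9. The table is not a Visual FoxPro table (signatures 0x30, 0x31, 0x32), whose bytes after the terminator are the backlink area rather than padding. `strip_header_padding` and `fix_copy` are hypothetical helpers, not part of dbfpy.

```python
import struct

TERMINATOR = 0x0D
VFP_SIGNATURES = {0x30, 0x31, 0x32}  # these reserve a 263-byte backlink after 0x0D


def strip_header_padding(data: bytes) -> bytes:
    """Return a copy of a DBF image with zero padding after 0x0D removed.

    The header-length field (bytes 8-9, little-endian) is rewritten to
    32 + 32 * field_count + 1, so records start right after the terminator.
    """
    fixed = bytearray(data)
    if fixed[0] in VFP_SIGNATURES:
        raise ValueError("Visual FoxPro table: bytes after 0x0D are the backlink, not padding")
    (header_length,) = struct.unpack_from("<H", fixed, 8)
    pos = 32
    while pos < len(fixed) and fixed[pos] != TERMINATOR:
        pos += 32  # skip one 32-byte field descriptor
    if pos >= len(fixed):
        raise ValueError("header terminator 0x0D not found")
    computed = pos + 1
    if header_length < computed:
        raise ValueError("headerLength is smaller than the field descriptors")
    if any(fixed[computed:header_length]):
        raise ValueError("bytes after the terminator are not all zero; refusing to strip them")
    del fixed[computed:header_length]
    struct.pack_into("<H", fixed, 8, computed)
    return bytes(fixed)


def fix_copy(src_path: str, dst_path: str) -> None:
    """Write a repaired copy to dst_path; src_path is only read, never written."""
    with open(src_path, "rb") as f:
        data = f.read()
    with open(dst_path, "wb") as f:
        f.write(strip_header_padding(data))
```

For example, `fix_copy("example_truncated.dbf", "example_fixed.dbf")` would write a repaired copy; both file names are illustrative. The accompanying FPT memo file is not touched by this sketch.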
     
