
#22 bdw_iter() consuming 2 records at a time

Release 4.4
closed
None
2015-03-03
2015-02-20
RHarris
No

With the 4.4.5 bugfix, and using code based on the sample split code, a batch size of 500 was selected.
Subsequent parsing of the split file revealed 1000 records.

Discussion

  • Steven F. Lott

    Steven F. Lott - 2015-03-01

    That indicates a unit testing error.

    Here's data used to test a VB file with a BDW and RDW. This is the kind of file that requires RECFM=VB and uses the bdw_iter() function.

    For the iterator to be behaving badly, this file must be improper test data.

    A hex dump of the first few bytes of the offending file might provide some insight as to what's wrong with this test case.

    import io

    class TestEBCDICFile_VariableBlocked(TestEBCDICFile_Fixed):
        def test_should_get_cells(self):
            """Data has 4 byte BDW and 4 byte RDW in front of the row."""
            # Build 2 blocks, each holding a single 21-byte record.
            self.data = io.BytesIO(
                b"\x00\x1d\x00\x00"  # BDW: block length 29 = 4 + 25
                    b"\x00\x19\x00\x00"  # RDW: record length 25 = 4 + 21
                        b"\xe9\xd6\xe2"  # WORD="ZOS"
                        b"\xf1\xf2\xf3K\xf4\xf5"  # NUMBER-1="123.45"
                        b"\xf6\xf7\xf8\xf9\xf0"  # NUMBER-2="678.90"
                        b"\x00\x00\x12\x34"  # NUMBER-3=4660
                        b"\x98\x76\x5d"  # NUMBER-4=-987.65
                b"\x00\x1d\x00\x00"  # BDW: block length 29 = 4 + 25
                    b"\x00\x19\x00\x00"  # RDW: record length 25 = 4 + 21
                        b"\xe9\xd6\xe2"  # WORD="ZOS"
                        b"\xf1\xf2\xf3K\xf4\xf5"  # NUMBER-1="123.45"
                        b"\xf6\xf7\xf8\xf9\xf0"  # NUMBER-2="678.90"
                        b"\x00\x00\x12\x34"  # NUMBER-3=4660
                        b"\x98\x76\x5d"  # NUMBER-4=-987.65
            )
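
For anyone checking a file by hand, the layout of the test data above can be walked with a few lines of standalone Python. This is only a sketch and not the library's bdw_iter(): the name `walk_blocks` is hypothetical, and it assumes plain 4-byte descriptor words (a big-endian 2-byte length that counts the descriptor itself, followed by 2 reserved bytes) with no extended BDW format.

```python
import io
import struct

def walk_blocks(stream):
    """Yield one list of record payloads per block (hypothetical helper).

    Assumes a 4-byte BDW followed by one or more RDW-prefixed records,
    where each length field includes its own 4-byte descriptor word.
    """
    while True:
        bdw = stream.read(4)
        if len(bdw) < 4:
            return  # end of data
        block_len, _ = struct.unpack(">HH", bdw)
        remaining = block_len - 4
        records = []
        while remaining > 0:
            rec_len, _ = struct.unpack(">HH", stream.read(4))
            if rec_len < 4:
                raise ValueError("RDW length smaller than the RDW itself")
            records.append(stream.read(rec_len - 4))
            remaining -= rec_len
        yield records

# The same 21-byte record used in the test case above.
record = (
    b"\xe9\xd6\xe2"
    b"\xf1\xf2\xf3K\xf4\xf5"
    b"\xf6\xf7\xf8\xf9\xf0"
    b"\x00\x00\x12\x34"
    b"\x98\x76\x5d"
)
block = b"\x00\x1d\x00\x00" + b"\x00\x19\x00\x00" + record
data = io.BytesIO(block * 2)

records_per_block = [len(records) for records in walk_blocks(data)]
print(records_per_block)  # [1, 1]
```

Each block in this test data carries exactly one record, so counting blocks and counting records give the same answer here.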
    
     

    Last edit: Steven F. Lott 2015-03-01
  • RHarris

    RHarris - 2015-03-03

    Nope, it is a user confusion error.

    Looking back on the documentation, it should be emphasized:

    The batch size is the number of blocks, not the number of records.

    I wanted the RECFM_VB output to match the input (hence the use of bdw_iter), but I was thinking of batches in records, not blocks. Examining the hex data to answer your question reveals 2 records per BDW:

    7fac 0000 3fd4 0000
    ...
    3fd4 0000
    ...
    7fac 0000 3fd4 0000
    ...
    3fd4 0000
    ...
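
The descriptor words in the dump can be checked with a little arithmetic. A minimal sketch, assuming each length field counts its own 4-byte descriptor word and that both records in a block have the RDW length shown above:

```python
BDW_LEN = 0x7FAC  # block length from the dump: 32684 bytes
RDW_LEN = 0x3FD4  # record length from the dump: 16340 bytes
HEADER = 4        # size of the BDW itself

# Bytes available for records once the BDW is accounted for.
records_per_block, leftover = divmod(BDW_LEN - HEADER, RDW_LEN)
print(records_per_block, leftover)  # 2 0

# A batch of 500 blocks therefore holds this many records.
print(500 * records_per_block)  # 1000
```

So a batch size of 500 blocks matches the 1000 records found in the split file.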
    

    My only recommendation at this point would be for all test data to include multiple RDWs per BDW, since that is what real files look like, and to make sure all test cases still function under those conditions, which I expect they will.
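
One way such test data could be produced is with a small helper that computes the descriptor words instead of hand-coding them. This is a sketch only, not the project's actual revised test case: `build_block` is a hypothetical name, and it assumes the same plain 4-byte BDW/RDW layout as the test data earlier in this thread.

```python
import struct

def build_block(records):
    """Prefix each payload with an RDW, then prefix the whole with a BDW.

    Hypothetical helper: every length field counts its own 4-byte
    descriptor word, and the 2 trailing bytes are left as zero.
    """
    body = b"".join(
        struct.pack(">HH", len(payload) + 4, 0) + payload
        for payload in records
    )
    return struct.pack(">HH", len(body) + 4, 0) + body

# The same 21-byte record used in the test case earlier in the thread.
record = (
    b"\xe9\xd6\xe2"
    b"\xf1\xf2\xf3K\xf4\xf5"
    b"\xf6\xf7\xf8\xf9\xf0"
    b"\x00\x00\x12\x34"
    b"\x98\x76\x5d"
)

# Two records in one block: BDW = 4 + 2 * 25 = 54 = 0x36.
block = build_block([record, record])
print(block[:8].hex(" ", 2))  # 0036 0000 0019 0000 (needs Python 3.8+)
```

With a single record, the helper reproduces the hand-written bytes from the original test case, which gives a quick check that the lengths are being computed the same way.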

     
  • Steven F. Lott

    Steven F. Lott - 2015-03-03

    Test case revised to show multiple records per block.

     
  • Steven F. Lott

    Steven F. Lott - 2015-03-03
    • status: open --> closed
    • assigned_to: Steven F. Lott
     
