It's about time to move to GitHub. The SourceForge workflow has (finally) started to feel too complex. Details to follow.
See https://github.com/slott56 for further developments.
Also see https://slott-softwarearchitect.blogspot.com for some announcements.
Also, updated the GIT repository and actually tagged this release as 4.4.5. That makes it official.
The unit tests for cobol.RECFM_VB are clearly inadequate, since the bug was obvious on first use.
The revised version of COBOL_schemata looks like this:

    # Imports assumed to come from stingray.cobol.loader.
    from stingray.cobol.loader import Lexer, RecordFactory, make_schema

    def COBOL_schemata( source, replacing=None, lexer_class=Lexer ):
        # Tokenize the copybook, applying any REPLACING pairs.
        lexer= lexer_class( replacing )
        parser= RecordFactory()
        # Parse the token stream into one DDE tree per record layout.
        dde_list= list( parser.makeRecord( lexer.scan(source) ) )
        # Build a schema from each DDE tree.
        schema_list= list( make_schema( dde ) for dde in dde_list )
        return dde_list, schema_list
This gives us an API that looks like this: ... read more
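That API example is truncated above. As a minimal sketch of a plausible call (the copybook file name and the REPLACING pair are invented for illustration):

    import pprint

    with open( "copybook.cob", "r" ) as source:
        dde_list, schema_list = COBOL_schemata( source, replacing=[("'XXXX'", "X")] )
    pprint.pprint( schema_list )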
Added a COBOL_schemata function in the GIT repository only. This is not built into the distribution kit because it's not clear if it truly helps solve problems with real-world copybooks.
The code looks like this:

    def COBOL_schemata( source, replacing=None ):
        lexer= Lexer( replacing )
        parser= RecordFactory()
        dde_list= list( parser.makeRecord( lexer.scan(source) ) )
        schema_list= list( make_schema( dde ) for dde in dde_list )
        return dde_list, schema_list

... [read more](/p/stingrayreader/blog/2014/05/cobol-schema-vs-cobol-schemata/)
Make embedded schema loader tolerate blank sheets by producing a warning and returning None instead of raising a StopIteration exception.
Tweak the Data validation demo to handle the None-instead-of-schema feature.
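In calling code, that change means checking for None where an exception handler used to be. A sketch, where the loader instance is a hypothetical stand-in:

    import logging
    logger = logging.getLogger( __name__ )

    def load_or_skip( loader ):
        """loader is a hypothetical embedded-schema loader instance."""
        schema = loader.load()   # may now return None for a blank sheet
        if schema is None:
            logger.warning( "no embedded schema in sheet; skipping" )
        return schema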
Changed cobol.COBOL_file.row_get() to leave trailing spaces intact. Stripping them had created a problem of trashing COMP items whose value was exactly 0x40 -- an EBCDIC space. ... read more
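A small demonstration of why stripping was unsafe (the record content is invented):

    # An EBCDIC space is the byte 0x40. A COMP (binary) halfword with the
    # value 64 is b'\x00\x40', so naively stripping trailing "spaces" from
    # the raw record also eats the last byte of the binary field.
    record = b'\xc1\xc2' + (64).to_bytes( 2, 'big' )   # 'AB' in EBCDIC, then COMP 64
    damaged = record.rstrip( b'\x40' )
    assert damaged != record    # the COMP item has been trashed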
Made some tweaks to the documentation and the cobol package that support splitting large files.
See http://stingrayreader.sourceforge.net/cobol.html#low-level-split-processing for a way to handle splits with this.
This change is only in the GIT repository, for now. It's not a formal part of the 4.4.2 release.
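The linked page shows the supported approach. Purely to illustrate the idea, a fixed-length (RECFM=F) file can be split on record boundaries with plain file I/O; the lrecl and chunk-size parameters here are made up:

    def split( path, lrecl, records_per_part=100000 ):
        """Split a fixed-length file into parts of whole records."""
        with open( path, "rb" ) as source:
            part = 0
            while True:
                chunk = source.read( lrecl * records_per_part )
                if not chunk:
                    break
                with open( "{0}.{1:04d}".format(path, part), "wb" ) as target:
                    target.write( chunk )
                part += 1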
Made two small performance tweaks for handling large COBOL files.
These are minor; it's difficult to make larger performance gains without a fundamental, wholesale change to the way attributes are handled for COBOL.
Currently, we have a lazy calculation of offset and size when there are OCCURS DEPENDING ON clauses. The calculation involves a fast (but not instantaneous) computation of offsets. ... read more
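To see what the lazy calculation must handle, consider an invented layout in which one field's offset depends on the ODO counter of an earlier field:

    # 05  ITEM-COUNT  PIC 9(4).
    # 05  ITEM        OCCURS 1 TO 99 TIMES
    #                 DEPENDING ON ITEM-COUNT  PIC X(10).
    # 05  TRAILER     PIC X(8).
    #
    # TRAILER has no fixed offset; it must be recomputed for every row.
    def trailer_offset( item_count ):
        return 4 + 10 * item_count   # ITEM-COUNT's 4 bytes + the variable ITEM array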
The code in the GIT repository has been updated to version 4.4.1 to include Numbers '09 and Numbers '13 files.
Some of the unit tests for RECFM handling have been corrected, and the RECFM handling has been revised based on the unit test fixes.
Ticket 15 (wrong ODO calculations) appears to be finally resolved.
The old SVN repository (which was stale code) has been dropped.
The new repository uses the Git server, and will have the live code.
This will be separate from the distribution kits, which should become less useful in the future. Simply cloning the GIT repository should be a simpler way to get the code.
Ticket #12 resolved. This supports what we'll call RECFM=N processing.
It will handle EBCDIC files that lack BDW/RDW words at the start of blocks and records.
It uses a very large buffer, computes the "Occurs Depending On" record size, then does an unget of the unused bytes at the end of the buffer. It's slower than processing a file with proper RDW/BDW words from the mainframe filesystem.
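For contrast, a file that does carry RDWs is easy to step through. A sketch, assuming the standard 4-byte RDW whose big-endian length field includes the RDW itself; RECFM=N lacks these words, which is what forces the buffer-and-unget approach:

    import struct

    def rdw_records( source ):
        """Yield each record of a RECFM=V byte stream by reading its RDWs."""
        while True:
            rdw = source.read( 4 )
            if len(rdw) < 4:
                break
            length, _ = struct.unpack( ">HH", rdw )   # length includes the RDW's 4 bytes
            yield source.read( length - 4 )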
Numerous Changes...
- Support iWork '09 Numbers Workbook files.
- Fix the :py:class:`cobol.defs.Usage` hierarchy to properly handle data which can't be converted. An ErrorCell is created in the (all too common) case where the COBOL data is invalid.
- Handled precision of comp3 correctly. Ticket #9.
- Added :py:class:`cobol.loader.Lexer_Long_Lines` to parse copybooks with junk in positions 72:80 of each line. Ticket #11. (See the sketch below.)
- Update the developers' guide. Ticket #7.

... read more
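If the lexer can be swapped in, as the COBOL_schemata function shown earlier on this page allows via its lexer_class parameter, usage might look like this (the copybook name is invented):

    import stingray.cobol.loader

    with open( "copybook.cob", "r" ) as source:
        dde_list, schema_list = COBOL_schemata(
            source, lexer_class=stingray.cobol.loader.Lexer_Long_Lines )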
Fix the stingray.cobol.dump() function to properly iterate through all fields. This will iterate through all indices of indexed fields. It should produce a comprehensive record dump.
The dump() function is now an iterable over all fields in the record; each result is a tuple of (dde, attribute, indices, bytes, Cell). An error cell will also contain the raw bytes.
    import stingray.cobol

    def raw_dump( schema, sheet ):
        for row in sheet.rows():
            stingray.cobol.dump( schema, row )

... [read more](/p/stingrayreader/blog/2014/05/version-432/)
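Given the five-part tuples described above, a caller could also unpack them directly; this sketch assumes each attribute object exposes a name, which is an assumption:

    def named_dump( schema, row ):
        # Each item is the (dde, attribute, indices, bytes, Cell) tuple
        # described above; attribute.name is an assumption.
        for dde, attribute, indices, raw, cell in stingray.cobol.dump( schema, row ):
            print( attribute.name, indices, cell )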
This release may handle COBOL EBCDIC files with Occurs Depending On properly. This should give an interface which is compatible with other spreadsheets even though the record layout is rather complex.
- Handle more complex VALUES clauses for more complex 88-level items.
- Restructure cobol, cobol.loader to add a cobol.defs module.
- Handle Occurs Depending On. Parse the syntax for ODO. Update LazyRow to tweak size and offset information for each row fetched.
- Add Z/OS RECFM handling in the :py:class:`cobol.EBCDIC_File` class. This will allow processing "raw" EBCDIC files with RECFM of V and RECFM of VB -- these files include BDW and RDW headers on blocks and records. (See the sketch below.)

... read more
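A hypothetical construction; the schema= and RECFM= keyword names are assumptions, not taken from the post:

    import stingray.cobol

    def open_raw_vb( filename, schema ):
        # Hypothetical keyword names: schema= and RECFM= are assumptions.
        return stingray.cobol.EBCDIC_File( filename, schema=schema, RECFM="VB" )

Rows would then be fetched through the usual workbook/sheet interface.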
More COBOL syntax is handled properly. There is now some support for Occurs Depending On, and some more complex 88-level items are properly parsed.
Small scripts like this can be used to confirm how well COBOL is supported.
    import stingray.cobol.loader
    import logging, sys
    import pprint

    logging.basicConfig( stream=sys.stderr, level=logging.DEBUG )
    with open( "CCCCMST3.TXT", "r" ) as cobol:
        schema= stingray.cobol.loader.COBOLSchemaLoader(
            cobol, replacing=[("'XXXX'","X")] ).load()
    pprint.pprint( schema )
    logging.shutdown()
Version 4 dates from March, 2014. It switches to Python 3.3. This is a rewrite; it doesn't use 2to3 or six.
If you're forced to use Python 2, you'll have to build a two-part application: the Stingray parts use Python 3.3, and all the other parts use Python 2.
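One plausible shape for that split (entirely an illustration): the Python 3.3 half uses Stingray to decode rows, then hands them to the Python 2 half as plain CSV.

    # extract3.py -- the Python 3.3 half. "rows" stands in for an iterable
    # of already-decoded cell values produced via Stingray.
    import csv

    def write_rows( rows, target_path ):
        with open( target_path, "w", newline="" ) as target:
            csv.writer( target ).writerows( rows )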
Added a "replacing" option to the COBOL Schema Loader.
with open("xyzzy.cob", "r") as cobol:
schema= stingray.cobol.loader.COBOLSchemaLoader( cobol, replacing=("'WORD'", "BAR") )... [read more](/p/stingrayreader/blog/2014/03/version-41-changes/)
The sample applications use a simplistic counts['stage'] += 1 to accumulate balance information.
After spending some quality time reviewing Audit Balance and Control designs, it's time to update the demo programs to reflect a slightly smarter approach to this.
We don't need to go all the way down the ABC road -- yet -- but we do need to acknowledge that the balances need to reflect something we can call the "ABC Invariant".... read more
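One concrete reading of that invariant (my phrasing, not the post's): every record read must be accounted for as either processed or rejected, and the counters should prove it.

    from collections import Counter

    def audit( records, valid ):
        """Count every record as processed or rejected; prove the balance.
        valid is a hypothetical validation function."""
        counts = Counter()
        for record in records:
            counts['read'] += 1
            counts['processed' if valid(record) else 'rejected'] += 1
        assert counts['read'] == counts['processed'] + counts['rejected']
        return counts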
Without naming names or pointing the finger of blame, let's talk about datasets that are created without much care. To be a little more specific, this is econometric data. Lots of data that's pretty well defined and mostly regular.
Except.
Attributes like ZIP code (or Postal code) often become part of the dataset for a number of reasons. The ZIP code is commonly used to disambiguate similarly-named business enterprises. ... read more
After spending the last three years working with complex econometric data sets, as well as COBOL files, I finally realized what The Big Picture (TBP™) is.
Data files (workbooks, COBOL files, whatever) are useless without a schema.
Further, there's rarely an obvious association between application program and schema. Indeed, one of the few desirable features of COBOL is that the schema is explicitly bound into the program: the Environment Division names the file, and the Data Division's "FD" entry defines its records. (Sometimes the FD is just filler and the real definition is elsewhere in the Data Division, but the principle is still being followed.) ... read more