Large Text File converter Wiki
Java-based heavy-duty utility to process large delimited text files
Status: Pre-Alpha
Brought to you by:
deepkamal
TextZilla is a multithreaded Java utility that processes very large delimited text files. It can extract, convert, encode, decode, encrypt, or decrypt text data from a source file and write the result to one or more output files.
It provides an extensible framework in which Java classes containing business logic can be created. For example, it currently has an MD5 conversion capability, and classes for 3DES, AES, or any other algorithm can be built on the same design.
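As an illustration of what such a converter class might look like, here is a minimal sketch of an MD5 field converter. The interface `FieldConverterSketch` and the class `Md5ConverterSketch` are hypothetical names made up for this example; the project's actual IConverter signature may differ. Only the standard `java.security.MessageDigest` API is used.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical stand-in for the project's IConverter; the real signature may differ.
interface FieldConverterSketch {
    String convert(String fieldValue);
}

// Replaces a field value with the lowercase hex MD5 digest of its UTF-8 bytes.
class Md5ConverterSketch implements FieldConverterSketch {
    @Override
    public String convert(String fieldValue) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(fieldValue.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to an unsigned value so negative bytes print as two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every standard Java platform is required to provide MD5.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(new Md5ConverterSketch().convert("hello"));
    }
}
```

A 3DES or AES converter could presumably follow the same shape, with `javax.crypto.Cipher` in place of `MessageDigest`.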
Another strength of the tool is its configurability: its design allows as many output files as required to be generated from one input file, and validation, extraction, and conversion can be applied to every row of the input file.
This means the developer needs to focus only on the business requirements, without having to worry about thread implementation, concurrency, and so on.
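To make the division of labour concrete, the sketch below shows one generic way a framework can hide threading from row-level business logic, using a fixed thread pool from `java.util.concurrent`. This is not TextZilla's actual implementation; the class `ParallelRowsSketch` and the method `processAll` are hypothetical names for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch: the caller supplies plain per-row logic,
// and this class takes care of running it on several threads.
class ParallelRowsSketch {
    static List<String> processAll(List<String> rows,
                                   Function<String, String> rowLogic,
                                   int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one task per row; futures are kept in input order.
            List<Future<String>> pending = new ArrayList<>();
            for (String row : rows) {
                Callable<String> task = () -> rowLogic.apply(row);
                pending.add(pool.submit(task));
            }
            // Collect results in the same order the rows were submitted.
            List<String> results = new ArrayList<>();
            for (Future<String> future : pending) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while processing rows", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Row logic failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("a|b", "c|d");
        System.out.println(processAll(rows, String::toUpperCase, 2));
    }
}
```

This sketch keeps all rows and results in memory, which would not suit a 100 GB input; a real implementation would read and write in bounded batches.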
Use case example: a legacy system has to be replaced with a new system that uses a different DB schema. The data from the legacy system is provided as a 100 GB dump of delimited text, which must be inserted into 10 different tables of the new system's DB after validation, date format conversion, rearrangement, MD5 hashing, etc.
This tool is a quick, ready-to-implement solution for such purposes: the developer implements the required conversion logic in existing or newly created converter classes that implement the IConverter interface.
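The date format conversion mentioned in the use case could be sketched as a field-level converter like the one below. The class `DateFormatConverterSketch` is a hypothetical name, and it implements the standard `UnaryOperator<String>` rather than the project's IConverter, whose signature is not shown here. The patterns `dd/MM/uuuu` and `uuuu-MM-dd` are example values only.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.function.UnaryOperator;

// Hypothetical sketch: re-formats a date field from one pattern to another.
class DateFormatConverterSketch implements UnaryOperator<String> {
    private final DateTimeFormatter sourceFormat;
    private final DateTimeFormatter targetFormat;

    DateFormatConverterSketch(String sourcePattern, String targetPattern) {
        // STRICT rejects impossible dates such as 31 February instead of adjusting them.
        this.sourceFormat = DateTimeFormatter.ofPattern(sourcePattern)
                .withResolverStyle(ResolverStyle.STRICT);
        this.targetFormat = DateTimeFormatter.ofPattern(targetPattern);
    }

    @Override
    public String apply(String fieldValue) {
        try {
            return LocalDate.parse(fieldValue.trim(), sourceFormat).format(targetFormat);
        } catch (DateTimeParseException e) {
            throw new IllegalArgumentException("Not a valid date: " + fieldValue, e);
        }
    }

    public static void main(String[] args) {
        DateFormatConverterSketch converter =
                new DateFormatConverterSketch("dd/MM/uuuu", "uuuu-MM-dd");
        System.out.println(converter.apply("05/11/2012"));
    }
}
```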
Please be aware that this is NOT a ready-to-use application.
Rather, it is presently ready-to-use Java code, with optimized text processing and multithreading already implemented.
To begin, you need to implement the IValidator and IConverter interfaces. An IValidator class is called to validate every row of the input file; it can either reject the complete row if the row does not match the criteria you implement in the class, or it can be used to transform or substitute values in the row's fields.
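A row validator along these lines might look like the following minimal sketch. The interface `RowValidatorSketch` is a hypothetical stand-in for IValidator (the project's real method names and types may differ), and the rules shown, a fixed field count and a non-empty first field, are example criteria only.

```java
import java.util.Arrays;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical stand-in for the project's IValidator; the real signature may differ.
interface RowValidatorSketch {
    // Returns the (possibly modified) fields, or an empty Optional to reject the row.
    Optional<String[]> validate(String[] fields);
}

class FieldCountValidatorSketch implements RowValidatorSketch {
    private final int expectedFields;

    FieldCountValidatorSketch(int expectedFields) {
        if (expectedFields < 1) {
            throw new IllegalArgumentException("expectedFields must be at least 1");
        }
        this.expectedFields = expectedFields;
    }

    @Override
    public Optional<String[]> validate(String[] fields) {
        // Reject rows with the wrong number of columns.
        if (fields.length != expectedFields) {
            return Optional.empty();
        }
        // Transform step: trim surrounding whitespace from every field.
        String[] cleaned = new String[fields.length];
        for (int i = 0; i < fields.length; i++) {
            cleaned[i] = fields[i].trim();
        }
        // Example criterion: the first field must not be empty.
        if (cleaned[0].isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(cleaned);
    }

    // Splits on a literal delimiter and keeps trailing empty fields.
    static String[] splitRow(String line, String delimiter) {
        return line.split(Pattern.quote(delimiter), -1);
    }

    public static void main(String[] args) {
        RowValidatorSketch validator = new FieldCountValidatorSketch(3);
        String[] fields = splitRow("1| Alice |x", "|");
        System.out.println(validator.validate(fields).map(Arrays::toString).orElse("rejected"));
    }
}
```

Returning an `Optional` is one design choice for signalling rejection; a boolean result or an exception would work equally well.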
An IConverter class works on a specific field of an input line/row and can be used to apply any substitution, conversion, transformation, encryption/decryption, etc.
Once the relevant classes are created, they need to be configured for processing in the input config file.
Valid examples are provided: sample1 and sample2 as input files, and sampleDump1.config.properties and sampleDump2.config.properties as input config files.
To understand the flow, run.sh can be used to execute the application.
The project is open to Java developers.
Any queries or questions can be discussed in the forum, or you can mail me at deepkamal@gmail.com.
Thanks for your time.
Last edit: Deep Kamal Singh 2012-11-05