Name | Modified | Size | Downloads / Week |
---|---|---|---|
Readme.txt | 2015-02-19 | 4.1 kB | |
comradmpi_0.1.rar | 2015-02-19 | 40.7 MB | |
Totals: 2 Items | | 40.7 MB | 0 |
****************************************************************
COMRAD-MPI: Compression of Large Genomic Datasets using Parallel
Computing Techniques

Based on COMRAD, the compression of Redundancy of DNA Dataset
(Shanika et al., version 2.0.2, 2011)

A parallel computing algorithm developed at the Department of
Computational Biology & Bioinformatics, University of Kerala,
Thiruvananthapuram, Kerala.
****************************************************************

This README explains how to install and run COMRAD-MPI.

COMRAD is a tool for compressing large genome datasets. Compression is
achieved through multiple sequential passes that build a dictionary,
followed by substitution, clean-up and Huffman encoding stages.

COMRAD-MPI is an MPI implementation of COMRAD (Shanika et al., 2011).
It is based on version 2.0.2 of the original COMRAD; the substitution,
clean-up and Huffman encoding stages are parallelized with MPI, a
popular message-passing programming standard.

COMRAD-MPI is freely available to the user community. The software is
available at https://sourceforge.net/projects/comradmpi/

Please send bug reports, comments etc. to "bijijomy@gmail.com".

----------------------------------------------------------------
Requirements for running COMRAD-MPI:

1. A Rocks cluster with MPICH installed.
2. Python, to run tottime.py.
3. Test files may be downloaded from ftp://ftp.ncbi.nlm.nih.gov/genomes/

-------------------------------------------------------------------------------
INSTALLATION (for Unix/Linux)
------------

1. Unpack the package in any working directory.
2. Compile the source files:

     cd lib/src
     make
     cd varyL/src
     make
     cd ../../

The executables are written to comrad-mpi-0.1/comrad/varyL/bin.

-------------------------------------------------------------------------------
To compress using COMRAD-MPI:

1. Split the input genomes into chunks of equal size using

     split -n 6 filename.fa

2. Copy the names of all the files that need to be compressed into a
   file.
3. Run the command

     ./comrad.sh <file of file names>

Usage: comrad.sh [OPTIONS] FILE

   -n:   Number of processors in the cluster
   -f:   Frequency threshold (default 4)
   -l:   Initial substring length (default 8)
   -o:   Output directory (default /tmp/comrad)
   FILE: File name containing files to be compressed
         (include full path names for each file)

   e.g. compression of multiple files using two processors:

     ./comrad.sh -n 2 test

Output:

1. codebook.txt contains the codebook in plain text.
2. *.comrad are all the compressed sequence files in plain text.
3. enccodebook.txt contains the Huffman-encoded codebook.
4. intcodes.txt and nuclcodes.txt contain information needed by the
   Huffman decoder.
5. *.comrad.huffenc are all the Huffman-encoded sequence files.
6. comrad.log is the log file containing the timing information for
   each stage of the execution.
7. The statistics of the compression are printed to STDOUT.

-------------------------------------------------------------------------------
To decompress using COMRAD:

1. Run the command

     ./decomrad.sh <file of file names>

Usage: decomrad.sh [OPTIONS] FILE

   -o:   Output directory (default /tmp/comrad; must be the same
         directory that was used with comrad.sh)
   FILE: File of sequence file names (same as in comrad.sh)

Output:

1. deccodebook.txt contains the Huffman-decoded codebook.
2. *.huffdec are all the Huffman-decoded sequences (still COMRAD
   compressed).
3. *.decomrad are all the original sequences, recovered by COMRAD
   decompression.
4. decomrad.log is the log file containing the timing information for
   each stage of the execution.

NOTE: The compression does not keep the FASTA IDs, so in the *.huffdec
and *.decomrad files the IDs are generated by the program and will not
be the same as the originals. We'll eventually incorporate some way to
store compressed FASTA IDs.
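The file of file names used in the compression steps above can also be generated rather than typed by hand. The following is a minimal Python sketch, not part of the COMRAD-MPI package. It assumes that the chunks carry the default output names of GNU split (xaa, xab, ...) and that comrad.sh expects one full path per line; the README only states that full path names are required, so the one-per-line layout is an assumption. The function name write_file_list is hypothetical.

```python
from pathlib import Path


def write_file_list(chunk_dir, list_path, pattern="x??"):
    """Write the absolute path of every chunk in chunk_dir to list_path.

    Assumptions (not confirmed by the README): chunks match the default
    GNU split naming ("x" plus a two-letter suffix), and the list file
    holds one full path per line.
    """
    chunks = sorted(
        p.resolve() for p in Path(chunk_dir).glob(pattern) if p.is_file()
    )
    Path(list_path).write_text("".join(f"{p}\n" for p in chunks))
    return chunks


if __name__ == "__main__":
    # Example: list the chunks in the current directory into a file named
    # "test", which could then be passed as ./comrad.sh -n 2 test
    for chunk in write_file_list(".", "test"):
        print(chunk)
```

If the chunks were written with a different prefix or suffix length, the pattern argument would need to be changed to match.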
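Until FASTA IDs are stored by the tool itself, one possible workaround for the limitation described in the NOTE is to save the original header lines before compression and write them back after decompression. The sketch below is an illustration only and is not part of COMRAD-MPI. It assumes that a *.decomrad file is a FASTA file whose records appear in the same order and number as in the original input, with program-generated header lines; that assumption should be checked on real output before relying on it. Both function names are hypothetical.

```python
def extract_fasta_ids(fasta_path):
    """Return the header lines of a FASTA file, without the leading '>'."""
    ids = []
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                ids.append(line[1:].rstrip("\r\n"))
    return ids


def restore_fasta_ids(decompressed_path, ids, output_path):
    """Copy a decompressed FASTA file, replacing its i-th header with ids[i].

    Assumption: records are in the same order as in the original file.
    Raises ValueError if the number of records and saved IDs differ.
    """
    remaining = iter(ids)
    restored = 0
    with open(decompressed_path) as src, open(output_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                try:
                    line = ">" + next(remaining) + "\n"
                except StopIteration:
                    raise ValueError("more records than saved IDs") from None
                restored += 1
            dst.write(line)
    if restored != len(ids):
        raise ValueError("fewer records than saved IDs")
    return restored
```

A typical use would be to call extract_fasta_ids on each input file before running comrad.sh, keep the returned list alongside the compressed output, and call restore_fasta_ids on the matching *.decomrad file after running decomrad.sh.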