****************************************************************
A parallel computing algorithm developed at the Department of Computational Biology & Bioinformatics

COMRAD-MPI: Compression of Large Genomic Datasets using Parallel Computing Techniques, based on COMRAD, the Compression of Redundancy of DNA Dataset (Shanika et al., version 2.0.2, 2011)
Developed at the Department of Computational Biology & Bioinformatics, University of Kerala, Thiruvananthapuram, Kerala.

Team: Biji C.L., Manu K. Madhu,  Vineetha V.,  Satheesh Kumar K., Vijayakumar, Achuthsankar S. Nair
Department of Computational Biology and Bioinformatics, University of Kerala, Thiruvananthapuram
School of Computer Science, Mahatma Gandhi University, Kottayam
Infosys Technologies, Trivandrum
Department of Future Studies, University of Kerala, Thiruvananthapuram

****************************************************************

This README contains help with INSTALLATION and usage

COMRAD is a tool for compressing large genome datasets. Compression is
achieved through multiple sequential passes that build a dictionary, followed by substitution, clean-up and Huffman encoding stages.
COMRAD-MPI is an MPI implementation of COMRAD (Shanika et al., 2011). Based on version 2.0.2 of the original COMRAD, it parallelizes the substitution, clean-up and Huffman encoding stages with MPI, a popular message-passing programming standard.

COMRAD-MPI is freely available to the user community.

The software is available at
 https://sourceforge.net/projects/comradmpi/


Please send bug reports, comments etc. to "bijijomy@gmail.com".
----------------------------------------------------------------

Requirements for running COMRAD-MPI:

1. A Rocks cluster with MPICH installed.

2. Python, to run tottime.py.

3. Test files may be downloaded from ftp://ftp.ncbi.nlm.nih.gov/genomes/
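
Before building, it can help to confirm that the tools named above are
visible on PATH. The sketch below is only an illustration: the helper
name count_missing is hypothetical, and mpicc/mpiexec are assumed to be
the MPICH command names installed on your cluster.

```shell
# Print how many of the named commands cannot be found on PATH.
# Per-tool details go to stderr so stdout carries only the count.
count_missing() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool" >&2
        else
            echo "missing: $tool" >&2
            missing=$((missing + 1))
        fi
    done
    echo "$missing"
}

# mpicc and mpiexec are the usual MPICH command names (an assumption about
# this site); python is needed for tottime.py and make for the build.
count_missing mpicc mpiexec python make
```

A result of 0 means every listed tool was found; it does not check the
MPICH version or the cluster configuration.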

-------------------------------------------------------------------------------

INSTALLATION    (for Unix/Linux)
------------


1. Unpack the package in any working directory


2. Compiling the source files:

    cd lib/src
    make
    cd ../../varyL/src
    make
    cd ../../

The executables are written to comrad-mpi-0.1/comrad/varyL/bin.

-------------------------------------------------------------------------------

To compress using COMRAD-MPI:
1. Split each input genome into chunks of equal size, for example with
    split -n 6 filename.fa

2. Copy the names of all the files that need to be compressed into a
file.

3. Run the command
    ./comrad.sh <file of file names>

Usage:
comrad.sh [OPTIONS] FILE
     -n: Number of processors in the cluster
     -f: Frequency threshold (default 4)
     -l: Initial substring length (default 8)
     -o: Output directory (default /tmp/comrad)
     FILE: File name containing files to be compressed (include full path names for each file)

e.g.: Compression of multiple files using two processors
./comrad.sh -n 2 test
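
The preparation steps can be sketched end to end as follows. This is a
minimal illustration, not part of the package: genome.fa, the chunk
prefix genome.fa.part. and filelist.txt are hypothetical names, and the
tiny FASTA file only stands in for a real genome. It assumes GNU
coreutils split, which supports the -n option.

```shell
# Work in a scratch directory with a tiny stand-in FASTA file.
# genome.fa, the chunk prefix and filelist.txt are hypothetical names.
workdir=$(mktemp -d)
cd "$workdir"
printf '>demo\nACGTACGTACGTACGTACGTACGT\nTTGACCATTGACCATTGACCAGGA\n' > genome.fa

# Step 1: split into 6 chunks of (nearly) equal byte size.
split -n 6 genome.fa genome.fa.part.

# Step 2: record the full path of every chunk, one per line.
ls "$PWD"/genome.fa.part.* > filelist.txt
cat filelist.txt

# Step 3 (needs the built package and an MPI cluster, so left commented):
# ./comrad.sh -n 2 filelist.txt
```

Note that split -n 6 cuts at byte boundaries, so a chunk may begin in
the middle of a line; GNU split also accepts -n l/6, which keeps lines
whole. This README does not say which form COMRAD-MPI expects.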



Output:

1. codebook.txt contains the codebook in plain text.

2. *.comrad are all the compressed sequence files in plain text.

3. enccodebook.txt contains the Huffman encoded codebook.

4. intcodes.txt and nuclcodes.txt contain information needed by the
Huffman decoder.

5. *.comrad.huffenc are all the Huffman encoded sequence files.

6. comrad.log is the log file containing the timing information at each
stage of the execution.

7. The statistics of the compression are printed to STDOUT.
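
After a run, the output directory can be checked against the list above.
The sketch below is a hypothetical helper (missing_outputs is not part
of the package), and it assumes that every listed file is written
directly into the directory given with -o.

```shell
# Print the names of expected outputs that are absent from directory $1.
# Assumes every file listed above is written directly into the -o directory.
missing_outputs() {
    outdir=$1
    for f in codebook.txt enccodebook.txt intcodes.txt nuclcodes.txt comrad.log; do
        [ -f "$outdir/$f" ] || echo "$f"
    done
    # Expect at least one compressed and one Huffman encoded sequence file.
    for pattern in '*.comrad' '*.comrad.huffenc'; do
        set -- "$outdir"/$pattern
        [ -e "$1" ] || echo "$pattern"
    done
}

# /tmp/comrad is the default output directory of comrad.sh.
missing_outputs /tmp/comrad
```

Empty output means all expected files were found; otherwise each missing
file name or pattern is printed on its own line.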

-------------------------------------------------------------------------------

To decompress using COMRAD:

1. Run the command
    ./decomrad.sh <file of file names>

Usage:
decomrad.sh [OPTIONS] FILE
    -o: Output directory (default /tmp/comrad, should be the same as what was used in comrad.sh)
    FILE: File of sequence file names (same as in comrad.sh)

Output:

1. deccodebook.txt contains the Huffman decoded codebook.

2. *.huffdec are all the Huffman decoded sequences (still COMRAD
compressed).

3. *.decomrad are all the original sequences, recovered by decompressing
with COMRAD.

4. decomrad.log is the log file containing the timing information at
each stage of the execution.

NOTE: The compression does not keep the FASTA IDs so in the *.huffdec
and *.decomrad files, the IDs are generated by the program and will not
be the same as the original. We'll eventually incorporate some way to
store compressed FASTA IDs. 
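
Because the FASTA IDs differ, a plain diff of an original file against
its *.decomrad counterpart will report differences even when the
sequence data is intact. One possible round-trip check, sketched below,
compares checksums of the sequence content only. The helper names are
hypothetical, and the sketch assumes that header lines start with ">"
and that line wrapping and nothing else (for example letter case) may
differ between the two files.

```shell
# Checksum of the sequence content only: drop FASTA header lines and newlines.
seq_checksum() {
    grep -v '^>' "$1" | tr -d '\n' | cksum
}

# Succeeds when both files hold the same sequence, whatever their IDs.
same_sequence() {
    [ "$(seq_checksum "$1")" = "$(seq_checksum "$2")" ]
}

# Small demonstration with made-up files: same bases, different ID and wrapping.
tmp=$(mktemp -d)
printf '>chr1 original id\nACGTAC\nGTTGCA\n' > "$tmp/orig.fa"
printf '>generated_id\nACGTACGTTGCA\n' > "$tmp/restored.fa"
if same_sequence "$tmp/orig.fa" "$tmp/restored.fa"; then
    echo "sequences match"
else
    echo "sequences differ"
fi
```

For real data, pass the original input file and the matching *.decomrad
file from the output directory to same_sequence.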



   
Source: Readme.txt, updated 2015-05-04