DEMIC is a computational tool for comparing bacterial growth rates between metagenomic samples. The comparison is based on the relative distances of contigs from the replication origin, which are inferred from the contig coverages.
Please send any questions about DEMIC to gy.james@163.com.
1 Commands and arguments
How to run DEMIC:
perl DEMIC.pl -S /input/sam_dir/ -F /input/fa_dir/ -O /output/dir/
The arguments of DEMIC are as follows:
-S, --sam_dir
directory of input SAM files generated by bowtie2 (required)
-F, --fasta_dir
directory of input FASTA files, each for a species (required)
-O, --out_dir
directory of output files (default: ./)
-W, --window_size
size (nt) of window for calculation of coverage (default: 5000)
-D, --window_step
step (nt) of window for calculation of coverage (default: 100)
-M, --mapq_cutoff
cutoff of mapping quality when calculating coverages (default: 5)
-L, --mapl_cutoff
cutoff of mapping length when calculating coverages (default: 50)
-R, --max_mismatch_ratio
maximum of mismatch ratio for each read as a hit (default: 0.03)
-G, --log
output log file name (optional)
-H, --help
show this help information
-T, --thread_num
set number of threads for parallel running (default: 1)
-Q, --quiet
keep quiet when running
-A, --output_all
keep the temporary files after running (more disk space would be needed)
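To make -W and -D concrete, the following sketch counts the windows that a sliding window of a given size and step produces on one contig. It assumes a plain sliding window that keeps only full-size windows; how DEMIC itself handles contig ends is not described here, and count_windows is a hypothetical helper, not part of DEMIC.

```shell
# Hypothetical helper, not part of DEMIC: count the full-size windows on a
# contig, assuming a plain sliding window (DEMIC's own edge handling may differ).
# Usage: count_windows CONTIG_LENGTH [WINDOW_SIZE] [WINDOW_STEP]
count_windows() {
    len=$1
    size=${2:-5000}   # default of -W
    step=${3:-100}    # default of -D
    if [ "$len" -lt "$size" ]; then
        echo 0        # contig is shorter than one window
    else
        echo $(( (len - size) / step + 1 ))
    fi
}

count_windows 20000            # prints 151 with the default -W and -D
count_windows 20000 5000 500   # prints 31
```

Under this assumption, a larger step reduces the number of windows roughly in proportion, which is the main trade-off between resolution and running time.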
2 Preparation for using DEMIC
(1) Co-assembly
Co-assemble the metagenomic reads from the different samples using an existing assembler such as MEGAHIT or Ray-Meta.
For example, you can use MEGAHIT to co-assemble the reads of three samples and write the contigs to a directory named MEGAHIT_assembly/:
cat sample1.fastq sample2.fastq sample3.fastq > samples.fastq
megahit -m 100000000000 -r samples.fastq -o MEGAHIT_assembly
(2) Alignment
Map the metagenomic reads of each sample to the contigs with Bowtie2, then sort the resulting SAM files with samtools.
For example, you can use the following commands to align the reads of sample1 and write the output to the directory SAM/:
bowtie2-build MEGAHIT_assembly/final.contigs.fa MEGAHIT_assembly/final.contigs.fa
bowtie2 -q -x MEGAHIT_assembly/final.contigs.fa -U sample1.fastq -S SAM/sample1.sam
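Sort the alignment and write the result to SAM_sorted/sample1_sort.sam (the samtools sort command below uses the output-prefix form of samtools versions before 1.3; later versions take the output file through -o):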
samtools view -bS SAM/sample1.sam | samtools sort - SAM/sample1_sort
samtools view -h SAM/sample1_sort.bam > SAM_sorted/sample1_sort.sam
Repeat these steps for every sample.
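Repeating the commands by hand becomes error-prone with many samples. The sketch below prints the same three commands for each sample so that they can be reviewed before being run. print_align_cmds is a hypothetical helper, and the sample names and the NAME.fastq naming are assumed from the sample1 example.

```shell
# Hypothetical helper: print the alignment and sorting commands for one sample.
# Nothing is executed here; the commands mirror the sample1 example.
print_align_cmds() {
    s=$1
    printf '%s\n' \
        "bowtie2 -q -x MEGAHIT_assembly/final.contigs.fa -U $s.fastq -S SAM/$s.sam" \
        "samtools view -bS SAM/$s.sam | samtools sort - SAM/${s}_sort" \
        "samtools view -h SAM/${s}_sort.bam > SAM_sorted/${s}_sort.sam"
}

# Assumed sample names; replace them with your own.
for s in sample1 sample2 sample3; do
    print_align_cmds "$s"
done
```

Saving the printed lines to a file and running that file with sh executes them. The Bowtie2 index and the directories SAM/ and SAM_sorted/ need to exist first.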
(3) Binning
Cluster the contigs into groups using an existing binner such as MaxBin or MetaBAT.
For example, you can run MaxBin and write the contig clusters to maxbin/ as follows:
run_MaxBin.pl -contig MEGAHIT_assembly/final.contigs.fa -out maxbin -reads_list ./reads.list
(./reads.list contains the paths to the three .fastq files above, one per line)
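One way to create ./reads.list is sketched below. It writes one path per line, which is the list layout assumed here for MaxBin, and it uses the file names from the co-assembly example.

```shell
# Write the read file paths to ./reads.list, one per line.
# The file names are those of the co-assembly example; adjust as needed.
printf '%s\n' sample1.fastq sample2.fastq sample3.fastq > ./reads.list

# Show the result for a quick visual check.
cat ./reads.list
```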
All of the above steps are commonly used in assembly-based analysis of metagenomic data.
DEMIC needs the directories maxbin/ and SAM_sorted/ as input.
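Before running DEMIC it can help to confirm that both directories contain what is expected. The sketch below counts the files by extension. count_files and check_inputs are hypothetical helpers, and the extensions (sam and fasta) are assumptions that may need to be changed for the output of your binner.

```shell
# Hypothetical helper: count regular files in directory $1 ending in .$2.
count_files() {
    find "$1" -maxdepth 1 -type f -name "*.$2" | wc -l | tr -d ' '
}

# Report the file counts and fail if either directory has no matching files.
# Usage: check_inputs SAM_DIR FASTA_DIR [FASTA_EXT]
check_inputs() {
    n_sam=$(count_files "$1" sam)
    n_fa=$(count_files "$2" "${3:-fasta}")   # the extension is an assumption
    echo "SAM files: $n_sam, contig clusters: $n_fa"
    [ "$n_sam" -gt 0 ] && [ "$n_fa" -gt 0 ]
}

# Only run the check when both directories are present.
if [ -d SAM_sorted ] && [ -d maxbin ]; then
    check_inputs SAM_sorted maxbin || echo "input files are missing" >&2
fi
```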
3 An example of running DEMIC
(1) Please make sure that parallel Perl 5.10.1 or higher and R 3.3.1 or higher are installed, and that you are using a Linux or Mac OS X operating system.
(2) Install the R packages "lme4" and "FactoMineR" by typing the following in R:
> install.packages("lme4")
> install.packages("FactoMineR")
(3) In a terminal, enter the DEMIC directory and move the test data into it:
$ cd /path/to/DEMIC_v1.0.2/
$ mv /path/to/test_data/ ./
The test data are sequencing alignments of three samples that were randomly selected and mixed from the PTRC study (Korem et al., 2015, Science; ENA accession number PRJEB9718), stored as sorted .sam files in the directory SAM_sorted/. The directory maxbin/ contains two contig clusters generated by MaxBin, corresponding to C. rodentium and E. faecalis, respectively.
(4) Then run DEMIC with the following command:
$ perl DEMIC.pl -S test_data/SAM_sorted/ -F test_data/maxbin/ -O DEMIC/
(5) DEMIC will finish in about 1 minute and write its output files to the directory DEMIC/.
The results are in the file all_PTR.txt in this directory:
$ cat DEMIC/all_PTR.txt
Sample1 Sample2 Sample3
ContigCluster1 1.5549 2.3313 1.8395
ContigCluster2 2.2430 1.7119 2.2856
Each row of the output file is a contig cluster and each column is a sample; a higher value indicates a higher growth rate. For example, C. rodentium (ContigCluster1) has the highest growth rate in Sample2.
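With many clusters and samples it is easier to summarize the table programmatically. The sketch below prints, for each contig cluster, the sample with the largest value. It assumes a whitespace-delimited table laid out as in the listing above (a header of sample names followed by one row per cluster); top_sample_per_cluster is a hypothetical helper, not part of DEMIC.

```shell
# Hypothetical helper: for each contig cluster in the table $1, print the
# cluster name, the sample with the largest value, and that value.
top_sample_per_cluster() {
    awk '
        # Header line: remember the sample names.
        NR == 1 { n = NF; for (i = 1; i <= NF; i++) name[i] = $i; next }
        # Data line: field 1 is the cluster name, values start at field 2.
        NF > 1 {
            off = NF - n      # 1 when the header has no label for column 1
            best = 2
            for (i = 3; i <= NF; i++)
                if ($i + 0 > $best + 0) best = i
            print $1, name[best - off], $best
        }
    ' "$1"
}

# Only run on the example output when it is present.
if [ -f DEMIC/all_PTR.txt ]; then
    top_sample_per_cluster DEMIC/all_PTR.txt
fi
```

For the test data this reports Sample2 for ContigCluster1 and Sample3 for ContigCluster2.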
4 Questions and Answers
(1) How to use multiple threads in DEMIC?
DEMIC supports multiple threads. If you have multiple CPUs or cores, set the number of threads with the argument -T.
We have optimized RAM usage in DEMIC, but using multiple threads will still slightly increase it.
As a suggestion, use a number of threads no greater than the smaller of the number of SAM files and the number of contig clusters.
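The suggestion can be applied as in the following sketch, which prints the smallest of the values it is given. suggest_threads is a hypothetical helper; passing the number of CPU cores as a third value is an added assumption, not a DEMIC requirement.

```shell
# Hypothetical helper: print the smallest of the given positive integers,
# e.g. number of SAM files, number of contig clusters and (optionally) cores.
suggest_threads() {
    t=$1
    for n in "$@"; do
        if [ "$n" -lt "$t" ]; then
            t=$n
        fi
    done
    echo "$t"
}

# The test data have 3 SAM files and 2 contig clusters.
suggest_threads 3 2    # prints 2
```

The printed value can then be passed to DEMIC through -T.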
(2) How should the experiment and analysis pipeline be designed to make better use of DEMIC?
Although DEMIC was developed to work under various conditions, it may perform better when the experiment and analysis pipeline are well designed.
1) sample number:
DEMIC is based on multiple samples, and it tends to be more accurate for a species that is present in six or more samples. We suggest sequencing at least six samples at reasonable and comparable depths. These samples can be controls, replicates, or different subjects or conditions.
2) assembler:
The assembly method is important for the performance of both the binning method and DEMIC. An assembler that produces relatively long contigs across different depths and species, with relatively few errors, is preferred. We used MEGAHIT in our tests, and it performed well.
3) binner:
Although DEMIC can purify a contig cluster by removing potential contamination, it is still affected by binning quality, such as completeness. For microbial communities of low or medium complexity, we suggest MaxBin.