Patch uses a small number of PacBio corrected long reads (PBcRs) to improve off-the-shelf draft genomes assembled from short reads.
a, How to use
1, Docker Image 990210oliver/patch.docker
Use Docker to pull the image and run it.
$ docker pull 990210oliver/patch.docker
$ docker run [options] 990210oliver/patch.docker /bin/bash
2, patch.tar.gz
Install all required tools, then run Patch with the Python scripts.
-Prerequisites-
* Linux 64-bit environment
* Python 2.6 or higher (http://www.python.org/)
* MUMmer 3.22 or higher (http://mummer.sourceforge.net/)
* BLAST+ 2.2.25 or higher (http://blast.ncbi.nlm.nih.gov/)
* Soap2 2.21 or higher (http://soap.genomics.org.cn/soapaligner.html)
-Installation-
# wget http://sourceforge.net/projects/sb2nhri/files/Patch/patch.tar.gz
# tar zxvf patch.tar.gz
Add the patch folder to $PATH.
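Before running Patch, it can help to confirm that the external tools are visible on $PATH. The following is a minimal sketch and not part of the Patch distribution: the helper name missing_tools is hypothetical, the tool names are taken from the config keys described in this README, and it assumes Python 3.3 or later (for shutil.which).

```python
import shutil

# Executable names as they appear in this README's config examples.
REQUIRED_TOOLS = ["nucmer", "makeblastdb", "blastn"]
OPTIONAL_TOOLS = ["2bwt-builder", "soap"]  # only needed for read mapping


def missing_tools(names):
    """Return the names that cannot be found on $PATH (hypothetical helper)."""
    return [name for name in names if shutil.which(name) is None]


# Usage: list whatever is not reachable from the current environment.
# for name in missing_tools(REQUIRED_TOOLS + OPTIONAL_TOOLS):
#     print("not found on PATH: " + name)
```

An empty result for REQUIRED_TOOLS suggests that the short names (nucmer, makeblastdb, blastn) can be used in the config file instead of full paths.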
b, Run patch.py (with Docker)
-example-
1, E. coli K12 MG1655
# patch.py
Please give a config file!
# mkdir test
# cd test
# wget http://sourceforge.net/projects/sb2nhri/files/Patch/examples/ecoli.tar.gz
# tar zxvf ecoli.tar.gz
ecoli/
ecoli/corrected.long.fasta (PBcRs corrected using ECTools with ABySS unitigs)
ecoli/my.ctg.fasta (Assembly of ECTools + runCA)
ecoli/myconfig
# cat ecoli/myconfig
in_ref=/test/ecoli/my.ctg.fasta
in_clr=/test/ecoli/corrected.long.fasta
source=/opt
nucmer=nucmer
makeblastdb=makeblastdb
blastn=blastn
# N50.py ecoli/my.ctg.fasta
whole:4695577
N50: 4644297
Number of contigs: 12
Length of the longest contig: 4644297
# patch.py ecoli/myconfig
whole:4645330
N50: 4644297
Number of contigs: 2
Length of the longest contig: 4644297
2, yeast (S. cerevisiae W303)
# mkdir test
# cd test
# wget http://sourceforge.net/projects/sb2nhri/files/Patch/examples/yeast.tar.gz
# tar zxvf yeast.tar.gz
yeast/
yeast/corrected.long.fasta (PBcRs corrected using ECTools with ABySS unitigs)
yeast/my.ctg.fasta (Assembly of ECTools + runCA)
yeast/myconfig
# cat yeast/myconfig
in_ref=/test/yeast/my.ctg.fasta
in_clr=/test/yeast/corrected.long.fasta
source=/opt
nucmer=nucmer
makeblastdb=makeblastdb
blastn=blastn
# N50.py yeast/my.ctg.fasta
whole:13221295
N50: 476437
Number of contigs: 115
Length of the longest contig: 889557
# patch.py yeast/myconfig
whole:12203626
N50: 734494
Number of contigs: 35
Length of the longest contig: 1528116
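The statistics printed in the examples above can be reproduced from the contig lengths alone. The following is a minimal sketch and not the N50.py shipped with Patch: the function names are hypothetical, "whole" is assumed to be the total assembly length, and N50 is taken as the length of the contig at which the running total, sorted from longest to shortest, first reaches half of that total.

```python
def fasta_lengths(lines):
    """Return the sequence length of every record in FASTA-formatted lines."""
    lengths = []
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def assembly_stats(lengths):
    """Return (total, n50, number_of_contigs, longest) for contig lengths."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    n50 = 0
    for length in ordered:
        running += length
        # Stop at the first contig that brings the sum to half the total.
        if running * 2 >= total:
            n50 = length
            break
    longest = ordered[0] if ordered else 0
    return total, n50, len(ordered), longest


# Usage (file name taken from the examples above):
# with open("ecoli/my.ctg.fasta") as handle:
#     print(assembly_stats(fasta_lengths(handle)))
```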
c, config file
in_ref=/path/assembly.ctg.fasta
in_clr=/path/PBcR.fasta
source=/path/patch
nucmer=nucmer
makeblastdb=makeblastdb
blastn=blastn
1, The input files of pre-assembled contigs and PacBio corrected reads (PBcRs):
in_ref=/path/assembly.ctg.fasta
in_clr=/path/PBcR.fasta
2, The path of Patch:
source=/path/patch
3, The paths of tools required by Patch:
nucmer=/path/nucmer
makeblastdb=/path/makeblastdb
blastn=/path/blastn
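The config file is a list of key=value lines, so it is easy to check before a long run. The following is a minimal sketch of such a check, not the parser used inside patch.py: the function names are hypothetical, and treating exactly these six keys as mandatory is an assumption based on the examples in this README.

```python
# Keys that appear in every example config in this README.
REQUIRED_KEYS = ("in_ref", "in_clr", "source", "nucmer", "makeblastdb", "blastn")


def parse_config(text):
    """Parse key=value lines into a dict, ignoring lines without '='."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def missing_keys(settings):
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not settings.get(key)]


# Usage:
# with open("myconfig") as handle:
#     print(missing_keys(parse_config(handle.read())))
```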
[Options]
4, If the genome size is specified in the config file, the longest PBcRs totaling clrdepth-fold coverage of the genome (15X by default) are selected and saved as my_CLR.fa.Long.fa:
clrdepth=15
(default:15)
genome_size=4650000
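To illustrate what selecting the longest 15X PBcRs means, the following minimal sketch picks reads from longest to shortest until their combined length reaches clrdepth times genome_size. It is an assumption about the selection rule, not the code used by Patch; the function name and the read names in the usage example are hypothetical.

```python
def select_longest_reads(read_lengths, genome_size, clrdepth=15):
    """Pick the longest reads until clrdepth-fold coverage is reached.

    read_lengths maps read name to read length. Returns (names, total_bases).
    """
    target = genome_size * clrdepth
    chosen = []
    total = 0
    # Longest first; ties are broken by name so the result is deterministic.
    ranked = sorted(read_lengths.items(), key=lambda item: (-item[1], item[0]))
    for name, length in ranked:
        if total >= target:
            break
        chosen.append(name)
        total += length
    return chosen, total


# Usage with toy numbers: a 100 bp "genome" at 8X needs 800 bases.
# select_longest_reads({"r1": 500, "r2": 300, "r3": 200}, 100, clrdepth=8)
```

If the reads together provide less than the requested coverage, this sketch simply returns all of them.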
5, Soap2 is required for read mapping:
2bwt-builder=/path/2bwt-builder
soap=/path/soap
6, A coverage (depth) threshold for splitting contigs; by default, contigs are split where coverage is zero:
depth=0
(default:0, available range: 0-5)
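The idea of splitting a contig by coverage can be sketched as follows. This is an illustration only, not the implementation in Patch: the function name is hypothetical, per-base depths are assumed to be available as a list, and treating positions with depth at or below the threshold as break points is an assumption.

```python
def covered_segments(depths, threshold=0):
    """Return half-open (start, end) intervals where depth exceeds threshold."""
    segments = []
    start = None
    for position, depth in enumerate(depths):
        if depth > threshold:
            # Open a new segment at the first sufficiently covered base.
            if start is None:
                start = position
        elif start is not None:
            # Coverage dropped to the threshold or below: close the segment.
            segments.append((start, position))
            start = None
    if start is not None:
        segments.append((start, len(depths)))
    return segments


# Usage: a contig with a two-base gap in coverage yields two pieces.
# covered_segments([3, 2, 0, 0, 4, 1])
```

Raising the threshold makes the split more aggressive, because weakly supported positions also become break points.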
7, The paths of the short reads and the insert size range; no insert size is required for a single-end library:
read1=/path/read1.fa or fq
read2=/path/read2.fa or fq
min_i=100 (default:100)
max_i=200 (default:200)
Date: 2015/10/13
Author: Yu-Chieh Liao (jade@nhri.org.tw) and Hsin-Hung Lin (oliver0618@nhri.org.tw)