Name Modified Size Info Downloads / Week
core_genome_build.pl 2014-04-25 2.3 kB
core_map2snp.pl 2014-04-25 5.9 kB
Protocol.pdf 2014-04-25 365.5 kB
Readme.txt 2014-04-25 4.5 kB
strict_filter.pl 2014-04-25 9.9 kB
remap_qualifiedASM.pl 2014-04-25 3.9 kB
core_snp2fas.pl 2014-04-25 365 Bytes
nucmer_filter.pl 2014-04-25 352 Bytes
core_remove_region.pl 2014-04-25 692 Bytes
aln_extract.pl 2014-04-25 3.9 kB
Totals: 10 Items   397.4 kB 0
Standard process for “Assembly based core genome analysis”

1/ De novo assembly (Public tools)
Target: Get a good assembly from short reads. 
Tools: Velvet, SOAPdenovo, ALLPATHS-LG, SPAdes or other programs.
Output: <assembly file>. A multi-fasta file containing all the contigs or scaffolds generated by the assembler.

2/ Gap fill (Public tools)
Target: Fill most of the intra-scaffold gaps.
Tools: GapCloser (http://soap.genomics.org.cn/about.html) or GapFiller (http://genomebiology.com/2012/13/6/R56)
Output: <new assembly file>. A multi-fasta file containing all the contigs or scaffolds, with most of the intra-scaffold gaps filled.

3/ Remap (Download here)
Target: Improve the assembly and determine the quality of each base in the assembly
Tool: remap_qualifiedASM.pl (Bowtie2 and Samtools implemented)
Usage: 1/ Make sure the following programs are in your PATH; otherwise, edit the script.
		bowtie2-build
		bowtie2
		samtools
		bcftools
	2/ perl remap_qualifiedASM.pl <assembly file> <read 1> <read 2> <remapped assembly> <no. of iterations>
		<read 1> <read 2>: The two ends of the paired reads, in separate files. With default parameters, only insert sizes between 250 and 500 bp are supported. Edit the script if you want to use other insert sizes.
		<remapped assembly>: the file to be generated by remap_qualifiedASM.pl.
		<no. of iterations>: At least 1. 1: no improvement based on remapping. >=2: improve base-calling and short indels by iterative remapping.
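
The PATH requirement above can be checked before running the script. The following is a minimal sketch in Python (not part of the distributed Perl scripts); the helper name missing_programs is hypothetical, and the program list is copied from the usage notes above.

```python
import shutil

# Programs the usage notes list as prerequisites for remap_qualifiedASM.pl.
REQUIRED = ["bowtie2-build", "bowtie2", "samtools", "bcftools"]


def missing_programs(programs):
    """Return the programs that cannot be found on the current PATH."""
    return [p for p in programs if shutil.which(p) is None]


if __name__ == "__main__":
    missing = missing_programs(REQUIRED)
    if missing:
        print("Not found in PATH: " + ", ".join(missing))
    else:
        print("All required programs found.")
```

The same helper can be called with ["nucmer", "delta-filter", "show-aligns"] before the MUMmer step.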

4/ MUMmer alignment to a reference (Download here)
Target: Generate the aligned sequence of each assembly (or public genome) against a reference with MUMmer.
Tools: nucmer_filter.pl (MUMmer, strict_filter.pl and aln_extract.pl implemented)
Usage: 1/ Make sure the following programs are in your PATH; otherwise, edit the script.
	nucmer
	delta-filter
	show-aligns
	2/ Change the line $core_ope_path = "/usr/local/bioinf/core_ope" so that it points to the actual folder that contains “strict_filter.pl” and “aln_extract.pl”.
	3/ perl nucmer_filter.pl <reference> <query genome> <query alignment>
		<query alignment>: A file that has the same length as the reference but shows the alignment of the <query genome>. Only deletions and mutations in the <query genome> are included. Any insertions will be ignored. 
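
To illustrate what "same length as the reference" means, here is a minimal Python sketch that projects one pairwise alignment onto reference coordinates. It is not the code in nucmer_filter.pl or aln_extract.pl; the function name and the toy sequences are hypothetical, and how the real scripts mark unaligned regions is not stated in this Readme.

```python
def project_onto_reference(ref_aln, qry_aln, gap="-"):
    """Project an aligned query onto reference coordinates.

    ref_aln and qry_aln are the two rows of one pairwise alignment
    (equal length, gaps written as `gap`). Columns where the reference
    has a gap are insertions in the query and are dropped; columns where
    the query has a gap are deletions and are kept as `gap`.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned rows must have the same length")
    return "".join(q for r, q in zip(ref_aln, qry_aln) if r != gap)


# Toy alignment: one mutation (G>T), one insertion (G), one 2-bp deletion.
ref_aln = "ACGT-ACGTAC"
qry_aln = "ACTTGAC--AC"
projected = project_onto_reference(ref_aln, qry_aln)
print(projected)  # ACTTAC--AC
```

The projected string has exactly as many characters as the ungapped reference, so every query shares the reference coordinate system.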

5/ Merge the MUMmer alignments into one multi-fasta file (system command)
	USAGE: cat <reference> <query alignment1> <query alignment2> … > <multi-genome alignment>

6/ Call (relaxed) core genome (Download here)
	Target: Generate a core genome or a relaxed core genome based on the multi-genome alignment. 
	Tools: core_genome_build.pl
	Usage: perl core_genome_build.pl <multi-genome alignment> <name of reference> <no. of relaxed genome> > <core genome>
		<no. of relaxed genome>: Default: 0. Number of missing sites allowed in the core genome. The larger the number, the more sites are included in the analysis; however, it also brings in some paralogs and elements acquired by lateral gene transfer (LGT).
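
The idea of a strict versus relaxed core genome can be sketched as follows. This Python example is an illustration only, not the logic of core_genome_build.pl: it assumes that <no. of relaxed genome> means the number of genomes allowed to lack a base at a site, and that "-" and "N" mark a missing base. The function name and the toy alignment are hypothetical.

```python
def core_columns(alignment, max_missing=0, missing="-N"):
    """Return 0-based column indices kept in the (relaxed) core genome.

    alignment: dict mapping genome name -> aligned sequence (equal lengths).
    max_missing: how many genomes may lack a base at a column.
    missing: characters counted as "no base" (an assumption of this sketch).
    """
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    kept = []
    for i in range(length):
        n_missing = sum(1 for s in seqs if s[i] in missing)
        if n_missing <= max_missing:
            kept.append(i)
    return kept


alignment = {
    "ref": "ACGTACGT",
    "q1":  "ACGT-CGT",
    "q2":  "AC-T-CGA",
}
strict = core_columns(alignment, max_missing=0)
relaxed = core_columns(alignment, max_missing=1)
print(strict)   # [0, 1, 3, 5, 6, 7]
print(relaxed)  # [0, 1, 2, 3, 5, 6, 7]
```

Raising max_missing from 0 to 1 adds column 2, where only one genome is missing; column 4 is still excluded because two genomes lack a base there.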

7/ Remove some regions in the core genome (Download here)
	Target: Short-read sequencing cannot handle some regions, such as 16S rRNA, tRNAs, CRISPRs or the super-integron in Vibrio cholerae. Removing these regions reduces errors due to uncertain base calls and improves the phylogeny significantly.
	Tool: core_remove_region.pl
	Usage: perl core_remove_region.pl <core genome> <region file> > <new core genome>
		<region file>: a file listing the regions to be excluded, in the format:
<Region1> <start coordinate in reference> <end coordinate in reference>
<Region2> <start coordinate in reference> <end coordinate in reference>
…
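
As an illustration of the region file, the following Python sketch parses lines of the form shown above and filters reference positions. It is not the code in core_remove_region.pl; the function names and region names are hypothetical, and treating coordinates as 1-based and inclusive is an assumption.

```python
def parse_regions(text):
    """Parse lines of '<name> <start> <end>' into (name, start, end) tuples."""
    regions = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or incomplete lines
        name, start, end = fields[0], int(fields[1]), int(fields[2])
        regions.append((name, min(start, end), max(start, end)))
    return regions


def remove_regions(positions, regions):
    """Drop reference positions that fall inside any excluded region.

    Coordinates are treated as 1-based and inclusive, which is an
    assumption of this sketch.
    """
    return [p for p in positions
            if not any(start <= p <= end for _, start, end in regions)]


region_text = """rRNA_16S 3 5
tRNA_1 9 10
"""
regions = parse_regions(region_text)
kept = remove_regions(range(1, 13), regions)
print(kept)  # [1, 2, 6, 7, 8, 11, 12]
```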

8/ Call SNPs from core genome (Download here)
	Target: Call SNPs. 
	Tools: core_map2snp.pl
	Usage: perl core_map2snp.pl <core genome alignment> <annotation of reference> > <core SNP>
		<annotation of reference>: NOT required. The <core SNP> will include extra information if an annotation is supplied.
		File format (\t means the Tab character):
<gene name1>\t<chromosome>\t<Gene type: CDS/pseudogene/RNA/…>\t<start>\t<end>\t.\t<direction: +/->
<gene name2>\t<chromosome>\t<Gene type: CDS/pseudogene/RNA/…>\t<start>\t<end>\t.\t<direction: +/->
<gene name3>\t<chromosome>\t<Gene type: CDS/pseudogene/RNA/…>\t<start>\t<end>\t.\t<direction: +/->
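
The annotation format above has seven tab-separated columns. The Python sketch below shows one way to read it and look up which genes cover a position. It is an illustration only, not the parser in core_map2snp.pl; the names Gene, parse_annotation and genes_at are hypothetical, the sample genes are invented, and 1-based inclusive coordinates are an assumption.

```python
from collections import namedtuple

Gene = namedtuple("Gene", "name chromosome gene_type start end strand")


def parse_annotation(text):
    """Parse the 7-column tab-separated annotation format described above."""
    genes = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 7:
            raise ValueError("expected 7 tab-separated fields: " + line)
        name, chrom, gene_type, start, end, _dot, strand = fields
        genes.append(Gene(name, chrom, gene_type, int(start), int(end), strand))
    return genes


def genes_at(genes, chromosome, position):
    """Return names of genes covering a position (1-based, inclusive: assumed)."""
    return [g.name for g in genes
            if g.chromosome == chromosome and g.start <= position <= g.end]


annotation_text = (
    "geneA\tchr1\tCDS\t100\t400\t.\t+\n"
    "geneB\tchr1\tRNA\t350\t500\t.\t-\n"
)
genes = parse_annotation(annotation_text)
print(genes_at(genes, "chr1", 375))  # ['geneA', 'geneB']
```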

9/ Generate a SNP-only fasta file (Download here)
	Target: Prepare a file for further analysis with programs such as MEGA, RAxML or BEAST.
	Usage: perl core_snp2fas.pl <core SNP> > <core SNP fasta file>
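
What a SNP-only fasta file contains can be sketched in Python as below. This is not the code in core_snp2fas.pl: the format of <core SNP> is not described in this Readme, so the sketch starts from an alignment instead and keeps only the variable columns. The function names and toy sequences are hypothetical, and ignoring "-" and "N" when deciding whether a column varies is an assumption.

```python
def variable_columns(alignment, ignore="-N"):
    """Return 0-based indices of columns with more than one distinct base."""
    seqs = list(alignment.values())
    cols = []
    for i in range(len(seqs[0])):
        bases = {s[i] for s in seqs if s[i] not in ignore}
        if len(bases) > 1:
            cols.append(i)
    return cols


def snp_fasta(alignment, ignore="-N"):
    """Build a fasta string that keeps only the variable columns."""
    cols = variable_columns(alignment, ignore)
    lines = []
    for name, seq in alignment.items():
        lines.append(">" + name)
        lines.append("".join(seq[i] for i in cols))
    return "\n".join(lines) + "\n"


alignment = {"ref": "ACGTACGT", "q1": "ACGTACGA", "q2": "ATGT-CGA"}
print(variable_columns(alignment))  # [1, 7]
print(snp_fasta(alignment), end="")
```

Column 4 is not reported because the only difference there is a missing base in q2, not a substitution.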
Source: Readme.txt, updated 2014-04-25