BPGA Documentation

A tool for ultra-fast pan-genome analysis of microbes.

Brought to you by: encoderman, guptabpga

Results

DESCRIPTION OF RESULTS (BPGA v1.3)

Also see FAQs Home

The results of every analysis are generated as PDF images. Plots may occasionally fail to generate because of missing dependencies or version problems. In such instances, the plots can be produced manually from the raw text files generated during the analysis.

A: Basic Pan-genome Analysis

Analysis	Image file	Data file
Gene Family Distribution	Histogram.pdf	histogram.txt
New Gene Distribution	New_Genes_Plot.pdf	new_genes_count.txt
Basic Pan/core Genome Trend	Default_Core_Pan_Plot.pdf	pan_default.txt, core_default.txt
Genome wise Pan Genome Statistics	-	stats.xls

B: Advanced Pan-genome Analysis

Analysis	Image file	Data file
Pan/core Genome Profile (Scatter plot)	Core_Pan_Dot_Plot.pdf	pan_genome.txt, core_genome.txt
Pan/core Genome Profile (Box plot)	Core_Pan_Plot.pdf	pan_box.txt, core_box.txt
Pan genome Profile Trendlines	-	curve.xls
Pan Phylogeny	Pan_phylogeny.pdf	PAN_PHYLOGENY_MOD.ph, PAN_PHYLOGENY_MOD.nwk
Core Phylogeny	Core_phylogeny.pdf	CORE_PHYLOGENY_MOD.ph
Functional Distribution (Major COG categories)	COG_DISTRIBUTION.pdf	Major_Cog_Category1.txt
Functional Distribution (COG sub-categories)	COG_DISTRIBUTION_DETAILS.pdf	Cog_Category1.txt
Pathway Distribution (Major KEGG categories)	KEGG_DISTRIBUTION.pdf	kegg_histogram1.txt
Pathway Distribution (KEGG sub-categories)	KEGG_DISTRIBUTION_DETAILS.pdf	kegg_histogram1.txt
Pathway Distribution (Pathway wise Counts)	-	Kegg_count_details1.txt

C: Sequence Retrieval

Sequence	File	Details
Representatives of Core Gene Families	REPSEQ_CORE.txt	Header has Status and Gene ID, Protein FASTA
Representatives of Accessory Gene Families	REPSEQ_ACCESSORY.txt	Header has Status and Gene ID, Protein FASTA
Representatives of Unique Gene Families	REPSEQ_UNIQUE.txt	Header has Status and Gene ID, Protein FASTA
Core Gene Families (All Members from all genomes)	core_seq.txt	Header has Status, Gene ID, Gene Family ID, and Genome ID. Protein FASTA
Acc. Gene Families (All Members from 2 or more genomes)	accessory_seq.txt	Header has Status, Gene ID, Gene Family ID, and Genome ID. Protein FASTA
Unique Gene Families (All Members from individual genomes)	unique_seq.txt	Header has Status, Gene ID, Gene Family ID, and Genome ID. Protein FASTA
Gene Families with Exclusive Absence	exclusively_absent_seq.txt	Header has Status, Gene ID, Gene Family ID, and Genome ID. Protein FASTA (Gene from any one genome is missing from these gene families)
Core genes with Atypical GC	core_genes_with_atypical_GC_content.txt	Header has Status, Gene ID, Gene Family ID, and Genome ID. Protein FASTA
Acc. genes with Atypical GC	accessory_genes_with_atypical_GC_content.txt	Header has Status, Gene ID, Gene Family ID, and Genome ID. Protein FASTA
Unique genes with Atypical GC	unique_genes_with_atypical_GC_content.txt	Header has Status, Gene ID, Gene Family ID, and Genome ID. Protein FASTA
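
The sequence files above are described as protein FASTA, so they can be read with any standard FASTA parser. The following is a minimal sketch in Python, assuming the usual FASTA convention of a `>` header line followed by one or more sequence lines. The function name `read_fasta` and the sample headers are hypothetical. Because the delimiter between header fields (Status, Gene ID, and so on) is not stated here, the header is returned whole rather than split.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines.

    Assumes standard FASTA layout; the header is returned without the
    leading '>' and is NOT split into fields, since the field delimiter
    used in these files is not specified in this document.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical in-memory example; real headers will look different.
sample = [">example_header_1", "MKV", "LLA", "", ">example_header_2", "MST"]
records = list(read_fasta(sample))
```

To read one of the files listed above, pass an open file handle instead of a list, for example `with open("REPSEQ_CORE.txt") as fh: records = list(read_fasta(fh))`.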

Other Supporting Files

File	Description	Comment
DATASET.xls	Details about selected organisms	-
list	List of selected organisms; contains Genome ID and Genome Name	This is used as the reference for genome IDs found anywhere else.
INPUT_all.faa/seq	Database Protein FASTA file for clustering	Contains all the protein sequences from all the genomes, along with Genome ID, Gene ID and Organism Name. (Also has GC content if generated with the GenBank option)
INPUT_all.ffn	Nucleotide FASTA	Contains all the coding sequences from all the genomes
gi_name	Reference gene names	Contains Gene ID and Standard Gene Name.
matrix.txt	1,0 matrix in binary form	Each column represents a genome (in the same order as the list file) and each row represents a gene family. 1 indicates presence and 0 indicates absence of genes from the respective genome in that gene family.
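
As an illustration of how the presence/absence matrix can be used, the sketch below labels each gene family (row) as core, accessory, or unique from its row of 1s and 0s. It assumes that matrix.txt contains only whitespace-separated 0/1 values with no header row or row labels, which should be checked against the actual file. The function names `parse_matrix` and `classify_families` are hypothetical and not part of BPGA. Here "accessory" means present in two or more genomes but not all of them.

```python
def parse_matrix(text):
    """Parse whitespace-separated 0/1 rows into lists of ints.

    Assumption: no header row and no row labels in the file.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if fields:
            rows.append([int(v) for v in fields])
    return rows


def classify_families(rows):
    """Label each gene family (one row per family) by its genome count."""
    labels = []
    for row in rows:
        present = sum(row)          # number of genomes with this family
        if present == len(row):
            labels.append("core")       # present in every genome
        elif present == 1:
            labels.append("unique")     # present in a single genome
        elif present >= 2:
            labels.append("accessory")  # two or more genomes, not all
        else:
            labels.append("absent")     # all-zero row, not expected
    return labels


# Hypothetical three-genome example with three gene families.
example_rows = parse_matrix("1 1 1\n1 0 1\n0 0 1\n")
example_labels = classify_families(example_rows)
```

For a real run, read the file contents first, for example `rows = parse_matrix(open("matrix.txt").read())`, and map column positions to genomes using the order given in the list file.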
