Results

Narendrakumar Chaudhari

DESCRIPTION OF RESULTS (BPGA v1.3)

The results of every analysis are generated as PDF images. Plots may occasionally fail to generate because of missing dependencies or version problems. In such instances, the plots can be drawn manually from the raw text files produced during the analysis.
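
As an illustration of manual plotting, the sketch below reads a raw data file and draws a simple line plot. It is not part of BPGA: the function names `load_columns` and `plot_xy` are hypothetical, and the layout of the data files is an assumption (whitespace-separated numeric columns, with any non-numeric line treated as a header and skipped). Inspect the actual file before relying on it. Plotting needs matplotlib, which is imported only when `plot_xy` is called.

```python
def load_columns(lines):
    """Parse whitespace-separated numeric rows from an iterable of lines.

    Assumption: data rows hold only numbers. Blank lines and lines with
    any non-numeric field (for example a header) are skipped.
    """
    rows = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # blank line
        try:
            rows.append([float(f) for f in fields])
        except ValueError:
            continue  # header or label line
    return rows


def plot_xy(rows, out_path, xlabel="x", ylabel="y"):
    """Plot the first column against the second and save to out_path."""
    import matplotlib
    matplotlib.use("Agg")  # render to file, no display needed
    import matplotlib.pyplot as plt

    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    fig, ax = plt.subplots()
    ax.plot(xs, ys, marker="o")
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    fig.savefig(out_path)
    plt.close(fig)


# Hypothetical usage, assuming a two-column file:
# with open("new_genes_count.txt") as fh:
#     plot_xy(load_columns(fh), "new_genes_manual.pdf")
```

If a file turns out to use a different layout (more columns, text labels in the first column), only `load_columns` needs adjusting.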

A: Basic Pan-genome Analysis

| Analysis | Image file | Data file |
|---|---|---|
| Gene Family Distribution | Histogram.pdf | histogram.txt |
| New Gene Distribution | New_Genes_Plot.pdf | new_genes_count.txt |
| Basic Pan/core Genome Trend | Default_Core_Pan_Plot.pdf | pan_default.txt, core_default.txt |
| Genome wise Pan Genome Statistics | - | stats.xls |

B: Advanced Pan-genome Analysis

| Analysis | Image file | Data file |
|---|---|---|
| Pan/core Genome Profile (Scatter plot) | Core_Pan_Dot_Plot.pdf | pan_genome.txt, core_genome.txt |
| Pan/core Genome Profile (Box plot) | Core_Pan_Plot.pdf | pan_box.txt, core_box.txt |
| Pan genome Profile Trendlines | - | curve.xls |
| Pan Phylogeny | Pan_phylogeny.pdf | PAN_PHYLOGENY_MOD.ph, PAN_PHYLOGENY_MOD.nwk |
| Core Phylogeny | Core_phylogeny.pdf | CORE_PHYLOGENY_MOD.ph |
| Functional Distribution (Major COG categories) | COG_DISTRIBUTION.pdf | Major_Cog_Category1.txt |
| Functional Distribution (COG sub-categories) | COG_DISTRIBUTION_DETAILS.pdf | Cog_Category1.txt |
| Pathway Distribution (Major KEGG categories) | KEGG_DISTRIBUTION.pdf | kegg_histogram1.txt |
| Pathway Distribution (KEGG sub-categories) | KEGG_DISTRIBUTION_DETAILS.pdf | kegg_histogram1.txt |
| Pathway Distribution (Pathway wise Counts) | - | Kegg_count_details1.txt |

C: Sequence Retrieval

| Sequence | File | Details |
|---|---|---|
| Representatives of Core Gene Families | REPSEQ_CORE.txt | Header has Status and Gene ID. Protein FASTA |
| Representatives of Accessory Gene Families | REPSEQ_ACCESSORY.txt | Header has Status and Gene ID. Protein FASTA |
| Representatives of Unique Gene Families | REPSEQ_UNIQUE.txt | Header has Status and Gene ID. Protein FASTA |
| Core Gene Families (All Members from all genomes) | core_seq.txt | Header has Status, Gene ID, Gene Family ID, and Genome ID. Protein FASTA |
| Acc. Gene Families (All Members from 2 or more genomes) | accessory_seq.txt | Header has Status, Gene ID, Gene Family ID, and Genome ID. Protein FASTA |
| Unique Gene Families (All Members from individual genomes) | unique_seq.txt | Header has Status, Gene ID, Gene Family ID, and Genome ID. Protein FASTA |
| Gene Families with Exclusive Absence | exclusively_absent_seq.txt | Header has Status, Gene ID, Gene Family ID, and Genome ID. Protein FASTA (each of these gene families is missing a gene from one genome) |
| Core genes with Atypical GC | core_genes_with_atypical_GC_content.txt | Header has Status, Gene ID, Gene Family ID, and Genome ID. Protein FASTA |
| Acc. genes with Atypical GC | accessory_genes_with_atypical_GC_content.txt | Header has Status, Gene ID, Gene Family ID, and Genome ID. Protein FASTA |
| Unique genes with Atypical GC | unique_genes_with_atypical_GC_content.txt | Header has Status, Gene ID, Gene Family ID, and Genome ID. Protein FASTA |

Other Supporting Files

| File | Description | Comment |
|---|---|---|
| DATASET.xls | Details about selected organisms | - |
| list | List of selected organisms; contains Genome ID and Genome Name | Serves as the reference for genome IDs found anywhere else. |
| INPUT_all.faa/seq | Database protein FASTA file for clustering | Contains all the protein sequences from all the genomes, with Genome ID, Gene ID and Organism Name. (Also has GC content if generated with the GenBank option.) |
| INPUT_all.ffn | Nucleotide FASTA | Contains all the coding sequences from all the genomes. |
| gi_name | Reference gene names | Contains Gene ID and Standard Gene Name. |
| matrix.txt | 1,0 matrix in binary form | Each column represents a genome (in the order given in the list file) and each row represents a gene family. 1 indicates presence and 0 indicates absence of a gene from the respective genome in that gene family. |
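
The presence/absence matrix can be summarised with a few lines of Python. The sketch below is illustrative only and not part of BPGA. It assumes that matrix.txt contains nothing but whitespace-separated 0/1 values, one gene family per row, with no header or row labels; the real delimiter and layout should be checked first. The function names `read_matrix` and `classify_families` are hypothetical. Families are classified following the definitions used on this page: core (present in all genomes), accessory (present in two or more genomes but not all), and unique (present in a single genome).

```python
def read_matrix(lines):
    """Parse rows of whitespace-separated 0/1 values; skip blank lines.

    Assumption: no header row and no row labels in the file.
    """
    rows = []
    for line in lines:
        fields = line.split()
        if fields:
            rows.append([int(f) for f in fields])
    return rows


def classify_families(rows):
    """Count core, accessory and unique gene families.

    Each row is one gene family; each column is one genome.
    """
    counts = {"core": 0, "accessory": 0, "unique": 0}
    if not rows:
        return counts
    n_genomes = len(rows[0])
    for row in rows:
        present = sum(row)  # number of genomes carrying this family
        if present == n_genomes:
            counts["core"] += 1
        elif present == 1:
            counts["unique"] += 1
        elif present >= 2:
            counts["accessory"] += 1
        # rows with no presence at all are ignored
    return counts


# Hypothetical usage:
# with open("matrix.txt") as fh:
#     print(classify_families(read_matrix(fh)))
```

Column order follows the list file, so a column index can be mapped back to a Genome ID by reading that file in the same order.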
