Name               Modified    Size     Downloads / Week
GLUVAB_v0.6.pl     2019-11-07  27.9 kB
GLUVAB_README.txt  2019-11-07  4.0 kB
LICENSE.txt        2019-11-07  35.8 kB
GLUVAB_v0.5.pl     2019-07-18  27.4 kB
Totals: 4 Items                95.1 kB  0

GLUVAB: Genomic Lineages of Uncultured Viruses of Archaea and Bacteria
    Copyright (C) 2019  Felipe Hernandes Coutinho (felipehcoutinho@gmail.com)

    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, version 3 of the License.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.


Usage: perl GLUVAB_v0.6.pl

--help | Print this help message and exit.
--file_prefix | String to be added to the output files (Default: GLUVAB)
--threads | Number of threads to be used during Diamond search (Default: 1)

Lineage identification criteria:
--min_lineage_reps | Minimum number of representatives in a tree node to establish a valid lineage (Default: 50).
--max_lineage_reps | Maximum number of representatives in a tree node to establish a valid lineage (Default: 999999).
--node_variable | Node variable to use when defining lineages. One of Average_Node_Distances, Node_Depth or Node_Height (Default: Node_Depth).
--max_cutoff | Maximum value of the selected node variable of a node to establish a valid lineage (Default: 100).
--min_cutoff | Minimum value of the selected node variable of a node to establish a valid lineage (Default: 0.0014).
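
As a rough illustration of how the four criteria above combine, the Python sketch below keeps a tree node as a lineage candidate only when both its number of representatives and its value for the selected node variable fall inside the configured ranges. The function name, the dictionary layout of a node and the inclusive comparisons are assumptions made for illustration; they are not taken from the GLUVAB source.

```python
# Hypothetical sketch: the defaults mirror the option list above, but the
# inclusive bounds and the node data layout are assumptions.
def is_valid_lineage(node, node_variable="Node_Depth",
                     min_lineage_reps=50, max_lineage_reps=999999,
                     min_cutoff=0.0014, max_cutoff=100):
    """Return True if a tree node satisfies all four lineage criteria."""
    reps_ok = min_lineage_reps <= node["representatives"] <= max_lineage_reps
    value_ok = min_cutoff <= node[node_variable] <= max_cutoff
    return reps_ok and value_ok


# Made-up nodes, for illustration only
nodes = [
    {"name": "node_a", "representatives": 120, "Node_Depth": 0.35},
    {"name": "node_b", "representatives": 12, "Node_Depth": 0.35},    # too few representatives
    {"name": "node_c", "representatives": 80, "Node_Depth": 0.0005},  # below --min_cutoff
]
valid_names = [n["name"] for n in nodes if is_valid_lineage(n)]
print(valid_names)
```

Raising --min_lineage_reps or narrowing the cutoff range makes the filter stricter, so fewer nodes qualify as lineages.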

Input files:
--genomes_file_1 | FASTA format file containing DNA sequences of viral genomes to be analyzed
--pegs_file_1 | FASTA format file containing protein sequences derived from viral genomes.
    Sequences MUST be named as the ID of the original genomic sequence followed by _SEQNUM.
    Example: Scaffold_1_1, Scaffold_1_2, Scaffold_1_3, GenomeA_1, GenomeA_2...
--m8_file | M8 format file containing the results of the all-versus-all protein search generated by Diamond. MUST use the same nomenclature as the pegs file
--dice_file | TSV format file containing the Dice distances among genomes
--tree_file | Newick format file of the tree built based on the Dice distances
--node_stats_file | TSV format file containing the average Dice distances within each node of the tree
--ref_info_file | TSV format file containing the taxonomic classification or lineage assignment of sequences in dataset1, used when performing closest relative classification
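
The naming rule for --pegs_file_1 means that the genome of origin can be recovered from a protein ID by dropping the final _SEQNUM. The Python sketch below shows one way to check IDs against that rule before running the pipeline. The helper name genome_id_from_peg is hypothetical, and the assumption that SEQNUM is always a plain integer comes from the examples above, not from the GLUVAB source.

```python
def genome_id_from_peg(peg_id):
    """Strip the trailing _SEQNUM from a protein ID to recover the genome ID."""
    # Split on the LAST underscore so that IDs such as Scaffold_1 stay intact
    genome_id, sep, seq_num = peg_id.rpartition("_")
    if not sep or not genome_id or not seq_num.isdigit():
        raise ValueError(f"{peg_id!r} does not follow the GENOMEID_SEQNUM pattern")
    return genome_id


peg_ids = ["Scaffold_1_1", "Scaffold_1_2", "Scaffold_1_3", "GenomeA_1", "GenomeA_2"]
genome_ids = [genome_id_from_peg(p) for p in peg_ids]
print(genome_ids)
# prints ['Scaffold_1', 'Scaffold_1', 'Scaffold_1', 'GenomeA', 'GenomeA']
```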

The order in which analyses are performed is:

1) Identify protein encoding genes in the genomes file with Prodigal (provide --genomes_file_1).
2) Perform all-versus-all search of protein encoding genes using Diamond (provide --pegs_file_1).
3) Calculate Dice distances between genomes based on the output of the Diamond search (provide --m8_file).
4) Build Neighbor-Joining tree based on Dice distances (provide --dice_file).
5) Calculate statistics of Dice distances for each node of the tree (provide --tree_file and --dice_file).
6) Identify lineages in the tree (provide --tree_file and --node_stats_file).
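
To make step 3 more concrete, the Python sketch below derives a Dice-type distance between genomes from M8 records. This README does not state the formula, so the bit-score formulation used here, D(A, B) = 1 - 2*AB / (AA + BB), is an assumption, as are the function names and the averaging of the two search directions. It relies on the standard M8 column order, in which the query ID, subject ID and bit score are columns 1, 2 and 12.

```python
from collections import defaultdict


def genome_of(peg_id):
    """Recover the genome ID by dropping the trailing _SEQNUM of a protein ID."""
    return peg_id.rpartition("_")[0]


def dice_distances(m8_lines):
    """Compute pairwise genome distances from M8 (tabular) lines.

    Assumed formula (not stated in this README):
        D(A, B) = 1 - 2 * AB / (AA + BB)
    AB is the summed bit score of matches between genomes A and B, averaged
    over both search directions; AA and BB are the self-match sums.
    """
    scores = defaultdict(float)
    for line in m8_lines:
        fields = line.rstrip("\n").split("\t")
        # Columns 1, 2 and 12 of the M8 format: query, subject, bit score
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        scores[(genome_of(query), genome_of(subject))] += bitscore

    genomes = sorted({genome for pair in scores for genome in pair})
    distances = {}
    for i, a in enumerate(genomes):
        for b in genomes[i + 1:]:
            self_total = scores.get((a, a), 0.0) + scores.get((b, b), 0.0)
            if self_total == 0:
                continue  # no self matches: distance left undefined in this sketch
            shared = (scores.get((a, b), 0.0) + scores.get((b, a), 0.0)) / 2
            distances[(a, b)] = 1 - (2 * shared) / self_total
    return distances


# Toy records (made up): qseqid sseqid pident length mismatch gapopen
# qstart qend sstart send evalue bitscore
rows = [
    ("GenomeA_1", "GenomeA_1", 100.0, 200, 0, 0, 1, 200, 1, 200, 1e-100, 100.0),
    ("GenomeB_1", "GenomeB_1", 100.0, 200, 0, 0, 1, 200, 1, 200, 1e-100, 100.0),
    ("GenomeA_1", "GenomeB_1", 60.0, 200, 80, 0, 1, 200, 1, 200, 1e-40, 50.0),
    ("GenomeB_1", "GenomeA_1", 60.0, 200, 80, 0, 1, 200, 1, 200, 1e-40, 50.0),
]
m8_lines = ["\t".join(str(value) for value in row) for row in rows]
toy_distances = dice_distances(m8_lines)
print(toy_distances)
```

Under this formulation, two genomes with identical protein content get a distance of 0 and two genomes with no matching proteins get a distance of 1.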

#####################################################################################

Optional: sequences in dataset2 can be classified according to their closest relative (CR)
in dataset1, based on average amino acid identity (AAI) and the percentage of matched protein encoding genes.
To do so, provide either --genomes_file_1 and --genomes_file_2, or --pegs_file_1 and --pegs_file_2.
Optionally provide --ref_info_file so that the output table includes the taxonomic classification / lineage assignment of the CR.

#####################################################################################

Providing an input file other than --genomes_file_1 skips all of the steps
that precede it and starts the analysis at the step that takes the provided file as input.
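
The relation between the provided input files and the starting step can be summarized with the Python sketch below. The step numbers and required options are taken from the list of analyses above. The rule for handling several files at once (start at the latest step whose inputs are all present) and the names STEP_INPUTS and first_step are assumptions for illustration, not the script's documented behavior.

```python
# Step number -> options that step needs, as listed in the analysis order above
STEP_INPUTS = [
    (1, {"genomes_file_1"}),
    (2, {"pegs_file_1"}),
    (3, {"m8_file"}),
    (4, {"dice_file"}),
    (5, {"tree_file", "dice_file"}),
    (6, {"tree_file", "node_stats_file"}),
]


def first_step(provided_options):
    """Return the step at which the analysis would start.

    Assumption: when several files are given, the latest step whose
    required inputs are all present wins.
    """
    provided = set(provided_options)
    satisfied = [step for step, needed in STEP_INPUTS if needed <= provided]
    if not satisfied:
        raise ValueError("no step can start from the provided input files")
    return max(satisfied)


print(first_step(["genomes_file_1"]))               # full pipeline, starts at step 1
print(first_step(["m8_file"]))                      # skips gene calling and the Diamond search
print(first_step(["tree_file", "node_stats_file"])) # only lineage identification
```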

#####################################################################################

Dependencies:
BioPerl
Perl modules: Digest::MD5

Prodigal (v2.60)
DIAMOND (v0.9.14)
R (v3.2.5)
R libraries: phangorn
Source: GLUVAB_README.txt, updated 2019-11-07