| Name | Modified | Size | Downloads / Week |
|---|---|---|---|
| GLUVAB_v0.6.pl | 2019-11-07 | 27.9 kB | |
| GLUVAB_README.txt | 2019-11-07 | 4.0 kB | |
| LICENSE.txt | 2019-11-07 | 35.8 kB | |
| GLUVAB_v0.5.pl | 2019-07-18 | 27.4 kB | |
| Totals: 4 Items | | 95.1 kB | 0 |
GLUVAB: Genomic Lineages of Uncultured Viruses of Archaea and Bacteria
Copyright (C) 2019 Felipe Hernandes Coutinho (felipehcoutinho@gmail.com)
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation version 3 of the License.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
Usage: perl GLUVAB_v0.6.pl
--help | Print this help message and exit.
--file_prefix | String to be added to the output files (Default: GLUVAB)
--threads | Number of threads to be used during Diamond search (Default: 1)
Lineage identification criteria:
--min_lineage_reps | Minimum number of representatives in a tree node to establish a valid lineage (Default: 50).
--max_lineage_reps | Maximum number of representatives in a tree node to establish a valid lineage (Default: 999999).
--node_variable | Which node variable to use when defining lineages. One of Average_Node_Distances, Node_Depth or Node_Height (Default: Node_Depth).
--max_cutoff | Maximum value of the selected node variable of a node to establish a valid lineage (Default: 100).
--min_cutoff | Minimum value of the selected node variable of a node to establish a valid lineage (Default: 0.0014).
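
As a rough illustration of how these four thresholds might combine, the sketch below keeps only the tree nodes whose representative count and node variable both fall inside the allowed ranges. The `Node` record, its field names, and the inclusive comparisons are assumptions made for this example; they are not taken from GLUVAB_v0.6.pl.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical summary of one tree node (field names are assumptions)."""
    name: str
    reps: int          # number of genomes below this node
    node_depth: float  # value of the selected node variable


def is_valid_lineage(node, min_reps=50, max_reps=999999,
                     min_cutoff=0.0014, max_cutoff=100.0):
    """Return True when the node passes all four thresholds (inclusive bounds assumed)."""
    return (min_reps <= node.reps <= max_reps
            and min_cutoff <= node.node_depth <= max_cutoff)


nodes = [
    Node("n1", reps=120, node_depth=0.35),    # passes both checks
    Node("n2", reps=12, node_depth=0.35),     # too few representatives
    Node("n3", reps=300, node_depth=0.0005),  # below --min_cutoff
]
valid = [n.name for n in nodes if is_valid_lineage(n)]
print(valid)
```

The default argument values mirror the defaults listed above for --min_lineage_reps, --max_lineage_reps, --min_cutoff and --max_cutoff.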
Input files:
--genomes_file_1 | Fasta format file containing DNA sequences of viral genomes to be analyzed
--pegs_file_1 | Fasta format file containing protein sequences derived from viral genomes.
    Sequences MUST be named as the Id of the original genomic sequence followed by _SEQNUM.
    Example: Scaffold_1_1, Scaffold_1_2, Scaffold_1_3, GenomeA_1, GenomeA_2...
--m8_file | M8 format file containing the results of the all-versus-all protein search generated by Diamond. MUST use the same nomenclature as the pegs file
--dice_file | tsv format file containing the Dice distances among genomes
--tree_file | Newick format file of the tree built based on the Dice distances
--node_stats_file | tsv format file containing the average Dice distances within each node of the tree
--ref_info_file | tsv format file containing the taxonomic classification or lineage assignment of sequences in dataset1 to be used when performing closest relative classification
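
The naming rule for the pegs file (genome Id followed by _SEQNUM) means the genome Id can be recovered by removing everything from the last underscore onward. The helper below is a minimal sketch of that idea; the function name `genome_id_from_peg` is hypothetical and is not part of GLUVAB.

```python
def genome_id_from_peg(peg_id):
    """Strip the trailing _SEQNUM from a protein Id to recover the genome Id."""
    genome_id, sep, seqnum = peg_id.rpartition("_")
    if not sep or not genome_id or not seqnum.isdigit():
        raise ValueError(
            f"{peg_id!r} does not follow the <genome_id>_<SEQNUM> convention"
        )
    return genome_id


for peg in ["Scaffold_1_1", "Scaffold_1_3", "GenomeA_2"]:
    print(peg, "->", genome_id_from_peg(peg))
```

Splitting on the last underscore rather than the first matters because genome Ids such as Scaffold_1 contain underscores themselves.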
The order in which analyses are performed is:
1) Identify protein encoding genes in the genomes file with Prodigal (provide --genomes_file_1).
2) Perform all-versus-all search of protein encoding genes using Diamond (provide --pegs_file_1).
3) Calculate Dice distances between genomes based on the output of the Diamond search (provide --m8_file).
4) Build Neighbor-Joining tree based on Dice distances (provide --dice_file).
5) Calculate statistics of Dice distances for each node of the tree (provide --tree_file and --dice_file).
6) Identify lineages in the tree (provide --tree_file and --node_stats_file).
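
Step 3 can be pictured with the sketch below, which turns M8 lines into pairwise Dice distances. It rests on several assumptions that this README does not state: the distance is taken as D = 1 - 2*AB / (AA + BB), where AB is the summed bit score of matches between genomes A and B and AA, BB are the summed self-match bit scores; only the best hit per protein and target genome is counted; and the two search directions are averaged. The function names and the `m8` row helper are hypothetical.

```python
from collections import defaultdict


def genome_of(peg_id):
    """Genome Id is the protein Id without its trailing _SEQNUM."""
    return peg_id.rpartition("_")[0]


def dice_distances(m8_lines):
    """Return {(genome_a, genome_b): distance} from tab-separated M8 lines."""
    best = {}  # (query protein, subject genome) -> best bit score
    for line in m8_lines:
        cols = line.rstrip("\n").split("\t")
        query, subject, bitscore = cols[0], cols[1], float(cols[11])
        key = (query, genome_of(subject))
        if bitscore > best.get(key, 0.0):
            best[key] = bitscore

    totals = defaultdict(float)  # (query genome, subject genome) -> bit score sum
    for (query, subject_genome), score in best.items():
        totals[(genome_of(query), subject_genome)] += score

    genomes = sorted({g for pair in totals for g in pair})
    dist = {}
    for i, a in enumerate(genomes):
        for b in genomes[i + 1:]:
            # Assumption: average the A-vs-B and B-vs-A sums to get a symmetric AB.
            shared = (totals.get((a, b), 0.0) + totals.get((b, a), 0.0)) / 2
            self_sum = totals.get((a, a), 0.0) + totals.get((b, b), 0.0)
            dist[(a, b)] = 1.0 - 2.0 * shared / self_sum if self_sum else 1.0
    return dist


def m8(query, subject, bitscore):
    """Hypothetical helper: build a 12-column M8 row with placeholder middle columns."""
    return "\t".join([query, subject] + ["0"] * 9 + [str(bitscore)])


rows = [
    m8("G1_1", "G1_1", 100), m8("G1_2", "G1_2", 100), m8("G2_1", "G2_1", 100),
    m8("G1_1", "G2_1", 50), m8("G2_1", "G1_1", 50),
]
distances = dice_distances(rows)
print(distances)
```

Genome pairs with no shared hits get a distance of 1 under this formula, and identical genomes approach 0.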
#####################################################################################
Optional: Sequences in dataset2 can be classified according to their closest relative (CR)
in dataset1, based on average amino acid identity (AAI) and percentage of matched protein encoding genes
(provide --genomes_file_1 and --genomes_file_2, or --pegs_file_1 and --pegs_file_2).
Optionally provide --ref_info_file so that the output table includes the taxonomic classification / lineage assignment of the CR.
#####################################################################################
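
The two quantities used for closest relative classification can be sketched as follows. This is an illustration under assumptions, not the script's actual procedure: AAI is taken as the mean percent identity of each query protein's best hit against a reference genome, the matched percentage is the share of the query genome's proteins that have a hit, and the CR is picked by highest AAI with the matched percentage as tie-breaker. The function name and input layout are hypothetical.

```python
def closest_relative(hits, n_query_pegs):
    """Pick the closest relative of one query genome.

    hits: iterable of (query_peg, ref_genome, percent_identity).
    n_query_pegs: total number of proteins in the query genome.
    Returns (ref_genome, aai, percent_matched) or None when there are no hits.
    """
    best = {}  # (ref_genome, query_peg) -> best identity for that protein
    for query_peg, ref_genome, identity in hits:
        key = (ref_genome, query_peg)
        best[key] = max(identity, best.get(key, 0.0))

    per_ref = {}  # ref_genome -> list of best identities
    for (ref_genome, _), identity in best.items():
        per_ref.setdefault(ref_genome, []).append(identity)

    summary = {
        ref: (sum(vals) / len(vals), 100.0 * len(vals) / n_query_pegs)
        for ref, vals in per_ref.items()
    }
    if not summary:
        return None
    # Assumption: rank by AAI first, then by percentage of matched proteins.
    cr = max(summary, key=lambda ref: summary[ref])
    aai, percent_matched = summary[cr]
    return cr, aai, percent_matched


hits = [
    ("Q_1", "RefA", 90.0), ("Q_2", "RefA", 80.0),
    ("Q_1", "RefA", 70.0),  # weaker duplicate hit, ignored
    ("Q_1", "RefB", 60.0),
]
result = closest_relative(hits, n_query_pegs=4)
print(result)
```
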
Providing an input file other than --genomes_file_1 skips all of the steps
that precede it and starts the analysis at the step that uses the provided file.
#####################################################################################
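
One way to read the entry-point rule is as a lookup from the provided input options to the first step that runs. The sketch below follows the step list in this README; the function name `first_step` and the precedence given to later-stage files when several are supplied are assumptions, not documented behaviour.

```python
def first_step(provided):
    """Return the number of the first analysis step implied by the provided options."""
    provided = set(provided)
    if {"tree_file", "node_stats_file"} <= provided:
        return 6  # identify lineages in the tree
    if {"tree_file", "dice_file"} <= provided:
        return 5  # per-node statistics of Dice distances
    if "dice_file" in provided:
        return 4  # build the Neighbor-Joining tree
    if "m8_file" in provided:
        return 3  # calculate Dice distances
    if "pegs_file_1" in provided:
        return 2  # all-versus-all Diamond search
    if "genomes_file_1" in provided:
        return 1  # gene prediction with Prodigal
    raise ValueError("no recognised input file option was provided")


print(first_step(["m8_file"]))
print(first_step(["tree_file", "node_stats_file"]))
```
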
Dependencies:
BioPerl
Perl modules: Digest::MD5
Prodigal (v2.60)
DIAMOND (v0.9.14)
R (v3.2.5)
R libraries: phangorn