PhyloTrack Wiki

PhyloTrack, D3.js and JBrowse for phylogeny and positioning of samples

Status: Alpha

Brought to you by: ediezben

PhyloTrack Help

This section contains the support information for the PhyloTrack library, which is used to create the files needed to set up the application on your server. For help with the application itself and how to navigate it, go to the PhyTB Help (insert link). Note that each Perl script in this library displays its own help when you type "perl nameOfTheScript.pl" on the command line.

Creating Your Tree

To create the tree you will need to run three scripts. The first one, "NEWICKtoJSON.py", takes as input a text file with the tree in Newick format ("input_treeNEWICK.txt" in the Test files folder) and creates a JSON copy of this tree. Note that you will need to have Python and the "tkinter" module installed. The script guides you by displaying one file chooser to select the Newick tree and another to create the JSON file where the new tree will be stored.
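As an illustration of the kind of conversion described above, here is a minimal sketch of a Newick-to-JSON parser. It is not the code of "NEWICKtoJSON.py": the key names ("name", "length", "children") are assumptions, and it only handles well-formed trees without quoted labels, comments or whitespace inside the tree.

```python
import json


def parse_newick(text):
    """Parse a Newick string into nested dicts.

    Assumed (hypothetical) output keys: "name", "length", "children".
    """
    text = text.strip().rstrip(";")
    pos = 0

    def parse_node():
        nonlocal pos
        node = {}
        # An opening parenthesis starts the list of child nodes.
        if pos < len(text) and text[pos] == "(":
            pos += 1  # skip "("
            children = [parse_node()]
            while text[pos] == ",":
                pos += 1  # skip ","
                children.append(parse_node())
            pos += 1  # skip ")"
            node["children"] = children
        # The (possibly empty) label runs up to the next delimiter.
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        node["name"] = text[start:pos]
        # An optional branch length follows a colon.
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            node["length"] = float(text[start:pos])
        return node

    return parse_node()


if __name__ == "__main__":
    tree = parse_newick("(A:0.1,(B:0.2,C:0.3)n1:0.05)root;")
    print(json.dumps(tree, indent=2))
```

The nested "children" lists mirror the structure of the Newick string, so the result can be written directly with json.dumps.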

The second script, "AddingAttributesToJSON.pl", complements this JSON file. It is a Perl script, so you will need to have Perl installed. It reads the tab-delimited file containing the sample annotation information and adds that information to each sample node in the tree (JSON format). The input files are the tree in JSON format ("input_treeJSON.json" in the Test files folder) and a tab-delimited file with the metadata for each sample ("attributes_annotation.txt" in the Test files folder). Note that this file should contain a first row with the names of the attributes, and that the sample names used in the tree have to be in the 3rd column. The script will ask you which attribute you would like to add, and you should run it once for each attribute you want to add.
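The attribute step described above can be sketched as follows. This is a Python illustration, not the Perl script itself; it assumes a tree stored as nested dicts with "name" and "children" keys (an assumption about the JSON layout), and the column names and values in the usage example are made up.

```python
import csv
import io


def read_attribute(tsv_text, attribute, name_column=2):
    """Return {sample name: value} for one attribute of a tab-delimited table.

    The first row holds the attribute names; sample names are read from
    the 3rd column (index 2), as the annotation file format requires.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    column = header.index(attribute)
    return {
        row[name_column]: row[column]
        for row in reader
        if len(row) > max(name_column, column)  # skip short or blank rows
    }


def add_attribute(node, attribute, values):
    """Attach the attribute to every node whose name appears in values."""
    if node.get("name") in values:
        node[attribute] = values[node["name"]]
    for child in node.get("children", []):
        add_attribute(child, attribute, values)
    return node


if __name__ == "__main__":
    # Made-up annotation table and tree, for illustration only.
    table = "id\tcountry\tsample\tlineage\n1\tPeru\tS1\tL4\n2\tIndia\tS2\tL1\n"
    tree = {"name": "", "children": [{"name": "S1"}, {"name": "S2"}]}
    add_attribute(tree, "lineage", read_attribute(table, "lineage"))
    print(tree)
```

As with the original script, one call adds one attribute, so the two functions are called once per attribute to be added.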

The third and last script needed to finish the tree is "colors_into_JSON.pl". This script needs Perl and, more precisely, the Bio::Tree and Data::Dumper packages. It converts a tab-delimited file with the RGB colour assigned to each lineage into a tab-delimited file with the corresponding colour in Hex notation for HTML. Then it puts together the information related to the colouring of the lineages in the tree (JSON format). It creates a FASTA-like file where each lineage is assigned a Hex colour and, starting from the node defining the lineage, uses the Newick format file to retrieve a list of all the nodes below it (using Bio::Perl::Tree). The script uses a tab-delimited text file with the lineages and starting nodes ("lineage_defining_nodes.txt" in the Test files folder), a tab-delimited text file with the lineages and the RGB colours assigned ("RGB_lineage_colors.txt"), a file with the JSON tree in which to insert the colouring data ("input_treeJSON.json") and a text file with the tree in Newick format ("input_treeNEWICK.txt"). Finally, it inserts the colours into the JSON tree in an order that ensures that sub-lineage colours override the colours of their parent lineages. These colours are stored under the tag "linColor" in the tree.
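The colouring logic described above can be sketched in Python as follows. This is an illustration under assumptions, not the Perl script: the tree is taken to be nested dicts with "name" and "children" keys, and painting larger clades first is one possible way to make sub-lineages override their parent lineages (the order used by the original script is not documented here). The lineage names and colours in the example are made up.

```python
def rgb_to_hex(r, g, b):
    """Convert an RGB triple (0-255 each) to HTML Hex notation."""
    return "#{:02x}{:02x}{:02x}".format(r, g, b)


def descendants(node):
    """Yield the node itself and every node below it."""
    yield node
    for child in node.get("children", []):
        yield from descendants(child)


def find(tree, name):
    """Return the first node with the given name, or None."""
    for node in descendants(tree):
        if node.get("name") == name:
            return node
    return None


def colour_lineages(tree, lineage_nodes, lineage_rgb):
    """Store a "linColor" value on every node of every lineage.

    lineage_nodes maps lineage -> name of its defining node;
    lineage_rgb maps lineage -> (r, g, b).
    """
    clades = []
    for lineage, start_name in lineage_nodes.items():
        start = find(tree, start_name)
        if start is not None:
            clades.append((lineage, list(descendants(start))))
    # A sub-lineage is always a smaller clade than its parent lineage,
    # so painting the largest clades first lets sub-lineages override.
    clades.sort(key=lambda item: len(item[1]), reverse=True)
    for lineage, members in clades:
        colour = rgb_to_hex(*lineage_rgb[lineage])
        for node in members:
            node["linColor"] = colour
    return tree


if __name__ == "__main__":
    tree = {"name": "root", "children": [
        {"name": "n1", "children": [{"name": "A"}, {"name": "B"}]},
        {"name": "C"},
    ]}
    colour_lineages(tree,
                    {"L1.1": "n1", "L1": "root"},
                    {"L1": (255, 0, 0), "L1.1": (0, 128, 255)})
    print(tree)
```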

Creating the VCF Files for the SNP Functionality and Genome Browser

Once the tree has been created, the next step is to create the VCF files needed for the Genome Browser and the SNP functionality of the application. To set up the Genome Browser, follow the JBrowse Configuration Guide to set up the sequence track and the features (genes) track. In addition, a VCF file is needed to create the track for the SNPs. If you have the SNPs stored as a tab-delimited file ("SNP_annotation.txt" in the Test files folder), the script "FilesToVCF.pl" will store this information in a VCF file, placing all the non-standard fields in the INFO section of the file under new, manually created tags. The script can also be used to create the file that stores the comma-separated list of nodes related to each SNP, by commenting out line 101 and uncommenting line 103. It takes as inputs the SNP annotation file mentioned above and a sparse matrix with the nodes in the columns and the SNPs in the rows (obtained by uncompressing "Matrix_SNP_Vs_Nodes.zip"); the matrix contains a "1" where the SNP is present in a node and a "0" where it is not. Note that this script creates only the body of the VCF file; the header should be created manually following this VCF file example.
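To make the two outputs described above more concrete, here is a small Python sketch. It does not reproduce "FilesToVCF.pl": the field names ("pos", "ref", "alt", "gene", "effect"), the chromosome name and the INFO tag names are hypothetical, since the actual columns of "SNP_annotation.txt" are not listed here. The first function builds one VCF body line with the non-standard fields moved into INFO; the second builds the comma-separated list of nodes for one row of the SNP-versus-nodes matrix.

```python
def snp_to_vcf_line(record, chrom="Chromosome"):
    """Build one tab-separated VCF body line from a dict of SNP fields.

    "pos", "ref" and "alt" (hypothetical field names) fill the standard
    columns; every other non-empty field becomes a KEY=value INFO tag.
    """
    standard = {"pos", "ref", "alt"}
    info = ";".join(
        # VCF INFO values may not contain whitespace.
        "{}={}".format(key.upper(), str(value).replace(" ", "_"))
        for key, value in record.items()
        if key not in standard and value != ""
    ) or "."
    # Columns: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO
    return "\t".join([chrom, str(record["pos"]), ".",
                      record["ref"], record["alt"], ".", ".", info])


def nodes_for_snp(matrix_row, node_names):
    """Return the comma-separated names of the nodes flagged "1" in a row."""
    return ",".join(name for name, flag in zip(node_names, matrix_row)
                    if flag == "1")


if __name__ == "__main__":
    # Made-up SNP record, for illustration only.
    snp = {"pos": "1977", "ref": "A", "alt": "G",
           "gene": "dnaA", "effect": "missense variant"}
    print(snp_to_vcf_line(snp))
    print(nodes_for_snp(["1", "0", "1"], ["n1", "n2", "n3"]))
```

Each custom INFO tag used this way would also need a matching "##INFO" line in the manually written header, and the chromosome name has to match the reference sequence name configured in JBrowse.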

Another script, "FileToVCFNodes.pl", creates the VCF-like file that stores the list of SNPs related to each node in the tree. The input needed is the sparse matrix (obtained by uncompressing "Matrix_SNP_Vs_Nodes.zip"). In this case each node in the tree has been assigned a numeric index in increasing order, so the application should create an array object that stores all the node numbers in the same order, so that the index can be extracted using the name of the node.
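The node indexing described above can be illustrated with a short Python sketch (in the application itself the equivalent structure would live in the browser code). It assumes that indices start at 0 and follow the column order of the matrix, which is an assumption; the node and SNP names are made up.

```python
def build_node_index(node_names):
    """Map each node name to its position in the matrix column order."""
    return {name: index for index, name in enumerate(node_names)}


def snps_per_node(matrix, snp_ids, n_nodes):
    """Invert a SNP-by-node 0/1 matrix into one list of SNPs per node.

    matrix holds one row per SNP and one column per node.
    """
    per_node = [[] for _ in range(n_nodes)]
    for snp_id, row in zip(snp_ids, matrix):
        for index, flag in enumerate(row):
            if flag == 1:
                per_node[index].append(snp_id)
    return per_node


if __name__ == "__main__":
    names = ["root", "n1", "A"]          # made-up node names
    index = build_node_index(names)
    per_node = snps_per_node([[1, 0, 1], [0, 1, 0]],
                             ["snp1", "snp2"], len(names))
    # Look up the SNPs of a node by name through its numeric index.
    print(per_node[index["A"]])
```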

If you only have a matrix with the SNPs related to the samples but not the SNPs related to the nodes (this was our case), the script "ExtendingSparseMatrix.pl" extends this matrix. The script needs the packages Bio::TreeIO and Data::Dumper installed. It uses 3 input files: the sparse matrix of samples (obtained by uncompressing "sparse_Matrix_samples.zip"), a tab-delimited file with the SNP changes that appear between each pair of nodes ("nodeChanges.txt") and the tree in Newick format ("input_treeNEWICK.txt").

The next step is to create the VCF-like files that store the information for the drug-resistance-related SNPs, lineage-specific SNPs and node-specific SNPs. For this purpose we have created three very similar scripts ("FilesToVCFDrug.pl", "FilesToVCFLineage.pl" and "FilesToVCFNodes.pl"), which only transform a tab-delimited file into a VCF-like file in which the information not covered by the standard format is stored in the INFO field under newly created tags. They only need the tab-delimited files ("Drug_ann.txt", "lineage_specific_SNPs.txt" and "node_specific_SNPs.txt" respectively) as input, and the headers should be created manually afterwards.