Preparing data

Thomas Abeel

TripleV requires your data to be in a custom file format. The VizFileCreatorPackage included with the download package has all the tools needed to get started and to convert common genomics file formats into the required format. At a bare minimum, you need to load a multiple alignment file that includes the reference sequence, plus a separate file called references.txt.

This package was designed for and tested on Unix-like systems.

The TripleV file format description gives an in-depth explanation of how the custom file format works.

Step-by-step instructions

  1. Unzip VizFileCreatorPackage.zip into a directory on a Unix-like system.
  2. Edit the text file muscle_path.txt to specify the path to your local copy of MUSCLE (version 3.8 and above is supported). MUSCLE must be downloaded separately from its authors.
  3. Run the config program (perl config.pl). This creates the functional version of the final script, "createVizFile.pl", in the same directory.
  4. Run the configured script, "createVizFile.pl", with your input files.
  5. If all goes well, this creates a file called "ViralViewerDataFile.viz", which can then be loaded into TripleV.
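
Before running the config program in step 3, it can help to confirm that muscle_path.txt really points at an executable. The following is a minimal sketch, not part of the package: it assumes that the first non-blank line of muscle_path.txt holds the path, which this page does not state, and the function names `read_muscle_path` and `muscle_is_usable` are hypothetical.

```python
import os
from pathlib import Path


def read_muscle_path(config_file):
    """Return the first non-blank line of the config file, or None.

    Assumption: muscle_path.txt stores the MUSCLE path on its first
    non-blank line. The actual layout is not documented on this page.
    """
    config = Path(config_file)
    if not config.is_file():
        return None
    for line in config.read_text().splitlines():
        line = line.strip()
        if line:
            return line
    return None


def muscle_is_usable(config_file):
    """Return True if the configured path is an existing, executable file."""
    path = read_muscle_path(config_file)
    return path is not None and os.path.isfile(path) and os.access(path, os.X_OK)
```

Calling `muscle_is_usable("muscle_path.txt")` from the unzipped package directory returns False when the file is missing, empty, or points at something that cannot be executed.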

Detailed information

File extensions and names

File extensions (the part of a file name after the dot, e.g. ".txt") designate the file type, so each file must use one of the extensions permitted for its type.

The file types below must be named as shown in the file name expression column. File names are not case sensitive: for example, the reference file can be named references.txt, REFERENCES.TXT, or even ReFeReNcEs.TxT. The sampleID in the reference file must exactly match the sampleID used in the alignment, gene list, variant, and other files.

File type | Permitted extensions | File name expression | Notes
--- | --- | --- | ---
Reference Files | 'txt' | references.txt |
Alignment Files (DNA) | 'fa', 'fas', 'fsa', 'fasta', 'fna', 'frn', 'mfa', 'afa', 'aln', 'dna' | \*DNA\* |
Alignment Files (AA) | 'fa', 'fas', 'fsa', 'fasta', 'faa', 'frn', 'mfa', 'afa', 'aln', 'pep' | \*AA\* |
Variant Files (DNA) | 'txt' | \*ntfreq.txt | These include the output files from vPhaser and vProfiler.
Variant Files (AA) | 'txt', 'xls' | codonfreq.txt or codonfreq.xls |
Gene List Files | 'txt' | \*genelist.txt |
Muscle Path | 'txt' | muscle_path.txt | Due to the way that genomes containing genes with introns are spliced, it’s extremely difficult to use a user’s protein alignment since this will not match up perfectly with the variant file data. Therefore, we need to perform an alignment from the existing nucleotide data.
Epitope Files | 'fasta', 'fas', 'fa' | \*EPITOPES.FASTA | The epitope file is just a fasta file with a small number of amino acids that make up the peptide. We actively map all of the short protein fragments to the translated polypeptide. Note, a single peptide may map to multiple places if it can be perfectly aligned without any gaps to the reference in more than one locus.
Metadata Files | 'txt' | metadata.txt |
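
The note on epitope files says a peptide can map to more than one locus when it aligns perfectly and without gaps. The sketch below illustrates that idea with a plain exact substring search; it is not TripleV's actual mapping code. The function name `find_exact_matches`, the 0-based positions, and the case-insensitive comparison are all assumptions made for illustration.

```python
def find_exact_matches(peptide, protein):
    """Return the 0-based start of every exact, ungapped occurrence.

    Assumption: matching is case-insensitive and overlapping hits count.
    """
    peptide = peptide.upper()
    protein = protein.upper()
    if not peptide:
        return []
    positions = []
    start = protein.find(peptide)
    while start != -1:
        positions.append(start)
        # Resume one residue later so overlapping occurrences are found too.
        start = protein.find(peptide, start + 1)
    return positions
```

With this sketch, a peptide that occurs twice in the translated sequence yields two start positions, and a peptide that needs a gap or a mismatch to align yields an empty list.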

Example files

These files are included in the VizFileCreatorPackage you downloaded.

File type | Example file
--- | ---
Reference Files | references.txt
Alignment Files (DNA) | 9213_all_nuc_aligned_DNA.fas
Alignment Files (AA) | none
Variant Files (DNA) | 9213_165_ntfreq.txt
Variant Files (AA) | 9213_final_cleaned_9213_165_codonfreq.xls
Gene List Files | 9213_0_genelist.txt
Muscle Path | muscle_path.txt
Epitope Files | epitopes.fasta
Metadata Files | 9213_metadata.txt
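
To check your own file names against the naming rules before running the script, a small classifier along the following lines may be useful. This is a hedged sketch, not the logic createVizFile.pl uses. It makes several assumptions: names are lower-cased before matching; the \*DNA\* and \*AA\* expressions are tested against the part of the name before the extension; a prefix is allowed before codonfreq and metadata (as the example files suggest); and an epitope file may use any of its three permitted extensions. The names `RULES` and `classify` are hypothetical.

```python
from pathlib import PurePath

DNA_EXT = {"fa", "fas", "fsa", "fasta", "fna", "frn", "mfa", "afa", "aln", "dna"}
AA_EXT = {"fa", "fas", "fsa", "fasta", "faa", "frn", "mfa", "afa", "aln", "pep"}

# (file type, test on the lower-cased stem, permitted extensions).
# Order matters: specific names are tested before the broad alignment rules.
RULES = [
    ("Reference", lambda s: s == "references", {"txt"}),
    ("Muscle Path", lambda s: s == "muscle_path", {"txt"}),
    ("Variant (DNA)", lambda s: s.endswith("ntfreq"), {"txt"}),
    ("Variant (AA)", lambda s: s.endswith("codonfreq"), {"txt", "xls"}),
    ("Gene List", lambda s: s.endswith("genelist"), {"txt"}),
    ("Metadata", lambda s: s.endswith("metadata"), {"txt"}),
    ("Epitope", lambda s: s.endswith("epitopes"), {"fasta", "fas", "fa"}),
    ("Alignment (DNA)", lambda s: "dna" in s, DNA_EXT),
    ("Alignment (AA)", lambda s: "aa" in s, AA_EXT),
]


def classify(filename):
    """Return the first matching file type for a name, or None."""
    path = PurePath(filename.lower())
    stem = path.stem
    ext = path.suffix.lstrip(".")
    for file_type, stem_ok, extensions in RULES:
        if ext in extensions and stem_ok(stem):
            return file_type
    return None
```

For instance, `classify("9213_all_nuc_aligned_DNA.fas")` returns "Alignment (DNA)" under these assumptions, while a name that fits no rule returns None and is worth renaming before use.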

Related

Documentation: Home
Documentation: TripleV file format description
