Tuque Wiki

Tools for mapping RNA-Seq reads to eukaryotic genomes

Brought to you by: ian-d-reid

File formats

The input RNA-Seq reads should be in Fastq format [http://maq.sourceforge.net/fastq.shtml]. tuqueSplice and tuqueMap determine the read length and quality value encoding automatically; these values should be constant within each reads file, but can differ between input files.
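To illustrate what automatic detection of read length and quality encoding involves, here is a minimal Python sketch. It is not the code used by tuqueSplice or tuqueMap; the function name sniff_fastq, the 4-lines-per-record assumption, and the heuristic (any quality character below ASCII 59 implies a Phred+33 encoding, otherwise an offset of 64 is assumed) are all assumptions made for this example. The heuristic can misclassify Phred+33 files that contain only high quality values.

```python
from itertools import islice


def sniff_fastq(lines, max_records=1000):
    """Guess (read_length, quality_offset) from 4-line Fastq records.

    Hypothetical helper, not part of tuque.
    """
    lengths = set()
    min_qual = None
    records = iter(lines)
    for _ in range(max_records):
        # Assumption: each record is exactly 4 lines (id, sequence, '+', quality).
        record = [line.rstrip("\r\n") for line in islice(records, 4)]
        if len(record) < 4:
            break
        lengths.add(len(record[1]))
        if record[3]:
            lowest = min(ord(c) for c in record[3])
            min_qual = lowest if min_qual is None else min(min_qual, lowest)
    if len(lengths) != 1 or min_qual is None:
        raise ValueError("could not determine a single read length and encoding")
    # Heuristic: characters below ';' (ASCII 59) cannot occur with an offset of 64.
    offset = 33 if min_qual < 59 else 64
    return lengths.pop(), offset
```

Called on an open file, for example sniff_fastq(open("reads.fastq")), it returns a pair such as (read length, 33) and raises ValueError when the sampled reads differ in length.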

The genome sequence file should contain the sequence of each chromosome (or scaffold or contig) in Fasta format [http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml].
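As a rough check that a genome file has the expected one-record-per-chromosome layout, the following Python sketch reads Fasta records and returns each sequence with its identifier. The function name read_fasta is hypothetical, and taking the identifier to be the first word after ">" is an assumption of this example, not a documented tuque rule.

```python
def read_fasta(lines):
    """Yield (sequence_id, sequence) pairs from Fasta-formatted lines."""
    seq_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            # Assumption: the id is the first whitespace-delimited word of the header.
            words = line[1:].split()
            seq_id, chunks = (words[0] if words else ""), []
        else:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)
```

For example, {name: len(seq) for name, seq in read_fasta(open("genome.fa"))} gives the length of every chromosome, scaffold, or contig in the file.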

Sequence feature annotations should be in GFF3 format [http://www.sequenceontology.org/gff3.shtml].
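For readers unfamiliar with GFF3, the sketch below parses one feature line into a dictionary. It relies only on the general GFF3 layout (nine tab-separated columns, 1-based inclusive start and end, URL-escaped tag=value attributes separated by semicolons); the function name parse_gff3_line is hypothetical, and nothing here reflects how the tuque tools read annotations internally.

```python
from urllib.parse import unquote

GFF3_COLUMNS = ("seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes")


def parse_gff3_line(line):
    """Parse one GFF3 feature line; return None for comments and blank lines."""
    line = line.rstrip("\r\n")
    if not line or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(fields))
    feature = dict(zip(GFF3_COLUMNS, fields))
    # GFF3 coordinates are 1-based and include both start and end.
    feature["start"] = int(feature["start"])
    feature["end"] = int(feature["end"])
    attributes = {}
    for item in feature["attributes"].split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attributes[unquote(key)] = unquote(value)
    feature["attributes"] = attributes
    return feature
```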

Read mappings are output in BAM format [http://samtools.sourceforge.net/SAM1.pdf].

The .juncs format is the one used by early versions of TopHat: a tab-delimited text format with one line per splice junction. Each line contains at least 4 tab-separated fields:

Chromosome Id
Start - the 0-based genomic coordinate of the first base that is spliced out
End - the 0-based genomic coordinate of the last base that is spliced out
Strand - either + or -
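The four required fields can be read with a few lines of Python. This is a minimal sketch that takes the field definitions above at face value; the names Junction, parse_junc_line, and intron_length are hypothetical. Because Start and End are both described as 0-based coordinates of spliced-out bases, the intron length is computed as End - Start + 1.

```python
from collections import namedtuple

Junction = namedtuple("Junction", "chrom start end strand")


def parse_junc_line(line):
    """Parse the first four fields of a .juncs line; extra fields are ignored."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 4:
        raise ValueError("expected at least 4 tab-separated fields")
    chrom, start, end, strand = fields[:4]
    if strand not in ("+", "-"):
        raise ValueError("strand must be + or -, got %r" % strand)
    return Junction(chrom, int(start), int(end), strand)


def intron_length(junction):
    # start and end are both spliced out, so the interval is closed at both ends.
    return junction.end - junction.start + 1
```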

.juncs files produced by tuqueSplice contain additional fields:

Read-through ratio - the ratio of the mean read coverage depth on the spliced-out bases to the coverage depth on the bases immediately before and immediately after the junction
Multiplicity - the number of mapped reads that are spliced at this junction
Diversity - the number of distinct reads that are spliced at this junction
Donor-acceptor pair of the spliced-out intron
Left anchor - the maximum distance between the 5' end of a spanning read and the junction
Right anchor - the maximum distance between the junction and the 3' end of a spanning read
Class - either regular, variant, or wrongway.
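A parser for the extended tuqueSplice lines might look like the sketch below. It assumes that the additional fields appear as columns 5 to 11 in the order listed above, that the ratio is written as a decimal number, and that the counts and anchors are integers; none of this is confirmed here, so check it against a real file. The names TuqueJunction and parse_tuque_junc_line are hypothetical.

```python
from collections import namedtuple

TuqueJunction = namedtuple(
    "TuqueJunction",
    "chrom start end strand read_through multiplicity diversity "
    "splice_sites left_anchor right_anchor junction_class",
)

JUNCTION_CLASSES = ("regular", "variant", "wrongway")


def parse_tuque_junc_line(line):
    """Parse an 11-field .juncs line (column order is an assumption)."""
    f = line.rstrip("\r\n").split("\t")
    if len(f) < 11:
        raise ValueError("expected at least 11 tab-separated fields")
    if f[10] not in JUNCTION_CLASSES:
        raise ValueError("unknown junction class %r" % f[10])
    return TuqueJunction(
        f[0], int(f[1]), int(f[2]), f[3],
        float(f[4]),          # read-through ratio
        int(f[5]),            # multiplicity: mapped reads spliced here
        int(f[6]),            # diversity: distinct reads spliced here
        f[7],                 # donor-acceptor pair, kept as text
        int(f[8]), int(f[9]),  # left and right anchors
        f[10],
    )
```

With records in this form, a filter such as keeping only junctions whose junction_class is "regular" and whose diversity is at least 2 becomes a one-line list comprehension.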

The coverage.wig files are in bedGraph format [http://genome.ucsc.edu/goldenPath/help/bedgraph.html].
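Since bedGraph is plain text with four columns (chromosome, 0-based start, exclusive end, value), a coverage file can be read without special libraries. The sketch below is one way to do it; the function names read_bedgraph and total_coverage are hypothetical, and skipping lines that begin with "track", "browser", or "#" is an assumption about what headers such a file may carry.

```python
def read_bedgraph(lines):
    """Yield (chrom, start, end, value) tuples from bedGraph lines."""
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        # Skip header and comment lines rather than data rows.
        if fields[0] in ("track", "browser") or fields[0].startswith("#"):
            continue
        chrom, start, end, value = fields[:4]
        yield chrom, int(start), int(end), float(value)


def total_coverage(intervals):
    """Sum of value * interval length over all intervals."""
    # bedGraph intervals are 0-based and half-open, so the length is end - start.
    return sum((end - start) * value for _, start, end, value in intervals)
```

Dividing total_coverage by the number of bases covered gives the mean coverage depth over those intervals.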
