FileFormats modified by Ian Reid

Ian Reid — Wed, 10 Apr 2013 17:20:51 -0000

--- v1
+++ v2
@@ -1,17 +1,16 @@
 File formats
 ============

-The input RNA-Seq reads should be in Fastq format [http://maq.sourceforge.net/fastq.shtml]. tuqueSplice and tuqueMap
-determine the read length and quality value encoding automatically; these values should be constant within each reads file, but can differ between input files.
+The input RNA-Seq reads should be in __Fastq__ format \[l\]. tuqueSplice and tuqueMap determine the read length and quality value encoding automatically; these values should be constant within each reads file, but can differ between input files.

-The genome sequence file should contain the sequence of each chromosome (or scaffold or contig) in Fasta format
-[http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml].
+The genome sequence file should contain the sequence of each chromosome (or scaffold or contig) in __Fasta__ format
+\[\].

-Sequence feature annotations should be in GFF3 format [http://www.sequenceontology.org/gff3.shtml].
+Sequence feature annotations should be in __GFF3__ format \[\].

-Read mappings are output in BAM format [http://samtools.sourceforge.net/SAM1.pdf].
+Read mappings are output in __BAM__ format \[\].

-The .juncs format is as used in early versions of Tophat.
+The __.juncs__ format is as used in early versions of Tophat.
 It is a tab-delimited text format, with one line for each splice junction. Each line contains at least 4 fields
 separated by tab characters:
  1. Chromosome Id
@@ -28,4 +27,4 @@
 10. Right anchor - the maximum distance between the junction and the 3' end of a spanning read
 11. Class        - either regular, variant, or wrongway.

-The coverage.wig files are in bedGraph format [http://genome.ucsc.edu/goldenPath/help/bedgraph.html]
+The coverage.wig files are in __bedGraph__ format \[\]

FileFormats modified by Ian Reid

Ian Reid — Wed, 10 Apr 2013 17:05:25 -0000

File formats

The input RNA-Seq reads should be in Fastq format [http://maq.sourceforge.net/fastq.shtml]. tuqueSplice and tuqueMap
determine the read length and quality value encoding automatically; these values should be constant within each reads file, but can differ between input files.
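
As an illustration of what automatic detection can involve, here is a minimal Python sketch that guesses the read length and quality offset from the first records of a Fastq file. It is not the method used by tuqueSplice or tuqueMap. The function name `inspect_fastq`, the sample size, and the cut-off values (ASCII codes below 59 imply a Phred+33 encoding, codes above 74 suggest a +64 encoding) are assumptions chosen for the example, and the heuristic can return `None` when the sampled qualities are ambiguous.

```python
import io


def inspect_fastq(handle, max_records=1000):
    """Guess (read_length, quality_offset) from the first FASTQ records.

    Hypothetical helper, not part of tuque. quality_offset is 33, 64,
    or None when the sampled quality characters fit both encodings.
    """
    lengths = set()
    min_code, max_code = 255, 0
    count = 0
    while count < max_records:
        header = handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("not a 4-line FASTQ record: %r" % header)
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ: %r" % header)
        lengths.add(len(seq))
        for ch in qual:
            code = ord(ch)
            min_code = min(min_code, code)
            max_code = max(max_code, code)
        count += 1
    if count == 0:
        raise ValueError("no FASTQ records found")
    if len(lengths) != 1:
        raise ValueError("read length is not constant within the file")
    if min_code < 59:
        offset = 33  # characters this low cannot occur with a +64 offset
    elif max_code > 74:
        offset = 64  # characters this high are unusual with a +33 offset
    else:
        offset = None  # ambiguous sample
    return lengths.pop(), offset


# Example with two made-up reads of length 4.
sample = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\n#5II\n")
print(inspect_fastq(sample))
```

The sketch raises an error when read lengths differ within one file, which mirrors the requirement that these values be constant within each reads file.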

The genome sequence file should contain the sequence of each chromosome (or scaffold or contig) in Fasta format
[http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml].

Sequence feature annotations should be in GFF3 format [http://www.sequenceontology.org/gff3.shtml].

Read mappings are output in BAM format [http://samtools.sourceforge.net/SAM1.pdf].

The .juncs format is the one used by early versions of TopHat.
It is a tab-delimited text format with one line per splice junction. Each line contains at least 4 fields
separated by tab characters:
1. Chromosome Id
2. Start - the 0-based genomic coordinate of the first base that is spliced out
3. End - the 0-based genomic coordinate of the last base that is spliced out
4. Strand - either + or -

.juncs files produced by tuqueSplice contain additional fields:
5. Read-through ratio - the ratio of the mean read coverage depth on the spliced-out bases to the coverage depth on the bases immediately before and immediately after the junction
6. Multiplicity - the number of mapped reads that are spliced at this junction
7. Diversity - the number of distinct reads that are spliced at this junction
8. Donor-acceptor pair of the spliced-out intron
9. Left anchor - the maximum distance between the 5' end of a spanning read and the junction
10. Right anchor - the maximum distance between the junction and the 3' end of a spanning read
11. Class - either regular, variant, or wrongway.
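
The field list above can be sketched as a small Python parser. This is an illustration only: the names `Junction`, `parse_junction`, and `intron_length` are hypothetical, the numeric types chosen for fields 5 to 11 are assumptions, and the sample line (including the `GT-AG` spelling of the donor-acceptor pair) is invented rather than taken from real tuqueSplice output.

```python
from typing import NamedTuple, Optional


class Junction(NamedTuple):
    """One line of a .juncs file; fields 5-11 are None for 4-field files."""
    chrom: str
    start: int            # 0-based, first spliced-out base
    end: int              # 0-based, last spliced-out base
    strand: str           # "+" or "-"
    read_through_ratio: Optional[float] = None
    multiplicity: Optional[int] = None
    diversity: Optional[int] = None
    donor_acceptor: Optional[str] = None
    left_anchor: Optional[int] = None
    right_anchor: Optional[int] = None
    junction_class: Optional[str] = None


def parse_junction(line):
    """Parse one tab-delimited .juncs line into a Junction."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 4:
        raise ValueError("expected at least 4 tab-separated fields")
    chrom, start, end, strand = fields[:4]
    if strand not in ("+", "-"):
        raise ValueError("strand must be + or -, got %r" % strand)
    junction = Junction(chrom, int(start), int(end), strand)
    if len(fields) >= 11:
        # Assumed types for the additional tuqueSplice fields.
        junction = junction._replace(
            read_through_ratio=float(fields[4]),
            multiplicity=int(fields[5]),
            diversity=int(fields[6]),
            donor_acceptor=fields[7],
            left_anchor=int(fields[8]),
            right_anchor=int(fields[9]),
            junction_class=fields[10],
        )
    return junction


def intron_length(junction):
    """Start and end are both inclusive, so the length is end - start + 1."""
    return junction.end - junction.start + 1


# Invented example line with all 11 fields.
example = "chr1\t100\t199\t+\t0.05\t12\t9\tGT-AG\t40\t35\tregular\n"
parsed = parse_junction(example)
print(parsed.chrom, intron_length(parsed), parsed.junction_class)
```

Because both coordinates refer to spliced-out bases, the intron length is `end - start + 1`, which is worth keeping in mind when converting to half-open formats such as BED.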

The coverage.wig files are in bedGraph format [http://genome.ucsc.edu/goldenPath/help/bedgraph.html].
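
For readers who want to inspect a coverage file, the following Python sketch reads bedGraph data lines (chromosome, 0-based start, exclusive end, value) and averages the depth over a region. The function names `read_bedgraph` and `mean_depth` and the sample lines are hypothetical, and the sketch does not claim to reproduce how tuqueSplice computes coverage.

```python
def read_bedgraph(lines):
    """Yield (chrom, start, end, value) from bedGraph data lines.

    Track, browser, and comment lines are skipped. Start is 0-based
    and end is exclusive, as in BED coordinates.
    """
    for line in lines:
        fields = line.split()
        if not fields or fields[0] in ("track", "browser") or fields[0].startswith("#"):
            continue
        chrom, start, end, value = fields[:4]
        yield chrom, int(start), int(end), float(value)


def mean_depth(intervals, chrom, region_start, region_end):
    """Mean value over [region_start, region_end); uncovered bases count as 0."""
    if region_end <= region_start:
        raise ValueError("region must contain at least one base")
    total = 0.0
    for c, start, end, value in intervals:
        if c != chrom:
            continue
        overlap = min(end, region_end) - max(start, region_start)
        if overlap > 0:
            total += overlap * value
    return total / (region_end - region_start)


# Invented sample data.
sample = [
    "track type=bedGraph name=coverage",
    "chr1\t0\t10\t2",
    "chr1\t10\t20\t4",
]
intervals = list(read_bedgraph(sample))
print(mean_depth(intervals, "chr1", 5, 15))
```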
