TriageTools Wiki

Tools for partitioning and prioritizing fastq data

Brought to you by: tkonopka

TriageLength

Triage by length

The length tool selects reads based on the number of bases. It is useful for removing reads too short to align reliably.
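To make "number of bases" concrete, here is a minimal sketch of single-threshold selection on FASTQ text. It assumes unwrapped 4-line records (header, sequence, "+" line, qualities) and measures a read's length as the length of its sequence line. It keeps reads strictly longer than the threshold, matching the wording "longer than 40" used below. The class and method names are hypothetical and are not taken from the TriageTools source.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;

/** Hypothetical sketch of single-threshold length selection; not TriageTools source code. */
class FastqLengthFilter {

    /**
     * Copies 4-line FASTQ records from in to out when the sequence line has more
     * than minLength bases. Assumes unwrapped FASTQ (one sequence line per record).
     * Returns the number of records written.
     */
    static long keepLongerThan(BufferedReader in, Writer out, int minLength) {
        long kept = 0;
        try {
            String header;
            while ((header = in.readLine()) != null) {
                if (header.isEmpty()) continue;           // skip blank lines between records
                String seq = in.readLine();
                String plus = in.readLine();
                String qual = in.readLine();
                if (seq == null || plus == null || qual == null) {
                    throw new IllegalArgumentException("truncated FASTQ record: " + header);
                }
                if (seq.length() > minLength) {           // read length = number of bases
                    out.write(header + "\n" + seq + "\n" + plus + "\n" + qual + "\n");
                    kept++;
                }
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return kept;
    }
}
```

To read and write gzip-compressed files such as allreads.txt.gz, the reader and writer can be wrapped around java.util.zip.GZIPInputStream and GZIPOutputStream.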

Single-end samples

To extract reads longer than, say, 40 bases from a single-end sample:

java -jar triagetools.jar length --length 40 -i allreads.txt.gz -o myreads.txt.gz

This creates one output file, myreads-lengths-40-max.txt.gz. To obtain a full partitioning of the input, i.e., one file with the long reads and another with the short ones, add the --all flag:

java -jar triagetools.jar length --length 40 --all -i allreads.txt.gz -o myreads.txt.gz

This creates two output files: myreads-lengths-0-40.txt.gz and myreads-lengths-40-max.txt.gz.
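The output names in these examples follow a visible pattern: a -lengths-LO-HI tag is inserted before the extension of the -o name, with 0 as the lowest bound and max standing for the open upper end. The sketch below reconstructs that pattern under assumptions of its own: the tag goes before the first dot of the file name, and bins are delimited by the sorted thresholds. BinFileNames and its methods are hypothetical, and the tool's actual naming rule may differ, for instance for names that contain several dots.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reconstruction of bin file names from the examples; not TriageTools source code. */
class BinFileNames {

    /** Inserts "-lengths-LO-HI" before the first '.' of the file name (assumed rule). */
    static String binFileName(String output, String lo, String hi) {
        int slash = output.lastIndexOf('/');
        int dot = output.indexOf('.', slash + 1);   // first dot after any directory part
        if (dot < 0) dot = output.length();          // no extension: append the tag at the end
        return output.substring(0, dot) + "-lengths-" + lo + "-" + hi + output.substring(dot);
    }

    /** Names for every bin defined by sorted thresholds: 0-t1, t1-t2, ..., tN-max. */
    static List<String> allBinFileNames(String output, int[] sortedThresholds) {
        List<String> names = new ArrayList<>();
        String lo = "0";
        for (int t : sortedThresholds) {
            names.add(binFileName(output, lo, Integer.toString(t)));
            lo = Integer.toString(t);
        }
        names.add(binFileName(output, lo, "max"));  // open-ended top bin
        return names;
    }
}
```

With -o myreads.txt.gz and a threshold of 40, allBinFileNames reproduces the two names listed above.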

To partition reads into multiple length bins, give a comma-separated list of thresholds:

java -jar triagetools.jar length --length 30,40 -i allreads.txt.gz -o myreads.txt.gz
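With several thresholds, each read belongs to exactly one length bin. A minimal sketch of that assignment, assuming bins are bounded by the sorted thresholds and that a read whose length equals a threshold goes to the lower bin (consistent with "longer than 40" selecting the 40-max bin). Under these assumptions, thresholds 30,40 define three bins: up to 30, above 30 up to 40, and above 40. LengthBins, binIndex, and parseThresholds are hypothetical names.

```java
import java.util.Arrays;

/** Hypothetical sketch of multi-threshold length binning; not TriageTools source code. */
class LengthBins {

    /**
     * Returns the bin index for a read of the given length: the number of
     * thresholds the read is strictly longer than. Bin 0 is the shortest bin,
     * bin thresholds.length is the open-ended "max" bin.
     */
    static int binIndex(int length, int[] thresholds) {
        int bin = 0;
        for (int t : thresholds) {
            if (length > t) bin++;
        }
        return bin;
    }

    /** Parses a value such as "30,40" into sorted integer thresholds. */
    static int[] parseThresholds(String spec) {
        int[] ts = Arrays.stream(spec.split(","))
                .map(String::trim)
                .mapToInt(Integer::parseInt)
                .toArray();
        Arrays.sort(ts);
        return ts;
    }
}
```

Counting the thresholds a read exceeds avoids an explicit search and gives the same answer regardless of threshold order; sorting in parseThresholds only matters for naming the bins.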

Paired-end samples

Paired-end samples are processed using multiple -i and -o flags:

java -jar triagetools.jar length --length 40 -i allreads_1.txt.gz -i allreads_2.txt.gz \
    -o myreads_1.txt.gz -o myreads_2.txt.gz

In this case, a read pair is placed into a bin if either read in the pair is longer than the threshold.
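For a single threshold, the rule above amounts to comparing the longer mate against the threshold. The sketch states that directly and, as an assumption not confirmed on this page, extends it to several thresholds by assigning the whole pair to the bin of its longer mate, so that the _1 and _2 output files stay in step. PairedLengthRule and its methods are hypothetical.

```java
/** Hypothetical sketch of the paired-end length rule; not TriageTools source code. */
class PairedLengthRule {

    /** A pair passes a threshold if either mate is longer than it. */
    static boolean pairPasses(int length1, int length2, int threshold) {
        return length1 > threshold || length2 > threshold;
    }

    /**
     * Assumed multi-threshold generalization: the pair goes to the highest bin
     * reached by either mate, i.e. the bin of the longer mate. Both mates are
     * written to the same bin so paired output files remain synchronized.
     */
    static int pairBinIndex(int length1, int length2, int[] thresholds) {
        int longer = Math.max(length1, length2);
        int bin = 0;
        for (int t : thresholds) {
            if (longer > t) bin++;
        }
        return bin;
    }
}
```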