===========================================================
HTQC - a high-throughput sequencing quality control toolkit
===========================================================
------------
Introduction
------------
This is a read quality control toolkit for high-throughput sequencing. It
contains one program for quality statistics and several programs for quality
filtration.
Currently, only the Illumina sequencing platform is supported.
-------------------
System requirements
-------------------
- Boost and zlib are required to build and run the programs.
- Perl and Gnuplot are required to run "ht-stat-draw.pl", which renders the
output tables of "ht-stat" to charts.
If you build HTQC from source:
- CMake is used for cross-platform build configuration.
If your system does not have this software installed, use your OS's
package manager (for example, yum on Fedora or apt-get on Debian), or visit the official websites:
http://www.freedesktop.org/wiki/Software/pkg-config
http://www.cmake.org
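A quick way to see which of the command-line tools listed above are already on
your PATH is sketched below. The function name "check_tools" is an illustrative
assumption and is not part of HTQC. It only detects executables (CMake, Perl,
Gnuplot), not the Boost or zlib libraries.

```shell
#!/bin/sh
# Hypothetical helper (not part of HTQC): report which command-line
# tools named in this README are available on PATH.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=$((missing + 1))
        fi
    done
    # Succeed only if every requested tool was found.
    [ "$missing" -eq 0 ]
}

check_tools cmake perl gnuplot || echo "Install the missing tools with your package manager."
```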
-------
Install
-------
See "INSTALL" document.
----------------
List of Programs
----------------
- ht-demul : separate reads into individual files by barcode sequence.
- ht-filter : filter reads by quality / length / tile ID.
- ht-asm : concatenate paired-end reads into single sequences.
- ht-primer-trim : remove primer sequences from reads.
- ht-rename : give sequences short names using an auto-incremented number and
a user-specified prefix and suffix.
- ht-sample : randomly pick some sequences.
- ht-stat : generate a read quality statistics report.
- ht-stat-draw.pl : draw charts from ht-stat output.
- ht-trim : trim reads from start and/or end by quality.
For detailed descriptions, see individual README-XXX files for each program.
Running a program with "-h" or "--help" shows its command-line options.
-------------
Typical usage
-------------
First, check whether the sequencing reads are of good quality:
$ ht-stat -P -i reads_R1_* reads_R2_* -o report_dir
$ ht-stat-draw.pl --dir report_dir
Suppose the report shows that tiles 5 and 14 are bad. Remove reads from these tiles:
$ ht-filter -P -i reads_R1_* reads_R2_* --filter tile --reject-tiles 5,14 -o tile_removal
Trim low-quality read ends:
$ ht-trim -i tile_removal_1.fastq -o trim_1.fastq
$ ht-trim -i tile_removal_2.fastq -o trim_2.fastq
Remove reads that are too short:
$ ht-filter --filter length -i trim_1.fastq trim_2.fastq -o long
Optionally, concatenate paired-end reads into longer sequences:
$ ht-join -i trim_1.fastq trim_2.fastq -o joined.fastq -u unjoined
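The filtering steps above can be collected into one script, sketched below. It
starts after the "ht-stat" step, because the bad tiles have to be chosen by
inspecting the charts first. The DRY_RUN switch and the "run" and "qc_pipeline"
helpers are illustrative assumptions, not HTQC features. The commands themselves
are copied from the examples above.

```shell
#!/bin/sh
# Sketch of the filtering workflow as one script.
# DRY_RUN and run() are hypothetical conveniences, not part of HTQC.
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands; 0 = execute them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        # Stop the whole script as soon as one step fails.
        "$@" || exit 1
    fi
}

qc_pipeline() {
    # Remove reads from the bad tiles (5 and 14 in the example above).
    run ht-filter -P -i reads_R1_* reads_R2_* --filter tile --reject-tiles 5,14 -o tile_removal
    # Trim each end separately: ht-trim takes one input and one output file.
    run ht-trim -i tile_removal_1.fastq -o trim_1.fastq
    run ht-trim -i tile_removal_2.fastq -o trim_2.fastq
    # Drop reads that are too short.
    run ht-filter --filter length -i trim_1.fastq trim_2.fastq -o long
}

qc_pipeline
```

By default the script only prints the commands. Set DRY_RUN=0 to execute them
once the tile list matches your own report.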
------------------------
Single-end or paired-end
------------------------
Some programs handle single-end and paired-end reads differently. For those
programs, output files are specified by a prefix, and multiple files will be
generated. For "ht-filter", when one end of a read pair is rejected but the
other end is accepted, the accepted read is stored in "PREFIX_s.fastq".
Programs like "ht-trim" don't distinguish between paired-end and single-end mode.
They accept only one input file and one output file. For paired-end reads, run
them twice, once for the file of each end.
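The prefix-based naming can be illustrated with a small shell sketch. The "_1"
and "_2" suffixes are inferred from the file names used under "Typical usage"
(tile_removal_1.fastq, tile_removal_2.fastq), and "paired_outputs" is a
hypothetical helper, not part of HTQC.

```shell
#!/bin/sh
# Hypothetical helper (not part of HTQC): list the files a paired-end
# ht-filter run is expected to write for a given output prefix.
# The _1/_2 suffixes are inferred from the "Typical usage" examples;
# _s holds accepted reads whose mate was rejected.
paired_outputs() {
    prefix=$1
    for suffix in 1 2 s; do
        echo "${prefix}_${suffix}.fastq"
    done
}

# Report which of the expected files exist after a run with "-o long".
for f in $(paired_outputs long); do
    if [ -e "$f" ]; then
        echo "present: $f"
    else
        echo "absent:  $f"
    fi
done
```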
---------
Reference
---------
We would really appreciate it if you cite our article:
Yang X, Liu D, Liu F, Wu J, Zou J, Xiao X, Zhao F, Zhu B.
HTQC: a fast quality control toolkit for Illumina sequencing data.
BMC Bioinformatics. 2013 Jan 31;14:33
-------
Contact
-------
If you have any questions or find any bugs, please email me:
yangx@im.ac.cn