Quality control and filtration for Illumina sequencing data
Status: Beta
Brought to you by: jiandingzhe
===========================================================
HTQC - a high-throughput sequencing quality control toolkit
===========================================================

------------
Introduction
------------

This is a read quality control toolkit for high-throughput sequencing. It
contains a program for quality statistics, and several programs for quality
filtration. Currently, only the Illumina sequencing platform is supported.

-------------------
System requirements
-------------------

- Boost and Zlib are required to build and run the programs.
- Perl and Gnuplot are required to run "ht-stat-draw.pl", which renders the
  output tables of "ht-stat" as charts.

If you build HTQC from source:

- CMake is used for cross-platform build configuration.

If your system does not have this software installed, please refer to your
OS's package management system (for example, yum for Fedora or apt-get for
Debian), or visit the official websites:

  http://www.freedesktop.org/wiki/Software/pkg-config
  http://www.cmake.org

-------
Install
-------

See "INSTALL" document.

----------------
List of Programs
----------------

- ht-demul        : separate reads into individual files by barcode sequence.
- ht-filter       : filter reads by quality / length / tile ID.
- ht-asm          : concatenate paired-end reads into single sequences.
- ht-primer-trim  : remove primer sequences from reads.
- ht-rename       : give sequences short names using an auto-incremented
                    number and a user-specified prefix and suffix.
- ht-sample       : randomly pick some sequences.
- ht-stat         : generate a read quality statistics report.
- ht-stat-draw.pl : draw charts from ht-stat output.
- ht-trim         : trim reads from start and/or end by quality.

For detailed descriptions, see the individual README-XXX file for each
program. Running a program with "-h" or "--help" shows its command-line
options.

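Before running the programs, it can help to confirm which of them are on your
PATH. The following is only a minimal sketch: the program names are copied from
the list above, while "check_programs" is a hypothetical helper written for
this illustration and is not part of HTQC. It relies only on the standard
shell built-in "command -v".

```shell
#!/bin/sh
# Program names copied from the "List of Programs" section.
HTQC_PROGRAMS="ht-demul ht-filter ht-asm ht-primer-trim ht-rename ht-sample ht-stat ht-stat-draw.pl ht-trim"

# check_programs NAME...
# Hypothetical helper (not part of HTQC): print whether each NAME
# can be found on the current PATH.
check_programs() {
    for prog in "$@"; do
        if command -v "$prog" >/dev/null 2>&1; then
            echo "found:   $prog"
        else
            echo "missing: $prog"
        fi
    done
}

# Word splitting of $HTQC_PROGRAMS is intentional: one argument per name.
check_programs $HTQC_PROGRAMS
```

For any program reported as found, running it with "-h" (for example
"ht-trim -h") prints its command-line options.
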
-------------
Typical usage
-------------

First, check whether the sequencing reads are of good quality:

    $ ht-stat -P -i reads_R1_* reads_R2_* -o report_dir
    $ ht-stat-draw.pl --dir report_dir

Suppose the report shows that tiles 5 and 14 are bad. Remove the reads from
these tiles:

    $ ht-filter -P -i reads_R1_* reads_R2_* --filter tile --reject-tiles 5,14 -o tile_removal

Trim low-quality read ends:

    $ ht-trim -i tile_removal_1.fastq -o trim_1.fastq
    $ ht-trim -i tile_removal_2.fastq -o trim_2.fastq

Remove reads that are too short:

    $ ht-filter --filter length -i trim_1.fastq trim_2.fastq -o long

You may also want to concatenate paired-end reads into longer sequences:

    $ ht-join -i trim_1.fastq trim_2.fastq -o joined.fastq -u unjoined

------------------------
Single-end or paired-end
------------------------

Some programs handle single-end and paired-end reads differently. For those
programs, output files are specified by a prefix, and multiple files will be
generated. For "ht-filter", when one end of a pair is rejected but the other
end is accepted, the accepted read is stored in "PREFIX_s.fastq".

Programs like "ht-trim" do not distinguish between paired-end and single-end
mode. They accept only one input file and one output file. For paired-end
reads, run them twice, once for the file of each end.

---------
Reference
---------

We would really appreciate it if you cite our article:

Yang X, Liu D, Liu F, Wu J, Zou J, Xiao X, Zhao F, Zhu B. HTQC: a fast quality
control toolkit for Illumina sequencing data. BMC Bioinformatics. 2013 Jan
31;14:33

-------
Contact
-------

If you have any questions or find any bugs, please email me: yangx@im.ac.cn