From: Robert D. <rm...@sa...> - 2018-07-18 16:08:19
Samtools (and HTSlib and BCFtools) version 1.9 is now available from
GitHub and SourceForge.

https://sourceforge.net/projects/samtools/
https://github.com/samtools/htslib/releases/tag/1.9
https://github.com/samtools/samtools/releases/tag/1.9
https://github.com/samtools/bcftools/releases/tag/1.9

The main changes are listed below:

------------------------------------------------------------------------------
htslib - changes v1.9
------------------------------------------------------------------------------

* If `./configure` fails, `make` will stop working until either configure
  is re-run successfully, or `make distclean` is used.  This makes
  configuration failures more obvious.  (#711, thanks to John Marshall)

* The default SAM version has been changed to 1.6.  This is in line with
  the latest version specification and indicates that HTSlib supports the
  CG tag used to store long CIGAR data in BAM format.

* bgzip integrity check option '--test' (#682, thanks to @sd4B75bJ, @jrayner)

* Faidx can now index fastq files as well as fasta.  The fastq index adds
  an extra column to the `.fai` index which gives the offset to the quality
  values.  New interfaces have been added to `htslib/faidx.h` to read the
  fastq index and retrieve the quality values.  It is possible to open
  a fastq index as if fasta (only sequences will be returned), but not
  the other way round. (#701)

* New API interfaces to add or update integer, float and array aux tags.
  (#694)

* Add `level=<number>` option to `hts_set_opt()` to allow the compression
  level to be set.  Setting `level=0` enables uncompressed output. (#715)

* Improved bgzip error reporting.

* Better error reporting when CRAM reference files can't be opened. (#706)

* Fixes to make tests work properly on Windows/MinGW - mainly to handle
  line ending differences. (#716)

* Efficiency improvements:

  - Small speed-up for CRAM indexing.

  - Reduce the number of unnecessary wake-ups in the thread pool.
    (#703)

  - Avoid some memory copies when writing data, notably for uncompressed
    BGZF output. (#703)

* Bug fixes:

  - Fix multi-region iterator bugs on CRAM files. (#684)

  - Fixed multi-region iterator bug that caused some reads to be skipped
    incorrectly when reading BAM files. (#687)

  - Fixed synced_bcf_reader() bug when reading contigs multiple times.
    (#691, reported by @freeseek)

  - Fixed bug where bcf_hdr_set_samples() did not update the sample
    dictionary when removing samples. (#692, reported by @freeseek)

  - Fixed bug where the VCF record ref length was calculated incorrectly
    if an INFO END tag was present. (71b00a)

  - Fixed warnings found when compiling with gcc 8.1.0. (#700)

  - sam_hdr_read() and sam_hdr_write() will now return an error code if
    passed a NULL file pointer, instead of crashing.

  - Fixed possible negative array look-up in sam_parse1() that somehow
    escaped previous fuzz testing. (#731, reported by @fCorleone)

  - Fixed bug where cram range queries could incorrectly report an error
    when using multiple threads. (#734, reported by Brent Pedersen)

  - Fixed very rare rANS normalisation bug that could cause an assertion
    failure when writing CRAM files. (#739, reported by @carsonhh)

------------------------------------------------------------------------------
samtools - changes v1.9
------------------------------------------------------------------------------

* Samtools mpileup VCF and BCF output is now deprecated.  It is still
  functional, but will warn.  Please use bcftools mpileup instead. (#884)

* Samtools mpileup now handles the '-d' max_depth option differently.
  There is no longer an enforced minimum, and '-d 0' is interpreted as
  limitless (no maximum - warning this may be slow).  The default per-file
  depth is now 8000, which matches the value mpileup used to use when
  processing a single sample.  To get the previous default behaviour use
  the higher of 8000 divided by the number of samples across all input
  files, or 250.
  (#859)

* Samtools stats new features:

  - The '--remove-overlaps' option discounts overlapping portions of
    templates when computing coverage and mapped base counting. (#855)

  - When a target file is in use, the number of bases inside the target
    is printed and the percentage of target bases with coverage above
    a given threshold specified by the '--cov-threshold' option. (#855)

  - Split base composition and length statistics by first and last reads.
    (#814, #816)

* Samtools faidx new features:

  - Now takes long options. (#509, thanks to Pierre Lindenbaum)

  - Now warns about zero-length and truncated sequences due to the
    requested range being beyond the end of the sequence. (#834)

  - Gets a new option (--continue) that allows it to carry on when
    a requested sequence was not in the index. (#834)

  - It is now possible to supply the list of regions to output in a text
    file using the new '--region-file' option. (#840)

  - New '-i' option to make faidx return the reverse complement of the
    regions requested. (#878)

  - faidx now works on FASTQ (returning FASTA) and added a new fqidx
    command to index and return FASTQ. (#852)

* Samtools collate now has a fast option '-f' that only operates on
  primary pairs, dropping secondary and supplementary.  It tries to write
  pairs to the final output file as soon as both reads have been found.
  (#818)

* Samtools bedcov gets a new '-j' option to make it ignore deletions (D)
  and reference skips (N) when computing coverage. (#843)

* Small speed up to samtools coordinate sort, by converting it to use
  radix sort. (#835, thanks to Zhuravleva Aleksandra)

* Samtools idxstats now works on SAM and CRAM files, however this isn't
  fast due to some information lacking from indices. (#832)

* Compression levels may now be specified with the level=N
  output-fmt-option.  E.g. with -O bam,level=3.

* Various documentation improvements.

* Bug-fixes:

  - Improved error reporting in several places. (#827, #834, #877, cd7197)

  - Various test improvements.
  - Fixed failures in the multi-region iterator (view -M) when regions
    provided via BED files include overlaps. (#819, reported by Dave
    Larson)

  - Samtools stats now counts '=' and 'X' CIGAR operators when counting
    mapped bases. (#855)

  - Samtools stats has fixes for insert size filtering (-m, -i).
    (#845; #697 reported by Soumitra Pal)

  - Samtools stats -F no longer negates an earlier -d option. (#830)

  - Fix samtools stats crash when using a target region.
    (#875, reported by John Marshall)

  - Samtools sort now keeps to a single thread when the -@ option is
    absent.  Previously it would spawn a writer thread, which could cause
    the CPU usage to go slightly over 100%.
    (#833, reported by Matthias Bernt)

  - Fixed samtools phase '-A' option which was incorrectly defined to take
    a parameter. (#850; #846 reported by Dianne Velasco)

  - Fixed compilation problems when using C_INCLUDE_PATH.
    (#870; #817 reported by Robert Boissy)

  - Fixed --version when built from a Git repository.
    (#844, thanks to John Marshall)

  - Use noenhanced mode for title in plot-bamstats.  Prevents unwanted
    interpretation of characters like underscore in gnuplot version 5.
    (#829, thanks to M. Zapukhlyak)

  - blast2sam.pl now reports perfect match hits (no indels or mismatches).
    (#873, thanks to Nils Homer)

  - Fixed bug in fasta and fastq subcommands where stdout would not be
    flushed correctly if the -0 option was used.

  - Fixed invalid memory access in mpileup and depth on alignment records
    where the sequence is absent.

------------------------------------------------------------------------------
bcftools - changes v1.9
------------------------------------------------------------------------------

* `annotate`

  - REF and ALT columns can now be transferred from the annotation file.

  - fixed bug when setting vector_end values.

* `consensus`

  - new -M option to control output at missing genotypes

  - variants immediately following insertions should not be skipped.
    Note however, that the current fix requires normalized VCF and may
    still falsely skip variants adjacent to multiallelic indels.

  - bug fixed in -H selection handling

* `convert`

  - the --tsv2vcf option now makes the missing genotypes diploid, "./."
    instead of "."

  - the behavior of -i/-e with --gvcf2vcf changed.  Previously only sites
    with FILTER set to "PASS" or "." were expanded and the -i/-e options
    dropped sites completely.  The new behavior is to let the -i/-e
    options control which records will be expanded.  In order to drop
    records completely, one can stream through "bcftools view" first.

* `csq`

  - since the real consequences of start/splice events are not known,
    the amino acid positions at subsequent variants should stay unchanged

  - add `--force` option to skip malformatted transcripts in GFFs with
    out-of-phase CDS exons.

* `+dosage`: output all alleles and all their dosages at multiallelic sites

* `+fixref`: fix serious bug in -m top conversion

* `-i/-e` filtering expressions:

  - add two-tailed binomial test

  - add functions N_PASS() and F_PASS()

  - add support for lists of samples in filtering expressions; with many
    samples it was impractical to list them all on the command line.
    Samples can now be given in a file, e.g. GT[@samples.txt]="het"

  - allow multiple perl functions in the expressions and some bug fixes

  - fix a parsing problem, '@' was not removed from '@filename'
    expressions

* `mpileup`: fixed bug where, if samples were renamed using the `-G`
  (`--read-groups`) option, some samples could be omitted from the output
  file.

* `norm`: update INFO/END when normalizing indels

* `+split`: new -S option to subset samples and to use custom file names
  instead of the defaults

* `+smpl-stats`: new plugin

* `+trio-stats`: new plugin

* Fixed build problems with non-functional configure script produced on
  some platforms.

Rob Davies                rm...@sa...
The Sanger Institute      http://www.sanger.ac.uk/
Hinxton, Cambs.,          Tel. +44 (1223) 834244
CB10 1SA, U.K.            Fax. +44 (1223) 494919

--
The Wellcome Sanger Institute is operated by Genome Research Limited,
a charity registered in England with number 1021457 and a company
registered in England with number 2742969, whose registered office is
215 Euston Road, London, NW1 2BE.
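
For the mpileup '-d' change described in the samtools section, the previous
default can be worked out as in the minimal sketch below.  The function name
legacy_default_max_depth is hypothetical and not part of samtools, and
rounding the division down is an assumption: the notes only say "the higher
of 8000 divided by the number of samples across all input files, or 250".

```python
def legacy_default_max_depth(n_samples: int) -> int:
    """Return the pre-1.9 style default depth for a given sample count.

    Follows the rule quoted in the release notes: the higher of
    8000 divided by the number of samples, or 250.
    Assumption: the division rounds down (integer division).
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    return max(8000 // n_samples, 250)


if __name__ == "__main__":
    # Show the value that would be passed to '-d' for a few sample counts.
    for n in (1, 10, 32, 100):
        print(f"{n} sample(s): -d {legacy_default_max_depth(n)}")
```

The resulting number would then be supplied through the '-d' option
mentioned above to approximate the old behaviour.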