From: Robert D. <rm...@sa...> - 2021-07-09 11:23:14
Samtools (and HTSlib and BCFtools) version 1.13 is now available from
GitHub and SourceForge.

https://sourceforge.net/projects/samtools/
https://github.com/samtools/htslib/releases/tag/1.13
https://github.com/samtools/samtools/releases/tag/1.13
https://github.com/samtools/bcftools/releases/tag/1.13

The main changes are listed below:

------------------------------------------------------------------------------
htslib - changes v1.13
------------------------------------------------------------------------------

Features and Updates
--------------------

* In case a PG header line has multiple ID tags supplied by other
  applications, the header API now selects the first one encountered as the
  identifying tag and issues a warning when detecting subsequent ID tags.
  (#1256; fixed samtools/samtools#1393)

* VCF header reading function (vcf_hdr_read) no longer tries to download a
  remote index file by default. (#1266; fixes #380)

* Support reading and writing FASTQ format in the same way as SAM, BAM or
  CRAM. Records read from a FASTQ file will be treated as unmapped data.
  (#1156)

* Added GCP requester pays bucket access. Thanks to @indraniel. (#1255)

* Made mpileup's overlap removal choose which copy to remove at random
  instead of always removing the second one. This avoids strand bias in
  experiments where the +ve and -ve strand reads always appear in the same
  order. (#1273; fixes samtools/bcftools#1459)

* It is now possible to use platform specific BAQ parameters. This also
  selects long-read parameters for read lengths bigger than 1kb, which helps
  bcftools mpileup call SNPs on PacBio CCS reads. (#1275)

* Improved bcf_remove_allele_set. This fixes a bug that stopped iteration
  over alleles prematurely, marks removed alleles as 'missing' and does
  automatic lazy unpacking. (#1288; fixes #1259)

* Improved compression metrics for unsorted CRAM files. This improves the
  choice of codecs when handling unsorted data.
  (#1291)

* Linear index entries for empty intervals are now initialised with the file
  offset in the next non-empty interval instead of the previous one. This
  may reduce the amount of data iterators have to discard before reaching
  the desired region, when the starting location is in a sequence gap.
  Thanks to @carsonh for reporting the issue. (#1286; fixes #486)

* A new hts_bin_level API function has been added, to compute the level of a
  given bin in the binning index. (#1286)

* Related to the above, a new API method, hts_idx_nseq, now returns the
  total number of contigs from an index. (#1295 and #1299)

* Added bracket handling to bcf_hdr_parse_line, for use with ##META lines.
  Thanks to Alberto Casas Ortiz. (#1240)

Build changes
-------------

These are compiler, configuration and makefile based changes.

* HTSlib now uses libhtscodecs release 1.1.1.

* Added a curl/curl.h check to configure and improved INSTALL documentation
  on build options. Thanks to Melanie Kirsche and John Marshall.
  (#1265; fixes #1261)

* Some fixes to address GCC 11.1 warnings.
  (#1280, #1284, #1285; fixes #1283)

* Supports building HTSlib in a separate directory. Thanks to John Marshall.
  (#1277; fixes #231)

* Supports building HTSlib on MinGW 32-bit environments. Thanks to
  John Marshall. (#1301)

Bug fixes
---------

* Fixed hts_itr_query() et al region queries: fixed bug introduced in
  HTSlib 1.12, which led to iterators producing very few reads for some
  queries (especially for larger target regions) when unmapped reads were
  present. HTSlib 1.11 had a related problem in which iterators would omit a
  few unmapped reads that should have been produced; cf #1142. Thanks to
  Daniel Cooke for reporting the issue. (#1281; fixes #1279)

* Removed compressBound assertions on opening bgzf files. Thanks to
  Gurt Hulselmans for reporting the issue. (#1258; fixed #1257)

* Duplicate sample name error message for a VCF file now only displays the
  duplicated name rather than the entire sample name list.
  (#1262; fixes samtools/bcftools#1451)

* Fix to make samtools cat work on CRAMs again.
  (#1276; fixes samtools/samtools#1420)

* Fix for a double memory free in SAM header creation. Thanks to @ihsinme.
  (#1274)

* Prevent assert in bcf_sr_set_regions. Thanks to Dr K D Murray. (#1270)

* Fixed crash in knet_open() etc stubs. Thanks to John Marshall. (#1289)

* Fixed filter expression "cigar" on unmapped reads. Stop treating an empty
  CIGAR string as an error. Thanks to Chang Y for reporting the issue.
  (#1298, fixes samtools/samtools#1445)

* Bug fixes in the bundled copy of htscodecs:

  - Fixed an uninitialized access in the name tokeniser decoder.
    (samtools/htscodecs#23)

  - Fixed a bug with name tokeniser and variable number of names per slice,
    causing it to incorrectly report an error on certain valid inputs.
    (samtools/htscodecs#24)

------------------------------------------------------------------------------
samtools - changes v1.13
------------------------------------------------------------------------------

* Fixed samtools view FILE REGION, mpileup -r REGION, coverage -r REGION and
  other region queries: fixed bug introduced in 1.12, which led to region
  queries producing very few reads for some queries (especially for larger
  target regions) when unmapped reads were present. Thanks to @vinimfava
  (#1451), @JingGuo1997 (#1457) and Ramprasad Neethiraj (#1460) for
  reporting the respective issues.

* Added options to set and clear flags to samtools view. Along with the
  existing option to remove aux tags, this gives the ability to remove the
  changes made by duplicate marking (part of #1358). (#1441)

* samtools view now has long option equivalents for most of its
  single-letter options. Thanks to John Marshall. (#1442)

* A new tool, samtools import, has been added. It reads one or more FASTQ
  files and converts them into unmapped SAM, BAM or CRAM. (#1323)

* Fixed samtools coverage error message when the target region name is not
  present in the file header. Thanks to @Lyn16 for reporting it.
  (#1462; fixes #1461)

* Made samtools coverage ASCII mode produce true ASCII output. Previously it
  would produce UTF-8 characters. (#1423; fixes #1419)

* samtools coverage now allows setting the maximum depth, using the
  -d/--depth option. Also, the default maximum depth has been set to
  1000000. (#1415; fixes #1395)

* Complete rewrite of samtools depth. This means it is now considerably
  faster and does not need a depth limit to avoid high memory usage. Results
  should mostly be the same as the old command with the potential exception
  of overlap removal. (#1428; fixes #889, helps ameliorate #1411)

* samtools flags now accepts any number of command line arguments, allowing
  multiple SAM flag combinations to be converted at once. Thanks to
  John Marshall. (#1401, fixes #749)

* samtools ampliconclip, ampliconstats and plot-ampliconstats now support
  inputs that list more than one reference.
  (#1410 and #1417; fixes #1396 and #1418)

* samtools ampliconclip now accepts the --tolerance option, which allows the
  user to set the number of bases within which a region is matched. The
  default is 5. (#1456)

* Updated the documentation on samtools ampliconclip to be clearer about
  what it does. From a suggestion by Nathan S Watson-Haigh. (#1448)

* Fixed negative depth values in ampliconstats output. (#1400)

* samtools addreplacerg now allows for updating (replacing) an existing
  `@RG` line in the output header, if a new `@RG` line is provided in the
  command line, via the -r argument. The update still requires the user's
  approval, which can be given with the new -w option. Thanks to Chuang Yu.
  (#1404)

* Stopped samtools cat from outputting multiple CRAM EOF markers. (#1422)

* Three new counts have been added to samtools flagstat: primary, mapped
  primary and duplicate primary. (#1431; fixes #1382)

* samtools merge now accepts a `-o FILE` option specifying the output file,
  similarly to most other subcommands.
  The existing way of specifying it (as the first non-option argument,
  alongside the input file arguments) remains supported. Thanks to
  David McGaughey and John Marshall. (#1434)

* The way samtools merge checks for existing files has been changed so that
  it does not hang when used on a named pipe. (#1438; fixes #1437)

* Updated documentation on mpileup to highlight the fact that the filtering
  options on FLAGs work with ANY rules. (#1447; fixes #1435)

* samtools can now be configured to use a copy of HTSlib that has been set
  up with separate build and source trees. When this is the case, the
  `--with-htslib` configure option should be given the location of the
  HTSlib build tree. (Note that samtools itself does not yet support
  out-of-tree builds). Thanks to John Marshall.
  (#1427; companion change to samtools/htslib#1277)

------------------------------------------------------------------------------
bcftools - changes v1.13
------------------------------------------------------------------------------

This release brings new options and significant changes in BAQ
parametrization in `bcftools mpileup`. The previous behaviour can be
triggered by providing the `--config 1.12` option. Please see PR #1474 for
details.

Changes affecting the whole of bcftools, or multiple commands:

* Improved build system

Changes affecting specific commands:

* bcftools annotate:

    - Fix a rare bug that occurs when INFO/END is present, all INFO fields
      are removed with `bcftools annotate -x INFO`, and BCF output is
      produced. In that case the removed INFO/END continued to inform the
      end coordinate and caused incorrect retrieval of records with the -r
      option (#1483)

    - Support for matching annotation line by ID, in addition to
      CHROM,POS,REF, and ALT (#1461)

        bcftools annotate -a annots.tab.gz -c CHROM,POS,~ID,REF,ALT,INFO/END input.vcf

* bcftools csq:

    - When GFF and VCF/fasta use a different chromosome naming convention
      (e.g. chrX vs X), no consequences would be added.
      The program now attempts to detect these differences and remove/add
      the "chr" prefix to chromosome names to match the GFF and VCF/fasta
      (#1507)

    - Parametrize brief-predictions parameter to allow explicit number of
      amino acids to be printed. Note that the `-b, --brief-predictions`
      option is being replaced with `-B, --trim-protein-seq INT`

* bcftools +fill-tags:

    - Generalization and better support for custom functions that allow
      adding new INFO tags based on arbitrary `-i, --include` type of
      expressions. For example, to calculate a missing INFO/DP annotation
      from FORMAT/AD, it is possible to use:

        -t 'DP:1=int(sum(FORMAT/AD))'

      Here the optional ":1" part specifies that a single value will be
      added (by default Number=. is used) and the optional int(...) adds an
      integer value (by default Type=Float is used).

    - When FORMAT/GT is not present, the INFO/AF tag is now calculated from
      INFO/AC and INFO/AN.

* bcftools gtcheck:

    - Switch between FORMAT/GT and FORMAT/PL when one is (implicitly)
      requested but only the other is available

    - Improve diagnostics, printing warnings when a line cannot be matched
      and the number of lines skipped for various reasons (#1444)

    - Minor bug fix: with PLs being the default, the `--distinctive-sites`
      option started to require explicit `--error-probability 0`

* bcftools index:

    - The program now accepts both data file name and the index file name.
      This adds to user convenience when running index statistics (-n, -s)

* bcftools isec:

    - Always generate sites.txt with isec -p (#1462)

* bcftools +mendelian:

    - Consider only complete trios, do not crash on sample name typos
      (#1520)

* bcftools mpileup:

    - New `--seed` option for reproducibility of subsampling code in HTSlib

    - The SCR annotation which shows the number of soft-clipped reads now
      correctly pools reads together regardless of the variant type.
      Previously only reads with indels were included at indel sites.

    - Major revamp of BAQ.
      Please see https://github.com/samtools/bcftools/pull/1474 for
      details. The previous behaviour can be triggered by providing the
      `--config 1.12` option.

    - Thanks to improvements in HTSlib, the removal of overlapping reads
      (which can be disabled with the `-x, --ignore-overlaps` option) is
      not systematically biased anymore
      (https://github.com/samtools/htslib/pull/1273)

    - Modified scale of Mann-Whitney U tests. INFO/*Z annotations are now
      printed, for example MQBZ replaces MQB.

* bcftools norm:

    - Fix Type=Flag output in `norm --atomize` (#1472)

    - Atomization must not discard ALT=. records

    - Atomization of AD and QS tags now correctly updates occurrences of
      duplicate alleles within different haplotypes

    - Fix a bug in atomization of Number=A,R tags

* bcftools reheader:

    - Add `-T, --temp-prefix` option

* bcftools +setGT:

    - A wider range of genotypes can be set by the plugin by allowing
      custom genotypes to be specified. For example, to force a
      heterozygous genotype it is now possible to use expressions like:

        c:'m|M'
        c:0/1
        c:0

* bcftools +split-vep:

    - New `-u, --allow-undef-tags` option

    - Better handling of ambiguous keys such as INFO/AF and CSQ/AD. The
      `-p, --annot-prefix` option is now applied before doing anything else
      which allows its use with `-f, --format` and `-c, --columns` options.

    - Some consequence field names may not constitute a valid tag name,
      such as "pos(1-based)". Field names are now trimmed to exclude
      brackets.

* bcftools +tag2tag:

    - New --QR-QA-to-QS option to convert annotations generated by
      FreeBayes to QS used by BCFtools

* bcftools +trio-dnm:

    - Add support for sites with more than four alleles. Note that only the
      four most frequent alleles are considered; the model remains
      unchanged. Previously such sites were skipped.

    - New --use-NAIVE option for a naive DNM calling based solely on
      FORMAT/GT and expected Mendelian inheritance. This option is suitable
      for prefiltering.
    - Fix behaviour to match the documentation: the `--dnm-tag DNG` option
      now correctly outputs log scaled values by default, not phred scaled.

    - Fix bug in VAF calculation, homozygous de novo variants were
      incorrectly reported as having VAF=50%

    - Fix arithmetic underflow which could lead to imprecise scores and
      improve sensitivity in high coverage regions

    - Allow combining --pn and --pns to set the noise thresholds
      independently

--
The Wellcome Sanger Institute is operated by Genome Research Limited, a
charity registered in England with number 1021457 and a company registered
in England with number 2742969, whose registered office is 215 Euston Road,
London, NW1 2BE.
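
Illustrative note on the new hts_bin_level function listed in the HTSlib
changes: the sketch below shows one way the level of a bin could be derived,
assuming the 8-way binning layout described in the SAM specification, where
bin 0 is the root and the parent of bin b is (b - 1) >> 3. The Python names
bin_parent and bin_level are hypothetical and are not part of the HTSlib C
API; this is a rough model of the idea, not the library's implementation.

```python
def bin_parent(bin_id: int) -> int:
    """Parent of a bin, assuming each bin has 8 children and bin 0 is the root."""
    return (bin_id - 1) >> 3


def bin_level(bin_id: int) -> int:
    """Count the parent steps needed to walk from bin_id up to the root bin 0."""
    if bin_id < 0:
        raise ValueError("bin number must be non-negative")
    level = 0
    while bin_id != 0:
        bin_id = bin_parent(bin_id)
        level += 1
    return level


if __name__ == "__main__":
    # First bin number of each level in the standard six-level scheme.
    for first_bin in (0, 1, 9, 73, 585, 4681):
        print(first_bin, bin_level(first_bin))
```

Under these assumptions, bins 1-8 are at level 1, bins 9-72 at level 2, and
so on down to bins 4681-37448 at level 5, the finest level of a BAI-style
index.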