From: Robert D. <rm...@sa...> - 2024-09-12 15:59:44
Samtools (and HTSlib and BCFtools) version 1.21 is now available from GitHub and SourceForge.

https://github.com/samtools/htslib/releases/tag/1.21
https://github.com/samtools/samtools/releases/tag/1.21
https://github.com/samtools/bcftools/releases/tag/1.21
https://sourceforge.net/projects/samtools/

The main changes are listed below:

------------------------------------------------------------------------------
htslib - changes v1.21
------------------------------------------------------------------------------

The primary user-visible changes in this release are updates to the annot-tsv tool and some speed improvements. Full details of other changes and bugs fixed are below.

Notice: this is the last SAMtools / HTSlib release where CRAM 3.0 will be the default CRAM version. From the next release we will change to CRAM 3.1 unless the version is explicitly specified, for example using "samtools view -O cram,version=3.0".

Updates
-------

* Extend annot-tsv with several new command line options.
    --delim permits use of other delimiters.
    --headers for selection of other header formats.
    --no-header-idx to suppress column index numbers in header.
  Also removed -h as it is now short for --headers. Note --help still works. (PR #1779)

* Allow annot-tsv -a to rename annotations. (PR #1709)

* Extend annot-tsv --overlap to be able to specify the overlap fraction separately for source and target. (PR #1811)

* Added new APIs to facilitate low-level CRAM container manipulations, used by the new "samtools cat" region filtering code. Functions are:
    cram_container_get_coords()
    cram_filter_container()
    cram_index_extents()
    cram_container_num2offset()
    cram_container_offset2num()
    cram_num_containers()
    cram_num_containers_between()
  Also improved cram_index_query() to cope with HTS_IDX_NOCOOR regions. (PR #1771)

* Bgzip now retains file modification and access times when compressing and decompressing. (PR #1727, fixes #1718. Requested by Gert Hulselmans.)

* Use FNV1a for string hashing in khash. The old algorithm was particularly weak with base-64 style strings and led to a large number of collisions. (PR #1806. Fixes samtools/samtools#2066, reported by Hans-Joachim Ruscheweyh)

* Improve the speed of the nibble2base() function on Intel (PR #1667, PR #1764, PR #1786, PR #1802, thanks to Ruben Vorderman) and ARM (PR #1795, thanks to John Marshall).

* bgzf_getline() will now warn if it encounters UTF-16 data. (PR #1487, thanks to John Marshall)

* Speed up bgzf_read(). While this does not reduce CPU significantly, it does increase the maximum parallelism available permitting 10-15% faster decoding. (PR #1772, PR #1800, Issue #1798)

* Speed up faidx by use of better isgraph methods (PR #1797) and whole-line reading (PR #1799, thanks to John Marshall).

* Speed up kputll() function, speeding up BAM -> SAM conversion by about 5% and also samtools depth. (PR #1805)

* Added more example code, covering fasta/fastq indexing, tabix indexing and use of the thread pool. (PR #1666)

Build Changes
-------------

* Code warning fixes for pedantic compilers (PR #1777) and avoid some undefined behaviour (PR #1810, PR #1816, PR #1828).

* Windows based CI has been migrated from AppVeyor to GitHub Actions. (PR #1796, PR #1803, PR #1808)

* Miscellaneous minor build infrastructure and code fixes. (PR #1807, PR #1829, both thanks to John Marshall)

* Updated htscodecs submodule to version 1.6.1 (PR #1828)

* Fixed an awk script in the Makefile that only worked with gawk. (PR #1831)

Bug fixes
---------

* Fix small OSS-Fuzz reported issues with CRAM encoding and long CIGARS and/or illegal positions. (PR #1775, PR #1801, PR #1817)

* Fix issues with on-the-fly indexing of VCF/BCF (bcftools --write-index) when not using multiple threads. (PR #1837. Fixes samtools/bcftools#2267, reported by Giulio Genovese)

* Stricter limits on POS / MPOS / TLEN in sam_parse1(). This fixes a signed overflow reported by OSS-Fuzz and should help prevent other as-yet undetected bugs. (PR #1812)

* Check that the underlying file open worked for preload: URLs. Fixes a NULL pointer dereference reported by OSS-Fuzz. (PR #1821)

* Fix an infinite loop in hts_itr_query() when given extremely large positions which cause integer overflow. Also adds hts_bin_maxpos() and hts_idx_maxpos() functions. (PR #1774, thanks to John Marshall and reported by Jesus Alberto Munoz Mesa)

* Fix an out of bounds read in hts_itr_multi_next() when switching chromosomes. This bug is present in releases 1.11 to 1.20. (PR #1788. Fixes samtools/samtools#2063, reported by acorvelo)

* Work around parsing problems with colons in CHROM names. Fixes samtools/bcftools#2139. (PR #1781, John Marshall / James Bonfield)

* Correct the CPU detection for Mac OS X 10.7. cpuid is used by htscodecs (see samtools/htscodecs#116), and the corresponding changes in htslib are PR #1785. Reported by Ryan Carsten Schmidt.

* Make BAM zero-length intervals work the same as CRAM; permitted and returning overlapping records. (PR #1787. Fixes samtools/samtools#2060, reported by acorvelo)

* Replace assert() with abort() in BCF synced reader. This is not an ideal solution, but it gives consistent behaviour when compiling with or without NDEBUG. (PR #1791, thanks to Martin Pollard)

* Fixed failure to change the write block size on compressed SAM or VCF files due to an internal type confusion. (PR #1826)

* Fixed an out-of-bounds read in cram_codec_iter_next() (PR #1832)

------------------------------------------------------------------------------
samtools - changes v1.21
------------------------------------------------------------------------------

Notice:

* This is the last SAMtools / HTSlib release where CRAM 3.0 will be the default CRAM version. From the next release we will change to CRAM 3.1 unless the version is explicitly specified, for example using "samtools view -O cram,version=3.0".

New work and changes:

* `samtools reset` now removes a set of predefined auxtags, as these tags are no longer valid after the reset operation. This behaviour can be overridden if desired. (PR #2034, fixes #2011. Reported by Felix Lenner)

* `samtools reset` now also removes duplicate flags. (PR #2047. Reported by Kevin Lewis)

* Region and section/part filtering added to CRAM `samtools cat`. Region filtering permits `samtools cat` to produce new CRAMs that only cover a specified region. (PR #2035)

* Added a report of the number of alignments for each primer to `samtools ampliconclip`. (PR #2039, PR #2101, feature request #2033. Thanks to Brad Langhorst)

* Make `ampliconclip` primer counts output deterministic. (PR #2081)

* `samtools fixmate` no longer removes the PAIRED flag from reads that have no mate. This is done on the understanding that the PAIRED flag is a sequencing technology indicator not a feature of alignment. This is a change to previous `fixmate` behaviour. (PR #2056, fixes #2052. Reported by John Wiedenhoeft)

* Added bgzf compressed FASTA output to `samtools faidx`. (PR #2067, fixes #2055. Requested by Filipe G Vieira)

* Optimise `samtools depth` histogram incrementing code. (PR #2078)

* In `samtools merge` zero pad unique suffix IDs. (PR #2087, fixes #2086. Thanks to Chris Wright)

* `samtools idxstats` now accepts the `-X` option, making it easier to specify the location of the index file. (PR #2093, feature request #2071. Requested by Samuel Chen)

* Improved documentation for the mpileup `--adjust-MQ` option. (PR #2098. Requested by Georg Langebrake)

Bug fixes:

* Avoid `tview` buffer overflow for positions with >= 14 digits. (PR #2032. Thanks to John Marshall. Reported on bioconda/bioconda-recipes#47137 by jmunoz94)

* Added file name and error message to 'error closing output file' error in `samtools sort`. (PR #2050, fixes #2049. Thanks to Joshua C Randall)

* Fixed hard clip trimming issue in `ampliconclip` where right-hand side qualities were being removed from left-hand side trims. (PR #2053, fixes #2048. Reported by Duda5)

* Fixed a bug in `samtools merge --template-coordinate` where the wrong heap was being tested. (PR #2062. Thanks to Nils Homer. Reported on ng-core/fastquorum#52 by David Mas-Ponte)

* Do not look at chr "*" for unmapped-placed reads with `samtools view --fetch-pairs`. This was causing a significant slowdown when `--fetch-pairs` was being used. (PR #2070, fixes #2059. Reported by acorvelo)

* Fixed bug which could cause `samtools view -L` to give incomplete output when the BED file contained nested target locations. (PR #2107, fixes #2104. Reported by geertvandeweyer)

* Enable `samtools coverage` to handle alignments that do not have quality score data. This was causing memory access problems. (PR #2083, fixes #2076. Reported by Matthew Colpus)

* Fix undefined behaviour in `samtools fastq` with empty QUAL. (PR #2084)

* In `plot-bamstats` fixed read-length plot for data with limited variations in length. Lack of data was causing gnuplot problems. (PR #2085, fixes #2068. Reported by mariyeta)

* Fixed an accidental fall-through that caused `samtools split -p` to also enable `--no-PG`. (PR #2101)

* Fixed an overflow that caused `samtools consensus -m simple` to give incorrect output when the input coverage exceeded several million reads deep. (PR #2099, fixes #2095. Reported by Dylan Lawrence)

Non user-visible changes and build improvements:

* Work around address sanitizer going missing from the Cirrus CI ubuntu clang compiler by moving the address sanitizer build to gcc. Fix warnings from the new clang compiler. (PR #2043)

* Windows based CI has been migrated from AppVeyor to GitHub Actions. (PR #2072, PR #2108)

* Turn on more warning options in Cirrus-CI builds, ensure everything builds with `-Werror`, and add undefined behaviour checks to the address sanitizer test. (PR #2101, PR #2103, PR #2109)

* Tidy up Makefile dependencies and untracked test files. (PR #2106. Thanks to John Marshall)

------------------------------------------------------------------------------
bcftools - changes v1.21
------------------------------------------------------------------------------

Changes affecting the whole of bcftools, or multiple commands:

* Support multiple semicolon-separated strings when filtering by ID using -i/-e (#2190). For example, `-i 'ID="rs123"'` now correctly matches `rs123;rs456`

* The filtering expression ILEN can be positive (insertion), negative (deletion), zero (balanced substitutions), or set to missing value (symbolic alleles).

* bcftools query
* bcftools +split-vep
    - The column indices printed by default with `-H` (e.g., "#[1]CHROM") can now be suppressed by giving the option twice `-HH` (#2152)

Changes affecting specific commands:

* bcftools annotate
    - Support dynamic variables read from a tab-delimited annotation file (#2151). For example, in the two cases below the field 'STR' from the -a file is required to match the INFO/TAG in VCF. In the first example the alleles REF,ALT must match, in the second example they are ignored. The option -k is required to output also records that were not annotated:

        bcftools annotate -a ann.tsv.gz \
          -c CHROM,POS,REF,ALT,SCORE,~STR -i'TAG={STR}' -k in.vcf

        bcftools annotate -a ann.tsv.gz \
          -c CHROM,POS,-,-,SCORE,~STR -i'TAG={STR}' -k in.vcf

    - When adding Type=String annotations from a tab-delimited file, encode characters with special meaning using percent encoding (';', '=' in INFO and ':' in FORMAT) (#2202)

* bcftools consensus
    - Allow to apply a reference allele which overlaps a previous deletion, there is no need to complain about overlapping alleles in such case
    - Fix a bug which required `-s -` to be present even when there were no samples in the VCF (#2260)

* bcftools csq
    - Fix a rare bug where indel combined with a substitution ending at exon boundary is incorrectly predicted to have 'inframe' rather than 'frameshift' consequence (#2212)

* bcftools gtcheck
    - Fix a segfault with --no-HWE-prob. The bug was introduced with the output format change in 1.19 which replaced the DC section with DCv2 (#2180)
    - The number of matching genotypes in the DCv2 output was not calculated correctly with non-zero `-E, --error-probability`. Consequently, also the average HWE score was incorrect. The main output, the discordance score, was not affected by the bug

* bcftools +mendelian2
    - Include the number of good cases where at least one of the trio genotypes has an alternate allele (#2204)
    - Fix the error message which would report the wrong sample when non-existent sample is given. Note that bug only affected the error message, the program otherwise assigns the family members correctly (#2242)

* bcftools merge
    - Fix a severe bug in merging of FORMAT fields with Number=R and Number=A values. For example, rows with high-coverage FORMAT/AD values (bigger or equal to 128) could have been assigned to incorrect samples. The bug was introduced in version 1.19. For details see #2244.

* bcftools mpileup
    - Return non-zero error code when the input BAM/CRAM file is truncated (#2177)
    - Add FORMAT/AD annotation by default, disable with `-a -AD`

* bcftools norm
    - Support realignment of symbolic <DUP.*> alleles, similarly to <DEL.*> added previously (#1919,#2145)
    - Fix in reporting reference allele genotypes with `--multi-overlaps .` (#2160)
    - Support of duplicate removal of symbolic alleles of the same type but different SVLEN (#2182)
    - New `-S, --sort` switch to optionally sort output records by allele (#1484)
    - Add the `-i/-e` filtering options to select records for normalization. Note duplicate removal ignores this option.
    - Fix a bug where `--atomize` would not fill GT alleles for atomized SNVs followed by an indel (#2239)

* bcftools +remove-overlaps
    - Revamp the program to allow greater flexibility, with the following new options:

        -M, --mark-tag TAG    Mark -m sites with INFO/TAG
        -m, --mark EXPR       Mark (if also -M is present) or remove sites [overlap]
                                  dup       .. all overlapping sites
                                  overlap   .. overlapping sites
                                  min(QUAL) .. mark sites with lowest QUAL until overlaps are resolved
        --missing EXPR        Value to use for missing tags with -m 'min(QUAL)'
                                  0   .. the default
                                  DP  .. heuristics, scale maximum QUAL value proportionally to INFO/DP
        --reverse             Apply the reverse logic, for example preserve duplicates instead of removing
        -O, --output-type t   t: plain list of sites (chr,pos), tz: compressed list

* bcftools +tag2tag
    - The conversions --LXX-to-XX, --XX-to-LXX were working but specific cases such as --LAD-to-AD were not.
    - Print more informative error message when source tag type violates VCF specification

* bcftools +trio-dnm2
    - Better handling of the --strictly-novel functionality, especially with respect to chrX inheritance

----------------------------------------------------------------------
The Wellcome Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA.

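The HTSlib 1.21 notes in the message above mention that khash string hashing switched to FNV1a. As a rough illustration of that hash family only, here is a minimal Python sketch of the standard 32-bit FNV-1a function. It assumes the 32-bit variant and the published FNV offset basis and prime; it is not taken from the HTSlib source, and the function name `fnv1a_32` is made up for this example.

```python
# Sketch of 32-bit FNV-1a (assumption: 32-bit variant; not HTSlib's code).
FNV32_OFFSET_BASIS = 0x811C9DC5  # standard FNV 32-bit offset basis
FNV32_PRIME = 0x01000193         # standard FNV 32-bit prime


def fnv1a_32(data: bytes) -> int:
    """Return the 32-bit FNV-1a hash of a byte string."""
    h = FNV32_OFFSET_BASIS
    for byte in data:
        h ^= byte                           # FNV-1a: XOR the byte in first...
        h = (h * FNV32_PRIME) & 0xFFFFFFFF  # ...then multiply, wrapping to 32 bits
    return h


if __name__ == "__main__":
    # Hash a few read-name-like keys, including base-64 style strings.
    for key in (b"read/1", b"QUJDREVGRw==", b"QUJDREVGSA=="):
        print(key.decode(), hex(fnv1a_32(key)))
```

Because every input byte is mixed in by an XOR followed by a multiplication, keys that differ in a single character tend to land in different buckets, which is the property the release note is concerned with.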
From: Robert D. <rm...@sa...> - 2024-04-15 17:19:09
Samtools (and HTSlib and BCFtools) version 1.20 is now available from GitHub and SourceForge.

https://github.com/samtools/htslib/releases/tag/1.20
https://github.com/samtools/samtools/releases/tag/1.20
https://github.com/samtools/bcftools/releases/tag/1.20
https://sourceforge.net/projects/samtools/

The main changes are listed below:

------------------------------------------------------------------------------
htslib - changes v1.20
------------------------------------------------------------------------------

Updates
-------

* When working on named files, bgzip now sets the modified and access times of the output files it makes to match those of the corresponding input. (PR #1727, feature request #1718. Requested by Gert Hulselmans)

* It's now possible to use a -o option to specify the output file name in bgzip. (PR #1747, feature request #1726. Requested by Gert Hulselmans)

* Improved faidx error messages. (PR #1743, thanks to Nick Moore)

* Faster reading of SAM array (type "B") tags. These often turn up in ONT and PacBio data. (PR #1741)

* Improved validity checking of base modification tags. (PR #1749)

* mpileup overlap removal now works where one read has a deletion. (PR #1751, fixes samtools/samtools#1992. Reported by Long Tian)

* The S3 plugin can now find buckets via S3 access point aliases. (PR #1756, thanks to Matt Pawelczyk; fixes samtools/samtools#1984. Reported by Albert Li)

* Added a --threads option (and -@ short option) to tabix. (PR #1755, feature request #1735. Requested by Dan Bolser)

* tabix can now index Graph Alignment Format (GAF) files. (See https://github.com/lh3/gfatools/blob/master/doc/rGFA.md) (PR #1763, thanks to Adam Novak)

Bug fixes
---------

* Security fix: Prevent possible heap overflow in cram_encode_aux() on bad RG:Z tags. (PR #1737)

* Security fix: Prevent attempts to call a NULL pointer if certain URL schemes are used in CRAM @SQ UR: tags. (PR #1757)

* Security fix: Fixed a bug where following certain AWS S3 redirects could downgrade the connection from TLS (i.e. https://) to unencrypted http://. This could happen when using path-based URLs and AWS_DEFAULT_REGION was set to a region other than the one where the data was stored. (PR #1762, fixes #1760. Reported by andaca)

* Fixed arithmetic overflow when loading very long references for CRAM. (PR #1738, fixes #1738. Reported by Shane McCarthy)

* Fixed faidx and CRAM reference look-ups on compressed fasta where the .fai index file was present, but the .gzi index of compressed offsets was not. (PR #1745, fixes #1744. Reported by Theodore Li)

* Fixed BCF indexing on-the-fly bug which produced invalid indexes when using multiple compression threads. (PR #1742, fixes #1740. Reported by graphenn)

* Ensure that pileup destructors are called by bam_plp_destroy(), to prevent memory leaks. (PR #1749, PR #1754)

* Ensure on-the-fly index timestamps are always older than the data file. Previously the files could be closed out of order, leading to warnings being printed when using the index. (PR #1753, fixes #1732. Reported by Gert Hulselmans)

* To prevent data corruption when reading (strictly invalid) VCF files with duplicated FORMAT tags, all but the first copy of the data associated with the tag are now dropped with a warning. (PR #1752, PR #1761, fixes #1733. Reported by anthakki)

* Fixed a bug introduced in release 1.19 (PR #1689) which broke variant record data if it tried to remove an over-long tag. (PR #1752, PR #1761)

* Changed error to warning when complaining about use of the CG tag in SAM or CRAM files. (PR #1758, fixes samtools/samtools#2002)

------------------------------------------------------------------------------
samtools - changes v1.20
------------------------------------------------------------------------------

* Added a `--max-depth` option to `bedcov`, for more control over the depth limit used when calculating the pileup. Previously this limit was set at 64000; now it is set to over 2 billion, so effectively all bases will be counted. (PR #1970, fixes #1950. Reported by ellisjj)

* Added `mpileup --output-extra RLEN` to display the unclipped read length. (PR #1971, feature request #1959. Requested by Feng Tian)

* Improved checking of symbolic flag names (e.g. UNMAP) passed to samtools. (PR #1981, fixes #1977. Reported by Ilya Shlyakhter)

* The `samtools consensus --min-depth` option now works for the Bayesian mode as well as the simple one. (PR #1989, feature request #1982. Requested by Gautier Richard)

* It's now possible to use the `samtools fastq` `-d tag:val` option multiple times, allowing matches on more than one tag/value. It also gets a `-D` option which allows the values to be listed in a file. (PR #1993, feature request #1958. Requested by Tristan Lefebure)

* Added `samtools fixmate` `-M` option to sanity check base modification (`ML`, `MM`, `MN`) tags, and where necessary adjust modification data on hard-clipped records. (PR #1990)

* Made `mpileup` run faster. (PR #1995)

* `samtools import` now adds a `@PG` header to the files it makes. As with other sub-commands, this can be disabled by using `--no-PG`. (PR #2008. Requested by Steven Leonard)

* The `samtools split` `-d` option to split by tag value now works on tags with integer values. (PR #2005, feature request #1956. Requested by Alex Leonard)

* Adjusted `samtools sort -n` (by name) so that primary reads are always sorted before secondary / supplementary. (PR #2012, feature request #2010. Requested by Stijn van Dongen)

* Added `samtools bedcov` `-H` option to print column headers in the output. (PR #2025. Thanks to Dr. K. D. Murray)

Documentation:

* Added a note that BAQ is applied before filtering and overlap removal during mpileup processing. (PR #1988, fixes #1985. Reported by Joseph Galasso)

* Added 3.1 to the list of supported CRAM versions in the samtools manual page. (PR #2009. Thanks to Andrew Thrasher)

* Made assorted improvements to ampliconclip, flagstat and markdup manual pages. (PR #2014)

Bug Fixes:

* Security fix: Fixed double free that could occur if bed file indexing failed due to running out of memory. This bug first appeared in version 1.19.1. (PR #2026)

* Corrected error message printed when faidx fails to load the fai index. (PR #1987. Thanks to Nick Moore)

* Fixed bug introduced in release 1.4 that caused incorrect reference bases to be printed by `samtools mpileup -a -f ref.fa` in the zero-depth regions at the end of each reference. (PR #2019, fixes #2018. Reported by Joe Georgeson)

* Fixed a samtools view usage crash on MinGW when given invalid options. (PR #2030, fixes #2029. Reported by Divon Lan)

Non user-visible changes and build improvements:

* Added tests to ensure that CRAM compression is working properly. (PR #1969, part of fix for #1968. Reported by Clockris)

------------------------------------------------------------------------------
bcftools - changes v1.20
------------------------------------------------------------------------------

Changes affecting the whole of bcftools, or multiple commands:

* Add short option -W for --write-index. The option now accepts an optional parameter which allows choosing between the TBI and CSI index formats.

Changes affecting specific commands:

* bcftools consensus
    - Add new --regions-overlap option which allows taking into account overlapping deletions that start outside the fasta file target region.

* bcftools isec
    - Add new option `-l, --file-list` to read the list of file names from a file

* bcftools merge
    - Add new option `--force-single` to support single-file edge case (#2100)

* bcftools mpileup
    - Add new option --indels-cns for an alternative indel calling model, which should increase the speed on long read data (thanks to using edlib) and the precision (thanks to a number of heuristics).

* bcftools norm
    - Change the order of atomization and multiallelic splitting (when both -a,-m are given) from "atomize first, then split" to "split first, then atomize". This usually results in a simpler VCF representation. The previous behaviour can be achieved by explicitly streaming the output of the --atomize command into the --multiallelics splitting command.
    - Fix Type=String multiallelic splitting for Number=A,R,G tags with incorrect number of values.
    - Merging into multiallelic sites with `bcftools norm -m +indels` did not work. This is now fixed and the merging is now more strict about variant types, for example complex events, such as AC>TGA, are not considered as indels anymore (#2084)

* bcftools reheader
    - Allow reading the input file from a stream with --fai (#2088)

* bcftools +setGT
    - Support for custom genotypes based on the allele with higher depth, such as `--new-gt c:0/X` custom genotypes (#2065)

* bcftools +split-vep
    - When only one of the tags is present, automatically choose INFO/BCSQ (the default tag name produced by `bcftools csq`) or INFO/CSQ (produced by VEP). When both tags are present, use the default INFO/CSQ.
    - Transcript selection by MANE, PICK, and user-defined transcripts, for example
        --select CANONICAL=YES
        --select MANE_SELECT!=""
        --select PolyPhen~probably_damaging
    - Select all matching transcripts via --select, not just one
    - Change automatic type parsing of VEP fields cDNA_position, CDS_position, and Protein_position from Integer to String, as it can be of the form "8586-8599/9231". The type Integer can be still enforced with `-c cDNA_position:int,CDS_position:int,Protein_position:int`.
    - Recognize `-c field:str`, not just `-c field:string`, as advertised in the usage page
    - Fix a bug which made filtering expression containing missing values crash (#2098)

* bcftools stats
    - When GT is missing but AD is present, the program determines the alternate allele from AD. However, if the AD tag has incorrect number of values, the program would exit with an error printing "Requested allele outside valid range". This is now fixed by taking into account the actual number of ALT alleles.

* bcftools +tag2tag
    - Support for conversion from tags using localized alleles (e.g. LPL, LAD) to the family of standard tags (PL, AD)

* bcftools +trio-dnm2
    - Extend --strictly-novel to exclude cases where the non-Mendelian allele is the reference allele. The change is motivated by the observation that this class of variants is enriched for errors (especially for indels), and better corresponds with the option name.

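The HTSlib 1.20 notes above say that bgzip now gives its output files the same modification and access times as the input. The general idea can be sketched in Python as follows. This is only an illustration under stated assumptions: it uses Python's plain `gzip` module rather than BGZF, and the helper name `compress_keep_times` is hypothetical, not part of any HTSlib tool.

```python
# Sketch: compress a file and carry the source timestamps over to the output.
# Assumptions: plain gzip stands in for BGZF; compress_keep_times is a made-up name.
import gzip
import os
import shutil


def compress_keep_times(src: str, dst: str) -> None:
    """Gzip-compress src into dst, then copy src's atime/mtime onto dst."""
    st = os.stat(src)  # record times before reading, since a read may update atime
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    # Apply the recorded access and modification times with nanosecond precision.
    os.utime(dst, ns=(st.st_atime_ns, st.st_mtime_ns))
```

Keeping the input's times means tools that compare timestamps (for example build systems or index freshness checks) see the compressed file as being as old as the data it was made from.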
From: Robert D. <rm...@sa...> - 2024-01-24 12:43:15
Samtools release 1.19.2 is now available from GitHub and SourceForge. It fixes an error in 1.19.1 that broke filtering on unordered BED files.

https://github.com/samtools/samtools/releases/tag/1.19.2
https://sourceforge.net/projects/samtools/

------------------------------------------------------------------------------
samtools - changes v1.19.2
------------------------------------------------------------------------------

Bug Fixes:

* Fixed a regression in 1.19.1 that broke BED filtering for inputs where the region start positions for the same reference were not sorted in ascending order. (PR #1975, fixes #1974. Reported by Anže Starič)

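The 1.19.2 fix above concerns BED regions that are not sorted by start position. To show why ordering matters for region lookups in general, here is a small Python sketch that sorts and merges regions per reference before answering overlap queries with a binary search. It is an assumption-laden illustration, not the samtools implementation; `build_index` and `overlaps` are hypothetical names.

```python
# Sketch: overlap lookup that tolerates unsorted BED input by normalising it first.
# Assumptions: BED-style 0-based half-open intervals; names are illustrative only.
from bisect import bisect_right
from collections import defaultdict


def build_index(regions):
    """Group (ref, start, end) regions by reference, then sort and merge them."""
    by_ref = defaultdict(list)
    for ref, start, end in regions:
        by_ref[ref].append((start, end))
    index = {}
    for ref, intervals in by_ref.items():
        intervals.sort()  # the binary search below relies on ascending starts
        merged = []
        for start, end in intervals:
            if merged and start <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], end)  # extend the previous interval
            else:
                merged.append([start, end])
        index[ref] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def overlaps(index, ref, start, end):
    """Return True if [start, end) overlaps any indexed region on ref."""
    if ref not in index:
        return False
    starts, ends = index[ref]
    i = bisect_right(starts, end - 1) - 1  # last merged interval starting before end
    return i >= 0 and ends[i] > start


if __name__ == "__main__":
    idx = build_index([("chr1", 500, 600), ("chr1", 100, 200), ("chr1", 150, 300)])
    print(overlaps(idx, "chr1", 250, 260))
```

A binary search over start positions silently gives wrong answers if the list is not in ascending order, which is why the sketch sorts inside `build_index` rather than trusting the input order.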
From: Robert D. <rm...@sa...> - 2024-01-22 12:10:43
Samtools and HTSlib releases 1.19.1 are now available from GitHub and SourceForge. This release fixes a regression in version 1.19 that caused written CRAM files to be much bigger than necessary. It also fixes a number of other bugs listed below, including one that could cause crashes or incorrect results when filtering by regions in a BED file.

https://github.com/samtools/htslib/releases/tag/1.19.1
https://github.com/samtools/samtools/releases/tag/1.19.1
https://sourceforge.net/projects/samtools/

------------------------------------------------------------------------------
htslib - changes v1.19.1
------------------------------------------------------------------------------

* Fixed a regression in release 1.19 that caused all aux records to be stored uncompressed in CRAM files. The resulting files were correctly formatted, but bigger than they needed to be. (PR#1729, fixes samtools#1968. Reported by Clockris)

* Fixed possible out-of-bounds reads due to an incorrect check on B tag lengths in cram_encode_aux(). (PR#1725)

* Fixed an incorrect check on tag length which could fail to catch a two byte out-of-bounds read in bam_get_aux(). (PR#1728)

* Made errors reported by hts_open_format() less confusing when it can't open the reference file. (PR#1724, fixes #1723. Reported by Alex Leonard)

* Made hts_close() fail more gracefully if it's passed a NULL pointer (PR#1724)

------------------------------------------------------------------------------
samtools - changes v1.19.1
------------------------------------------------------------------------------

Bug Fixes:

* Fixed a possible array bounds violation when looking up regions in a BED file (e.g. using `samtools view -L`). This could lead to crashes or the return of incomplete results if the BED file contained a large number of entries all referencing low positions on a chromosome. (PR #1962, fixes #1961. Reported by geertvandeweyer)

* Fixed a crash in samtools stats that occurred when trying to clean up after it was unable to open a CRAM reference file. (PR #1957, fixes crash reported in samtools/htslib#1723. Reported by Alex Leonard)

Documentation:

* Fixed inverted logic in the `samtools consensus --show-del` manual page description. (PR #1955, fixes #1951. Reported by Mikhail Schelkunov)

* Added a description of the MPC section to the `samtools stats` manual page. (PR #1963, fixes #1954. Reported by litun-fkby)

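Two of the HTSlib 1.19.1 fixes above are about length checks on aux tags, including "B" (array) tags. The kind of check involved can be sketched in Python as below. The layout assumed here (two-byte tag, type byte `B`, one subtype byte, a little-endian 32-bit count, then the elements) follows the BAM aux encoding as described in the SAM/BAM specification; the function name `parse_b_aux` is hypothetical and the code is not derived from HTSlib.

```python
# Sketch: bounds-checked parsing of a BAM-style "B" array aux field.
# Assumptions: layout per the SAM/BAM spec; parse_b_aux is a made-up name.
import struct

# BAM array subtype -> (struct format character, element size in bytes)
B_SUBTYPES = {
    "c": ("b", 1), "C": ("B", 1),
    "s": ("h", 2), "S": ("H", 2),
    "i": ("i", 4), "I": ("I", 4),
    "f": ("f", 4),
}
HEADER_LEN = 2 + 1 + 1 + 4  # tag, type 'B', subtype, element count


def parse_b_aux(buf: bytes, offset: int = 0):
    """Return (tag, values, next_offset), raising ValueError on truncated data."""
    if len(buf) - offset < HEADER_LEN:
        raise ValueError("truncated B aux header")
    tag = buf[offset:offset + 2].decode("ascii")
    if buf[offset + 2:offset + 3] != b"B":
        raise ValueError("not a B-type aux field")
    subtype = chr(buf[offset + 3])
    if subtype not in B_SUBTYPES:
        raise ValueError("unknown B array subtype")
    fmt_char, elem_size = B_SUBTYPES[subtype]
    (count,) = struct.unpack_from("<I", buf, offset + 4)
    data_start = offset + HEADER_LEN
    needed = count * elem_size
    # The key check: the declared array must fit in the bytes that remain.
    if needed > len(buf) - data_start:
        raise ValueError("B array length exceeds buffer")
    values = list(struct.unpack_from(f"<{count}{fmt_char}", buf, data_start))
    return tag, values, data_start + needed
```

The element count comes from the file itself, so it has to be validated against the remaining buffer before any element is read; skipping or mis-sizing that comparison is the sort of mistake that leads to an out-of-bounds read.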
From: Robert D. <rm...@sa...> - 2023-12-12 16:23:04
Samtools (and HTSlib and BCFtools) version 1.19 is now available from GitHub and SourceForge. https://github.com/samtools/htslib/releases/tag/1.19 https://github.com/samtools/samtools/releases/tag/1.19 https://github.com/samtools/bcftools/releases/tag/1.19 https://sourceforge.net/projects/samtools/ The main changes are listed below: ------------------------------------------------------------------------------ htslib - changes v1.19 ------------------------------------------------------------------------------ Updates ------- * A temporary work-around has been put in the VCF parser so that it is less likely to fail on rows with a large number of ALT alleles, where Number=G tags like PL can expand beyond the 2Gb limit enforced by HTSlib. For now, where this happens the offending tag will be dropped so the data can be processed, albeit without the likelihood data. In future work, the library will instead convert such tags into their local alternatives (see https://github.com/samtools/hts-specs/pull/434). * New program. Adds annot-tsv which annotates regions in a destination file with texts from overlapping regions in a source file. (PR#1619) * Change bam_parse_cigar() so that it can modify existing BAM records. This makes more useful as public API. Previously it could only handle partially formed BAM records. (PR#1651, fixes #1650. Reported by Oleksii Nikolaienko) * Add "uncompressed" to hts_format_description() where appropriate. This adds an "uncompressed" description to uncompressed files that would normally be compressed, such as BAM and BCF. (PR#1656, in relation to samtools#1884. Thanks to John Marshall) * Speed up to the VCF parser and writer. (PR#1644 and PR#1663) * Add an hclen (hard clip length) SAM filter function. (PR#1660, with reference to samtools#813) * Avoid really closing stdin/stdout in hclose()/hts_close()/et al. See discussion in PR for details. (PR#1665. Thanks to John Marshall) * Add support to handle multiple files in bgzip. 
(PR#1658, fixes #1642. Requested by bw2) * Enable auto-vectorisation in CRAM 3.1 codecs. Speeds decoding on some sequencing platform data. (PR#1669) * Speed up removal of lines in large headers. (PR#1662, fixes #1460. Reported by Anže Starič) * Apply seqtk PR to improve kseq.h parsing performance. Port of Fabian Klötzl's (kloetzl) lh3/seqtk#123 and attractivechaos/klib#173 to HTSlib. (PR#1674. Thanks to John Marshall) Build changes ------------- * Updated htscodecs submodule to 1.6.0. (PR#1685, PR#1717, PR#1719) * Apply the packed attribute to uint*_u types for Clang to prevent -fsanitize=alignment failures. (PR#1667. Thanks to Fangrui Song) * Fuzz testing improvements. (PR#1664) * Add C++ casts for external headers in klist.h and kseq.h. (PR#1683. See also PR#1674 and PR#1682) * Add test case compiling the public headers as C++. (PR#1682. Thanks to John Marshall) * Enable optimisation level -O3 for SAM QUAL+33 formatting. (PR#1679) * Make compiler flag detection work with zig cc. (PR#1687) * Fix unused value warnings when built with NDEBUG. (PR#1688) * Remove some disused Makefile variables, fix typos and a warning. Improve bam_parse_basemod() documentation. (PR#1705, Thanks to John Marshall) Bug fixes --------- * Fail bgzf_useek() when offset is above block limits. (PR#1668) * Fix multi-threaded on-the-fly indexing problems. (PR#1672, fixes samtools#1861 and bcftools#1985. Reported by Mark Ebbert and lacek) * Fix hfile_libcurl small seek bug. (PR#1676, fixes samtools#1918. Also may fix #1037, #1625 and samtools#1622. Reported by Alex Reynolds, Mark Walker, Arthur Gilly and skatragadda-nygc. Thanks to John Marshall) * Fix a minor memory leak in malformed CRAM EXTERNAL blocks. [fuzz] (PR#1671) * Fix a cram decode hang from block_resize(). (PR#1680. Reported by Sebastian Deorowicz) * Cram fuzzing improvements. Fixes a number of cram errors. (PR#1701, fixes #1691, #1692, #1693, #1696, #1697, #1698, #1699 and #1700. 
Thanks to Octavio Galland for finding and reporting all these) * Fix crypt4gh redirection. (PR#1675, fixes grbot/crypt4gh-tutorial#2. Reported by hth4) * Fix PG header linking when records make a loop. (PR#1702, fixes #1694. Reported by Octavio Galland) * Prevent issues with no-stored-sequence records in CRAM files, by ensuring they are accounted for properly in block size calculations, and by limiting the maximum query length in the CIGAR data. Originally seen as an overflow by OSS-Fuzz / UBSAN, it turned out this could lead to excessive time and memory use by HTSlib, and could result in it writing out unreadable CRAM files. (PR#1710) * Fix some illegal shifts and integer overflows found by OSS-Fuzz / UBSAN. (PR#1707, PR#1712, PR#1713) ------------------------------------------------------------------------------ samtools - changes v1.19 ------------------------------------------------------------------------------ New work and changes: * Samtools coverage: add a new --plot-depth option to draw depth (of coverage) rather than the percentage of bases covered. (PR #1910. Thanks to Pierre Lindenbaum) * Samtools merge / sort: add a lexicographical name-sort option via the -N option. The "natural" alpha-numeric sort is still available via -n. (PR #1900, fixes #1500. Reported by Steve Huang) * Samtools view: add -N ^NAME_FILE and -R ^RG_FILE options. The standard -N and -R options only output reads matching a specified list of names or read-groups. With a caret (^) prefix these may be negated to only output reads not matching the specified files. (PR #1896, fixes #1895. Suggested by Feng Tian) * Cope with Htslib's change to no longer close stdout on hts_close. Htslib companion PR is samtools/htslib#1665. (PR #1909. Thanks to John Marshall) * Plot-bamstats: add a new plot of the read lengths ("RL") from samtools stats output. (PR #1922, fixes #1824. 
Thanks to @erboone, suggested by Martin Pollard) * Samtools split: support splitting files based on the contents of auxiliary tags. Also adds a -M option to limit the number of files split can make, to avoid accidental resource over-allocation, and fixes some issues with --write-index. (PR #1222, PR #1933, fixes #1758. Thanks to Valeriu Ohan, suggested by Scott Norton) Bug Fixes: * Samtools stats: empty barcode tags are now treated as having no barcode. (PR #1929, fixes #1926. Reported by Jukka Matilainen) * Samtools cat: add support for non-seekable streams. The file format detection blocked pipes from working before, but now files may be non-seekable such as stdin or a pipe. (PR #1930, fixes #1731. Reported by Julian Hess) * Samtools mpileup -aa (absolutely all positions) now produces an output even when given an empty input file. (PR #1939. Reported by Chang Y) * Samtools markdup: speed up optical duplicate tagging on regions with very deep data. (PR #1952) Documentation: * Samtools mpileup: add more usage examples to the man page. (PR #1913, fixes #1801) * Samtools fastq: explicitly document the order that filters apply. (PR #1907) * Samtools merge: fix example output to use an uppercase RG PL field. (PR #1917. Thanks to John Marshall. Reported by Michael Macias) * Add hclen SAM filter documentation. (PR #1902. See also samtools/htslib#1660) * Samtools consensus: remove the -5 option from documentation. This option was renamed before the consensus subcommand was merged, but accidentally left in the man page. (PR #1924) ------------------------------------------------------------------------------ bcftools - changes v1.19 ------------------------------------------------------------------------------ Changes affecting the whole of bcftools, or multiple commands: * Filtering expressions can be given a file with list of strings to match, this was previously possible only for the ID column. For example ID=@file .. 
selects lines with ID present in the file INFO/TAG=@file.txt .. selects lines where TAG has a string value listed in the file INFO/TAG!=@file.txt .. TAG must not have a string value listed in the file * Allow to query REF,ALT columns directly, for example -e 'REF="N"' Changes affecting specific commands: * bcftools annotate - Fix `bcftools annotate --mark-sites`, VCF sites overlapping regions in a BED file were not annotated (#1989) - Add flexibility to FILTER column transfers and allow transfers within the same file, across files, and in combination. For examples see http://samtools.github.io/bcftools/howtos/annotate.html#transfer_filter_to_info * bcftools call - Output MIN_DP rather than MinDP in gVCF mode - New `-*, --keep-unseen-allele` option to output the unobserved allele <*>, intended for gVCF. * bcftools head - New `-s, --samples` option to include the #CHROM header line with samples. * bcftools gtcheck - Add output options `-o, --output` and `-O, --output-type` - Add filtering options `-i, --include` and `-e, --exclude` - Rename the short option `-e, --error-probability` from lower case to upper case `-E, --error-probability` - Changes to the output format, replace the DC section with DCv2: - adds a new column for the number of matching genotypes - The --error-probability is newly interpreted as the probability of erroneous allele rather than genotype. In other words, the calculation of the discordance score now considers the probability of genotyping error to be different for HOM and HET genotypes, i.e. P(0/1|dsg=0) > P(1/1|dsg=0). - fixes in HWE score calculation plus output average HWE score rather than absolute HWE score - better description of fields * bcftools merge - Add `-m` modifiers to suppress the output of the unseen allele <*> or <NON_REF> at variant sites (e.g. `-m both,*`) or all sites (e.g. 
`-m both,**`) * bcftools mpileup - Output MIN_DP rather than MinDP in gVCF mode * bcftools norm - Add the number of joined lines to the summary output, for example Lines total/split/joined/realigned/skipped: 6/0/3/0/0 - Allow combining -m and -a with --old-rec-tag (#2020) - Symbolic <DEL> alleles caused norm to expand REF to the full length of the deletion. This was not intended and problematic for long deletions, the REF allele should list one base only (#2029) * bcftools query - Add new `-N, --disable-automatic-newline` option for pre-1.18 query formatting behavior when newline would not be added when missing - Make the automatic addition of the newline character more predictable and, when missing, always put it at the end of the expression. In version 1.18 it could be added at the end of the expression (for per-site expressions) or inside the square brackets (for per-sample expressions). The new behavior is: - if the formatting expression contains a newline character, do nothing - if there is no newline character and -N, --disable-automatic-newline is given, do nothing - if there is no newline character and -N is not given, insert newline at the end of the expression See #1969 for details - Add new `-F, --print-filtered` option to output a default string for samples that would otherwise be filtered by `-i/-e` expressions. - Include sample name in the output header with `-H` whenever it makes sense (#1992) * bcftools +split-vep - Fix on the fly filtering involving numeric subfields, e.g. `-i 'MAX_AF<0.001'` (#2039) - Interpret default column type names (--columns-types) as entire strings, rather than substrings to avoid unexpected spurious matches (i.e. internally add ^ and $ to all field names) * bcftools +trio-dnm2 - Do not flag paternal genotyping errors as de novo mutations. Specifically, when father's chrX genotype is 0/1 and mother's 0/0, 0/1 in the child will not be marked as DNM. 
* bcftools view - Add new `-A, --trim-unseen-allele` option to remove the unseen allele <*> or <NON_REF> at variant sites (`-A`) or all sites (`-AA`) -- The Wellcome Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA. |
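The three-step newline rule listed under "bcftools query" in the 1.19 notes above can be sketched as a small function. This is only an illustration of the decision order, not the bcftools implementation: the function name, the parameter name and the choice to represent a newline as the two-character escape typed on the command line are all assumptions made for this example.

```python
# Hypothetical sketch of the bcftools 1.19 "query" newline rule.
# Assumption: the format string is handled as typed on the command line,
# so a newline appears as the two characters backslash + n.
NEWLINE_ESCAPE = "\\n"


def apply_newline_rule(expr: str, disable_automatic_newline: bool = False) -> str:
    """Return the formatting expression after applying the three rules."""
    # Rule 1: the expression already contains a newline, so do nothing.
    if NEWLINE_ESCAPE in expr:
        return expr
    # Rule 2: no newline, but -N / --disable-automatic-newline was given.
    if disable_automatic_newline:
        return expr
    # Rule 3: no newline and no -N, so append one at the very end,
    # never inside the square brackets of a per-sample expression.
    return expr + NEWLINE_ESCAPE


if __name__ == "__main__":
    print(apply_newline_rule("%CHROM\\t%POS"))
    print(apply_newline_rule("[%SAMPLE=%GT ]"))
    print(apply_newline_rule("%CHROM", disable_automatic_newline=True))
```

Note that the per-sample example gets its newline after the closing bracket, which is the difference from the 1.18 behaviour described in the notes.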
From: Robert D. <rm...@sa...> - 2023-07-25 13:14:22
|
Samtools (and HTSlib and BCFtools) version 1.18 is now available from GitHub and SourceForge. https://github.com/samtools/htslib/releases/tag/1.18 https://github.com/samtools/samtools/releases/tag/1.18 https://github.com/samtools/bcftools/releases/tag/1.18 https://sourceforge.net/projects/samtools/ The main changes are listed below: ------------------------------------------------------------------------------ htslib - changes v1.18 ------------------------------------------------------------------------------ Updates ------- * Using CRAM 3.1 no longer gives a warning about the specification being draft. Note CRAM 3.0 is still the default output format. (PR#1583) * Replaced use of sprintf with snprintf, to silence potential warnings from Apple's compilers and those who implement similar checks. (PR#1594, fixes #1586. Reported by Oleksii Nikolaienko) * Fastq output will now generate empty records for reads with no sequence data (i.e. sequence is "*" in SAM format). (PR#1576, fixes samtools/samtools#1576. Reported by Nils Homer) * CRAM decoding speed-ups. (PR#1580) * A new MN aux tag can now be used to verify that MM/ML base modification data has not been broken by hard clipping. (PR#1590, PR#1612. See also PR samtools/hts-specs#714 and issue samtools/hts-specs#646. Reported by Jared Simpson) * The base modification API has been improved to make it easier for callers to tell unchecked bases from unmodified ones. (PR#1636, fixes #1550. Requested by Chris Wright) * A new bam_mods_queryi() API has been added to return additional data about the i-th base modification returned by bam_mods_recorded(). (PR#1636, fixes #1550 and #1635. Requested by Jared Simpson) * Speed up index look-ups for whole-chromosome queries. (PR#1596) * Mpileup now merges adjacent (mis)match CIGAR operations, so CIGARs using the X/= operators give the same results as if the M operator was used. (PR#1607, fixes #1597. 
Reported by Marcel Martin) * It's now possible to call bcf_sr_set_regions() after adding readers using bcf_sr_add_reader() (previously this returned an error). Doing so will discard any unread data, and reset the readers so they iterate over the new regions. (PR#1624, fixes samtools/bcftools#1918. Reported by Gregg Thomas) * The synced BCF reader can now accept regions with reference names including colons and hyphens, by enclosing them in curly braces. For example, {chr_part:1-1001}:10-20 will return bases 10 to 20 from reference "chr_part:1-1001". (PR#1630, fixes #1620. Reported by Bren) * Add a "samples" directory with code demonstrating usage of HTSlib plus a tutorial document. (PR#1589) Build changes ------------- * Htscodecs has been updated to 1.5.1 (PR#1654) * Htscodecs SIMD code now works with Apple multiarch binaries. (PR#1587, HTSlib fix for samtools/htscodecs#76. Reported by John Marshall) * Improve portability of "expr" usage in version.sh. (PR#1593, fixes #1592. Reported by John Marshall) * Improve portability to *BSD targets by ensuring _XOPEN_SOURCE is defined correctly and that source files properly include "config.h". Perl scripts also now all use #!/usr/bin/env instead of assuming that it's in /usr/bin/perl. (PR#1628, fixes #1606. Reported by Robert Clausecker) * Fixed NAME entry in htslib-s3-plugin man page so the whatis and apropos commands find it. (PR#1634, thanks to Étienne Mollier) * Assorted dependency tracking fixes. (PR#1653, thanks to John Marshall) Documentation updates --------------------- * Changed Alpine build instructions as they've switched back to using openssl. (PR#1609) * Recommend using -rdynamic when statically linking a libhts.a with plugins enabled. (PR#1611, thanks to John Marshall. Fixes #1600, reported by Jack Wimberley) * Fixed example in docs for sam_hdr_add_line(). (PR#1618, thanks to kojix2) * Improved test harness for base modifications API. 
(PR#1648) Bug fixes --------- * Fix a major bug when searching against a CRAM index where one container has start and end coordinates entirely contained within the previous container. This would occasionally miss data, and sometimes return much more than required. The bug affected versions 1.11 to 1.17, although the change in 1.11 was bug-fixing multi-threaded index queries. This bug did not affect index building. There is no need to reindex your CRAM files. (PR#1574, PR#1640. Fixes #1569, #1639, samtools/samtools#1808, samtools/samtools#1819. Reported by xuxif, Jens Reeder and Jared Simpson) * Prevent CRAM blocks from becoming too big in files with short sequences but very long aux tags. (PR #1613) * Fix bug where the CRAM decoder for CONST_INT and CONST_BYTE codecs may incorrectly look for extra data in the CORE block. Note that this bug only affected the experimental CRAM v4.0 decoder. (PR#1614) * Fix crypt4gh redirection so it works in conjunction with non-file IO, such as using htsget. (PR#1577) * Improve error checking for the VCF POS column, when facing invalid data. (PR#1575, replaces #1570 originally reported and fixed by Colin Nolan.) * Improved error checking on VCF indexing to validate the data is BGZF compressed. (PR#1581) * Fix bug where bin number calculation could overflow when making iterators over regions that go to the end of a chromosome. (PR#1595) * Backport attractivechaos/klib#78 (by Pall Melsted) to HTSlib. Prevents infinite loops in kseq_read() when reading broken gzip files. (PR#1582, fixes #1579. Reported by Goran Vinterhalter) * Backport attractivechaos/klib@384277a (by innoink) to HTSlib. Fixes the kh_int_hash_func2() macro definition. (PR#1599, fixes #1598. Reported by fanxinping) * Remove a compilation warning on systems with newer libcurl releases. (PR#1572) * Windows: Fixed BGZF EOF check for recent MinGW releases. 
(PR#1601, fixes samtools/bcftools#1901) * Fixed bug where tabix would not return the correct regions for files where the column ordering is end, ..., begin instead of begin, ..., end. (PR#1626, fixes #1622. Reported by Hiruna Samarakoon) * sam_format_aux1() now always NUL-terminates Z/H tags. (PR#1631) * Ensure base modification iterator is reset when no MM tag is present. (PR#1631, PR#1647) * Fix segfault when attempting to write an uncompressed BAM file opened using hts_open(name, "wbu"). This was attempting to write BAM data without wrapping it in BGZF blocks, which is invalid according to the BAM specification. "wbu" is now internally converted to "wb0" to output uncompressed data wrapped in BGZF blocks. (PR#1632, fixes #1617. Reported by Joyjit Daw) * Fixed over-strict bounds check in probaln_glocal() which caused it to make sub-optimal alignments when the requested band width was greater than the query length. (PR#1616, fixes #1605. Reported by Jared Simpson) * Fixed possible double frees when handling errors in bcf_hdr_add_hrec(), if particular memory allocations fail. (PR#1637) * Ensure that bcf_hdr_remove() clears up all pointers to the items removed from dictionaries. Failing to do this could have resulted in a call requesting a deleted item via bcf_hdr_get_hrec() returning a stale pointer. (PR#1637) * Stop the gzip decompresser from finishing prematurely when an empty gzip block is followed by more data. (PR#1643, PR#1646) ------------------------------------------------------------------------------ samtools - changes v1.18 ------------------------------------------------------------------------------ New work and changes: * Add minimiser sort option to collate by an indexed fasta. Expand the minimiser sort to arrange the minimiser values in the same order as they occur in the reference genome. This acts as an extremely crude and simplistic read aligner that can be used to boost read compression. (PR#1818) * Add a --duplicate-count option to markdup. 
Adds the number of duplicates (including itself) to the original read in a 'dc' tag. (PR#1816. Thanks to wulj2) * Make calmd handle unaligned data or empty files without throwing an error. This is to make pipelines work more smoothly. A warning will still be issued. (PR#1841, fixes #1839. Reported by Filipe G. Vieira) * Consistent, more comprehensive flag filtering for fasta/fastq. Added --rf/--incl[ude]-flags and long options for -F (--excl[ude]-flags) and -f (--require-flags). (PR#1842. Thanks to Devang Thakkar) * Apply fastq --input-fmt-option settings. Previously any options specified were not being applied to the input file. (PR#1855. Thanks to John Marshall) * Add fastq -d TAG[:VAL] check. This mirrors view -d and will only output alignments that match TAG (and VAL if specified). (PR#1863, fixes #1854. Requested by Rasmus Kirkegaard) * Extend import --order TAG to --order TAG:length. If length is specified, the tag format goes from integer to a 0-padded string format. This is a workaround for BAM and CRAM that cannot encode an order tag of over 4 billion records. (PR#1850, fixes #1847. Reported by Feng Tian) * New -aa mode for consensus. This works like the -aa option in depth and mpileup. The single 'a' reports all bases in contigs covered by alignments. Double 'aa' (or '-a -a') reports Ns even for the references with no alignments against them. (PR#1851, fixes #1849. Requested by Tim Fennell) * Add long option support to samtools index. (PR#1872, fixes #1869. Reported by Jason Bacon) * Be consistent with rounding of "average length" in samtools stats. (PR#1876, fixes #1867. Reported by Jelinek-J) * Add option to ampliconclip that marks reads as unmapped when they do not have enough aligned bases left after clipping. Default is to unmap reads with zero aligned bases. (PR#1865, fixes #1856. 
Requested by ces) Bug Fixes: * [From HTSLib] Fix a major bug when searching against a CRAM index where one container has start and end coordinates entirely contained within the previous container. This would occasionally miss data, and sometimes return much more than required. The bug affected versions 1.11 to 1.17, although the change in 1.11 was bug-fixing multi-threaded index queries. This bug did not affect index building. There is no need to reindex your CRAM files. (PR#samtools/htslib#1574, PR#samtools/htslib#1640. Fixes #samtools/htslib#1569, #samtools/htslib#1639, #1808, #1819. Reported by xuxif, Jens Reeder and Jared Simpson) * Fix a sort -M bug (regression) when merging sub-blocks. Data was valid but in a poor order for compression. (PR#1812) * Fix bug in split output format. Now SAM and CRAM formats can be chosen as well as BAM. Also a documentation change, see below. (PR#1821) * Add error checking to view -e filter expression code. Invalid expressions were not returning an error code. (PR#1833, fixes #1829. Reported by Steve Huang) * Fix reheader CRAM output version. Sets the correct CRAM output version for non-3.0 CRAMs. (PR#1868, fixes #1866. Reported by John Marshall) Documentation: * Expand the default filtering information on the mpileup man page. (PR#1802, fixes #1801. Reported by gevro) * Add an explanation of the default behaviour of split files on generating a file for reads with missing or unrecognised RG tags. Also a small bug fix, see above. (PR#1821, fixes #1817. Reported by Steve Huang) * In the INSTALL instructions, switched back to openssl for Alpine. This matches the current Alpine Linux practice. (PR#1837, see htslib#1591. Reported by John Marshall) * Fix various typos caught by lintian parsers. (PR#1877. Thanks to Étienne Mollier) * Document consensus --qual-calibration option. (PR#1880, fixes #1879. 
Reported by John Marshall) * Updated the page about samtools duplicate marking with more detail at www.htslib.org/algorithms/duplicate.html Non user-visible changes and build improvements: * Removed a redundant line that caused a warning in gcc-13. (PR#1838) ------------------------------------------------------------------------------ bcftools - changes v1.18 ------------------------------------------------------------------------------ Changes affecting the whole of bcftools, or multiple commands: * Support auto indexing during writing BCF and VCF.gz via new `--write-index` option Changes affecting specific commands: * bcftools annotate - The `-m, --mark-sites` option can be now used to mark all sites without the need to provide the `-a` file (#1861) - Fix a bug where the `-m` function did not respect the `--min-overlap` option (#1869) - Fix a bug when update of INFO/END results in assertion error (#1957) * bcftools concat - New option `--drop-genotypes` * bcftools consensus - Support higher-ploidy genotypes with `-H, --haplotype` (#1892) - Allow `--mark-ins` and `--mark-snv` with a character, similarly to `--mark-del` * bcftools convert - Support for conversion from tab-delimited files (CHROM,POS,REF,ALT) to sites-only VCFs * bcftools csq - New `--unify-chr-names` option to automatically unify different chromosome naming conventions in the input GFF, fasta and VCF files (e.g. "chrX" vs "X") - More versatility in parsing various flavors of GFF - A new `--dump-gff` option to help with debugging and investigating the internals of hGFF parsing - When printing consequences in nonsense mediated decay transcripts, include 'NMD_transcript' in the consequence part of the annotation. This is to make filtering easier and analogous to VEP annotations. 
For example the consequence annotation 3_prime_utr|PCGF3|ENST00000430644|NMD is newly printed as 3_prime_utr&NMD_transcript|PCGF3|ENST00000430644|NMD * bcftools gtcheck - Add stats for the number of sites matched in the GT-vs-GT, GT-vs-PL, etc modes. This information is important for interpretation of the discordance score, as only the GT-vs-GT matching can be interpreted as the number of mismatching genotypes. * bcftools +mendelian2 - Fix in command line argument parsing, the `-p` and `-P` options were not functioning (#1906) * bcftools merge - New `-M, --missing-rules` option to control the behavior of merging of vector tags to prevent mixtures of known and missing values in tags when desired - Use values pertaining to the unknown allele (<*> or <NON_REF>) when available to prevent mixtures of known and missing values (#1888) - Revamped line matching code to fix problems in gVCF merging where split gVCF blocks would not update genotypes (#1891, #1164). * bcftools mpileup - Fix a bug in --indels-v2.0 which caused an endless loop when CIGAR operator 'H' or 'P' was encountered * bcftools norm - The `-m, --multiallelics +` mode now preserves phasing (#1893) - Symbolic <DEL.*> alleles are now normalized too (#1919) - New `-g, --gff-annot` option to right-align indels in forward transcripts to follow HGVS 3'rule (#1929) * bcftools query - Force newline character in formatting expression when not given explicitly - Fix `-H` header output in formatting expressions containing newlines * bcftools reheader - Make `-f, --fai` aware of long contigs not representable by 32-bit integer (#1959) * bcftools +split-vep - Prevent a segfault when `-i/-e` use a VEP subfield not included in `-f` or `-c` (#1877) - New `-X, --keep-sites` option complementing the existing `-x, --drop-sites` options - Force newline character in formatting expression when not given explicitly - Fix a subtle ambiguity: identical rows must be returned when `-s` is applied regardless of `-f` containing the `-a` 
VEP tag itself or not. * bcftools stats - Collect new VAF (variant allele frequency) statistics from FORMAT/AD field - When counting transitions/transversions, consider also alternate het genotypes * plot-vcfstats - Add three new VAF plots -- The Wellcome Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA. |
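The curly-brace region syntax described in the HTSlib 1.18 updates above (for example {chr_part:1-1001}:10-20) can be illustrated with a short parser. This is a simplified sketch written for this note, not HTSlib's parser: the function name and return shape are hypothetical, and it ignores details such as thousands separators in coordinates.

```python
from typing import Optional, Tuple


def parse_region(region: str) -> Tuple[str, Optional[int], Optional[int]]:
    """Split 'name[:start[-end]]' into (name, start, end).

    Hypothetical helper: a reference name containing ':' or '-' must be
    wrapped in curly braces, as in '{chr_part:1-1001}:10-20'.
    """
    if region.startswith("{"):
        close = region.find("}")
        if close == -1:
            raise ValueError("unterminated '{' in region: " + region)
        name = region[1:close]
        rest = region[close + 1:]
        if rest and not rest.startswith(":"):
            raise ValueError("expected ':' after '}' in region: " + region)
        coords = rest[1:]
    else:
        # Without braces, the first ':' separates name from coordinates.
        name, _, coords = region.partition(":")

    if not coords:
        return name, None, None

    start_text, _, end_text = coords.partition("-")
    start = int(start_text)
    end = int(end_text) if end_text else None
    return name, start, end


if __name__ == "__main__":
    print(parse_region("{chr_part:1-1001}:10-20"))
    print(parse_region("chr1:10-20"))
```

Without the braces, the same text would be split at the first colon, giving the reference name "chr_part" and the wrong coordinates, which is the ambiguity the new syntax removes.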
From: Robert D. <rm...@sa...> - 2023-02-21 14:39:24
|
Samtools (and HTSlib and BCFtools) version 1.17 is now available from GitHub and SourceForge. https://github.com/samtools/htslib/releases/tag/1.17 https://github.com/samtools/samtools/releases/tag/1.17 https://github.com/samtools/bcftools/releases/tag/1.17 https://sourceforge.net/projects/samtools/ The main changes are listed below: ------------------------------------------------------------------------------ htslib - changes v1.17 ------------------------------------------------------------------------------ * A new API for iterating through a BAM record's aux field. (PR#1354, addresses #1319. Thanks to John Marshall) * Text mode for bgzip. Allows bgzip to compress lines of text with block breaks at newlines. (PR#1493, thanks to Mike Lin for the initial version PR#1369) * Make tabix support CSI indices with large positions. Unlike SAM and VCF files, BED files do not set a maximum reference length which hindered CSI support. This change sets an arbitrary large size of 100G to enable it to work. (PR#1506) * Add a fai_line_length function. Exposes the internal line-wrap length. (PR#1516) * Check for invalid barcode tags in fastq output. (PR#1518, fixes samtools#1728. Reported by Poshi) * Warn if reference found in a CRAM file is not contained in the specified reference file. (PR#1517 and PR#1521, adds diagnostics for #1515. Reported by Wei WeiDeng) * Add a faidx_seq_len64 function that can return sequence lengths longer than INT_MAX. At the same time limit faidx_seq_len to INT_MAX output. Also add a fai_adjust_region to ensure given ranges do not go beyond the end of the requested sequence. (PR#1519) * Add a bcf_strerror function to give text descriptions of BCF errors. (PR#1510) * Add CRAM SQ/M5 header checking when specifying a fasta file. This is to prevent creating a CRAM that cannot be decoded again. (PR#1522. In response to samtools#1748 though not a direct fix) * Improve support for very long input lines (> 2Gbyte). 
This is mostly useful for tabix which does not do much interpretation of its input. (PR#1542, a partial fix for #1539) * Speed up load_ref_portion. This function has been sped up by about 7x, which speeds up low-depth CRAM decoding by about 10%. (PR#1551) * Expand CRAM API to cope with new samtools cram_size command. (PR#1546) * Merges neighbouring I and D ops into one op within pileup. This means 4M1D1D1D3M is reported as 4M3D3M. Fixing this in sam.c means not only is samtools mpileup now looking better, but any tool using the mpileup API will be getting consistent results. (PR#1552, fixes the last remaining part of samtools#139) * Update the API documentation for bgzf_mt as it referred to a previous iteration. (PR#1556, fixes #1553. Reported by Raghavendra Padmanabhan) Build changes ------------- * Use POSIX grep in testing as egrep and fgrep are considered obsolete. (PR#1509, thanks to David Seifert) * Switch to building libdeflate with cmake for Cirrus CI. (PR#1511) * Ensure strings in config_vars.h are escaped correctly. (PR#1530, fixes #1527. Reported by Lucas Czech) * Easier modification of shared library permissions during install. (PR#1532, fixes #1525. Reported by StephDC) * Fix build on ancient compilers. Added -std=gnu90 to build tests so older C compilers will still be happy. (PR#1524, fixes #1523. Reported by Martin Jakt) * Switch MacOS CI tests to an ARM-based image. (PR#1536) * Cut down the number of embed_ref=2 tests that get run. (PR#1537) * Add symbol versions to libhts.so. This is to aid package developers. (PR#1560 addresses #1505, thanks to John Marshall. Reported by Stefan Bruens) * htscodecs now updated to v1.4.0. (PR#1563) * Cleaned up misleading system error reports in test_bgzf. (PR#1565) Bug fixes --------- * VCF. Fix n-squared complexity in sample line with many adjacent tabs [fuzz]. (PR#1503) * Improved bcftools detection and reporting of bgzf decode errors. (PR#1504, thanks to Lilian Janin. 
PR#1529 thanks to Bergur Ragnarsson, fixes #1528. PR#1554) * Prevent crash when the only FASTA entry has no sequence [fuzz]. (PR#1507) * Fixed typo in sam.h documentation. (PR#1512, thanks to kojix2) * Fix buffer read-overrun in bam_plp_insertion_mod. (PR#1520) * Fix hash keys being left behind by bcf_hdr_remove. (PR#1535, fixes #1533. Reported by Giulio Genovese in #842) * Make bcf_hdr_idinfo_exists more robust by checking id value exists. (PR#1544, fixes #1538. Reported by Giulio Genovese) * CRAM improvements. Fixed crash with multi-threaded CRAM. Fixed a bug in the codec parameter learning for CRAM 3.1 name tokeniser. Fixed Cram compression container substitution matrix generation, (PR#1558, PR#1559 and PR#1562) ------------------------------------------------------------------------------ samtools - changes v1.17 ------------------------------------------------------------------------------ New work and changes: * New samtools reset subcommand. Removes alignment information. Alignment location, CIGAR, mate mapping and flags are updated. If the alignment was in reverse direction, sequence and its quality values are reversed and complemented and the reverse flag is reset. Supplementary and secondary alignment data are discarded. (PR#1767, implements #1682. Requested by dkj) * New samtools cram-size subcommand. It writes out metrics about a CRAM file reporting aggregate sizes per block "Content ID" fields, the data-series contained within them, and the compression methods used. (PR#1777) * Added a --sanitize option to fixmate and view. This performs some sanity checks on the state of SAM record fields, fixing up common mistakes made by aligners. (PR#1698) * Permit 1 thread with samtools view. All other subcommands already allow this and it does provide a modest speed increase. (PR#1755, fixes #1743. Reported by Goran Vinterhalter) * Add CRAM_OPT_REQUIRED_FIELDS option for view -c. 
This is a big speed up for CRAM (maybe 5-fold), but it depends on which filtering options are being used. (PR#1776, fixes #1775. Reported by Chang Y) * New filtering options in samtools depth. The new --excl-flags option is a synonym for -G, with --incl-flags and --require-flags added to match view logic. (PR#1718, fixes #1702. Reported by Dario Beraldi) * Speed up calmd's slow handling of non-position-sorted data by adding caching. This uses more memory but is only activated when needed. (PR#1723, fixes #1595. Reported by lxwgcool) * Improve samtools consensus for platforms with instrument specific profiles, considerably helping for data with very different indel error models and providing base quality recalibration tables. On PacBio HiFi, ONT and Ultima Genomics consensus qualities are also redistributed within homopolymers and the likelihood of nearby indel errors is raised. (PR#1721, PR#1733) * Consensus --mark-ins option. This permits the consensus output to include a markup indicating the next base is an insertion. This is necessary as we need a way of outputting both consensus and also how that consensus marries up with the reference coordinates. (PR#1746) * Make faidx/fqidx output line length default to the input line length. (PR#1738, fixes #1734. Reported by John Marshall) * Speed up optical duplicate checking where data has a lot of duplicates compared to non-duplicates. (PR#1779, fixes #1771. Reported by Poshi) * For collate use TMPDIR environment variable, when looking for a temporary folder. (PR#1782, based on PR#1178 and fixes #1172. Reported by Martin Pollard) Bug Fixes: * Fix stats breakage on long deletions when given a reference. (PR#1712, fixes #1707. Reported by John Didion) * In ampliconclip, stop hard clipping from wrongly removing entire reads. (PR#1722, fixes #1717. 
Reported by Kevin Xu) * Fix bug in ampliconstats where references mentioned in the input file headers but not in the bed file would cause it to complain that the SAM headers were inconsistent. (PR#1727, fixes #1650. Reported by jPontix) * Fixed SEGV in samtools collate when no filename given. (PR#1724) * Changed the default UMI barcode regex in markdup. The old regex was too restrictive. This version will at least allow the default read name UMI as given in the Illumina example documentation. (PR#1737, fixes #1730. Reported by yloemie) * Fix samtools consensus buffer overrun with MD:Z handling. (PR#1745, fixes #1744. Reported by trilisser) * Fix a buffer read-overflow in mpileup and tview on sequences with seq "*". (PR#1747) * Fix view -X command line parsing that was broken in 1.15. (PR#1772, fixes #1720. Reported by Francisco Rodríguez-Algarra and Miguel Machado) * Stop samtools view -d from reporting meaningless system errors when tag validation fails. (PR#1796) Documentation: * Add a description of the samtools tview display layout to the man page. Documents . vs , and upper vs lowercase. Adds a -s sample example, and documents the -w option. (PR#1765, fixes #1759. Reported by Lucas Ferreira da Silva) * Clarify intention of samtools fasta/q in man page and soft vs hard clipping. (PR#1794, fixes #1792. Reported by Ryan Lorig-Roach) * Minor fix to wording of mpileup --rf usage and man page. (PR#1795, fixes #1791. Reported by Luka Pavageau) Non user-visible changes and build improvements: * Use POSIX grep in testing as egrep and fgrep are considered obsolete. (PR#1726, thanks to David Seifert) * Switch MacOS CI tests to an ARM-based image. 
(PR#1770) ------------------------------------------------------------------------------ bcftools - changes v1.17 ------------------------------------------------------------------------------ Changes affecting the whole of bcftools, or multiple commands: * The -i/-e filtering expressions - Error checks were added to prevent incorrect use of vector arithmetics. For example, when evaluating the sum of two vectors A and B, the resulting vector could contain nonsense values when the input vectors were not of the same length. The fix introduces the following logic: - evaluate to C_i = A_i + B_i when length(A)==length(B) and set length(C)=length(A) - evaluate to C_i = A_i + B_0 when length(B)=1 and set length(C)=length(A) - evaluate to C_i = A_0 + B_i when length(A)=1 and set length(C)=length(B) - throw an error when length(A)!=length(B) AND length(A)!=1 AND length(B)!=1 - Arrays in Number=R tags can be now subscripted by alleles found in FORMAT/GT. For example, FORMAT/AD[GT] > 10 .. require support of more than 10 reads for each allele FORMAT/AD[0:GT] > 10 .. same as above, but in the first sample sSUM(FORMAT/AD[GT]) > 20 .. 
require total sample depth bigger than 20 * The commands `consensus -H` and `+split-vep -H` - Drop unnecessary leading space in the first header column and newly print `#[1]columnName` instead of the previous `# [1]columnName` (#1856) Changes affecting specific commands: * bcftools +allele-length - Fix overflow for indels longer than 512bp and aggregate alleles equal or larger than that in the same bin (#1837) * bcftools annotate - Support sample reordering of annotation file (#1785) - Restore lost functionality of the --pair-logic option (#1808) * bcftools call - Fix a bug where too many alleles passed to `-C alleles` via `-T` caused memory corruption (#1790) - Fix a bug where indels constrained with `-C alleles -T` would sometimes be missed (#1706) * bcftools consensus - BREAKING CHANGE: the option `-I, --iupac-codes` newly outputs IUPAC codes based on FORMAT/GT of all samples. The `-s, --samples` and `-S, --samples-file` options can be used to subset samples. In order to ignore samples and consider only the REF and ALT columns (the original behavior prior to 1.17), run with `-s -` (#1828) * bcftools convert - Make variantkey conversion work for sites without an ALT allele (#1806) * bcftools csq - Fix a bug where a MNV with multiple consequences (e.g. missense + stop_gained) would report only the less severe one (#1810) - GFF file parsing was made slightly more flexible, newly ids can be just 'XXX' rather than, for example, 'gene:XXX' - New gff2gff perl script to fix GFF formatting differences * bcftools +fill-tags - More of the available annotations are now added by the `-t all` option * bcftools +fixref - New INFO/FIXREF annotation - New -m swap mode * bcftools +mendelian - The +mendelian plugin has been deprecated and replaced with +mendelian2. The function of the plugin is the same but the command line options and the output format have changed, and for this reason it was introduced as a new plugin. 
* bcftools mpileup - Most of the annotations generated by mpileup are now optional via the `-a, --annotate` option and add several new (mostly experimental) annotations. - New option `--indels-2.0` for an EXPERIMENTAL indel calling model. This model aims to address some known deficiencies of the current indel calling algorithm, specifically, it uses diploid reference consensus sequence. Note that in the current version it has the potential to increase sensitivity but at the cost of decreased specificity. - Make the FS annotation (Fisher exact test strand bias) functional and remove it from the default annotations * bcftools norm - New --multi-overlaps option allows to set overlapping alleles either to the ref allele (the current default) or to a missing allele (#1764 and #1802) - Fixed a bug in `-m -` which does not split missing FORMAT values correctly and could lead to empty FORMAT fields such as `::` instead of the correct `:.:` (#1818) - The `--atomize` option previously would not split complex indels such as C>GGG. Newly these will be split into two records C>G and C>CGG (#1832) * bcftools query - Fix a rare bug where the printing of SAMPLE field with `query` was incorrectly suppressed when the `-e` option contained a sample expression while the formatting query did not. See #1783 for details. * bcftools +setGT - Add new `--new-gt X` option (#1800) - Add new `--target-gt r:FLOAT` option to randomly select a proportion of genotypes (#1850) - Fix a bug where `-t ./x` mode was advertised as selecting both phased and unphased half-missing genotypes, but was in fact selecting only unphased genotypes (#1844) * bcftools +split-vep - New options `-g, --gene-list` and `--gene-list-fields` which allow to prioritize consequences from a list of genes, or restrict output to the listed genes - New `-H, --print-header` option to print the header with `-f` - Work around a bug in the LOFTEE VEP plugin used to annotate gnomAD VCFs. 
There the LoF_info subfield contains commas which, in general, makes it impossible to parse the VEP subfields. The +split-vep plugin can now work with such files, replacing the offending commas with slash (/) characters. See also https://github.com/Ensembl/ensembl-vep/issues/1351 - Newly the `-c, --columns` option can be omitted when a subfield is used in `-i/-e` filtering expression. Note that `-c` may still have to be given when it is not possible to infer the type of the subfield. Note that this is an experimental feature. * bcftools stats - The per-sample stats (PSC) would not be computed when `-i/-e` filtering options and the `-s -` option were given but the expression did not include sample columns (#1835) * bcftools +tag2tag - Revamp of the plugin to allow wider range of tag conversions, specifically all combinations from FORMAT/GL,PL,GP to FORMAT/GL,PL,GP,GT * bcftools +trio-dnm2 - New `-n, --strictly-novel` option to downplay alleles which violate Mendelian inheritance but are not novel - Allow to set the `--pn` and `--pns` options separately for SNVs and indels and make the indel settings more strict by default - Output missing FORMAT/VAF values in non-trio samples, rather than random nonsense values * bcftools +variant-distance - New option `-d, --direction` to choose the directionality: forward, reverse, nearest (the default) or both (#1829) -- The Wellcome Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. |
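The vector-arithmetic length rules listed for the -i/-e filtering expressions in the bcftools 1.17 notes above can be sketched in a few lines of Python. This is only an illustration of the four stated cases, not the bcftools implementation; the function name `combine` and the use of plain Python lists are assumptions made for the example.

```python
from operator import add


def combine(a, b, op=add):
    """Combine two vectors following the four length rules in the 1.17 notes.

    Hypothetical helper for illustration; bcftools itself is written in C.
    """
    if len(a) == len(b):
        # C_i = A_i op B_i, length(C) = length(A)
        return [op(x, y) for x, y in zip(a, b)]
    if len(b) == 1:
        # C_i = A_i op B_0, length(C) = length(A)
        return [op(x, b[0]) for x in a]
    if len(a) == 1:
        # C_i = A_0 op B_i, length(C) = length(B)
        return [op(a[0], y) for y in b]
    # Lengths differ and neither operand is a scalar: refuse to guess.
    raise ValueError(
        f"incompatible vector lengths: {len(a)} and {len(b)}"
    )


print(combine([1, 2, 3], [10, 20, 30]))
print(combine([1, 2, 3], [10]))
print(combine([5], [1, 2]))
```

Raising an error in the last case mirrors the behaviour described in the notes: mismatched lengths are rejected rather than silently producing nonsense values.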
From: Robert D. <rm...@sa...> - 2022-09-02 14:20:16
|
Samtools version 1.16.1 is now available from GitHub and SourceForge. This release fixes some bugs in the new template-coordinate sort feature. https://sourceforge.net/projects/samtools/ https://github.com/samtools/samtools/releases/tag/1.16.1 The main changes are listed below: ------------------------------------------------------------------------------ samtools - changes v1.16.1 ------------------------------------------------------------------------------ Bug fixes: * Fixed a bug with the template-coordinate sort which caused incorrect ordering when using threads, or processing large files that don't fit completely in memory. (PR#1703, thanks to Nils Homer) * Fixed a crash that occurred when trying to use `samtools merge` in template-coordinate mode. (PR#1705, thanks to Nils Homer) -- The Wellcome Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. |
From: Robert D. <rm...@sa...> - 2022-08-18 14:17:45
|
Samtools (and HTSlib and BCFtools) version 1.16 is now available from GitHub and SourceForge. https://sourceforge.net/projects/samtools/ https://github.com/samtools/htslib/releases/tag/1.16 https://github.com/samtools/samtools/releases/tag/1.16 https://github.com/samtools/bcftools/releases/tag/1.16 The main changes are listed below: ------------------------------------------------------------------------------ htslib - changes v1.16 ------------------------------------------------------------------------------ * Make hfile_s3 refresh AWS credentials on expiry in order to make HTSlib work better with AWS IAM credentials, which have a limited lifespan. (PR#1462 and PR#1474, addresses #344) * Allow BAM headers between 2GB and 4GB in size once more. This is not permitted in the BAM specification but was allowed in an earlier version of HTSlib. There is now a warning at 2GB and a hard failure at 4GB. (PR#1421, fixes #1420 and samtools#1613. Reported by John Marshall and R C Mueller) * Improve error message when failing to load an index. (PR#1468, example of the problem samtools#1637) * Permit MM (base modification) tags containing "." and "?" suffixes. These define implicit vs explicit coordinates. See the SAM tags specification for details. (PR#1423 and PR#1426, fixes #1418. PR#1469, fixes #1466. Reported by cjw85) * Warn if spaces instead of tabs are detected in a VCF file to prevent confusion. (PR#1328, fixes bcftools#1575. Reported by ketkijoshi278) * Add an "sclen" filter expression keyword. This is the length of a soft-clip, both left and right end. It may be combined with qlen (qlen-sclen) to obtain the number of bases in the query sequence that have been aligned to the genome ie it provides a way to compare local-alignment vs global-alignment length. (PR#1441 and PR/samtools#1661, fixes #1436. Requested by Chang Y) * Improve error messages for CRAM reference mismatches. If the user specifies the wrong reference, the CRAM slice header MD5sum checks fail. 
We now report the SQ line M5 string too so it is possible to validate against the whole chr in the ref.fa file. The error message has also been improved to report the reference name instead of #num. Finally, we now hint at the likely cause, which counters the misleading samtools supplied error of "truncated or corrupt" file. (PR#1427, fixes samtools#1640. Reported by Jian-Guo Zhou) * Expose more of the CRAM API and add new functionality to extract the reference from a CRAM file. (PR#1429 and PR#1442) * Improvements to the implementation of embedded references in CRAM where no external reference is specified. (PR#1449, addresses some of the issues in #1445) * The CRAM writer now allows alignment records with RG:Z: aux tags that don't have a corresponding @RG ID in the file header. Previously these tags would have been silently dropped. HTSlib will complain whenever it has to add one though, as such tags do not conform to recommended practice for the SAM, BAM and CRAM formats. (PR#1480, fixes #1479. Reported by Alex Leonard) * Set tab delimiter in man page for tabix GFF3 sort. (PR#1457. Thanks to Colin Diesh) * When using libdeflate, the 1...9 scale of BGZF compression levels is now remapped to the 1...12 range used by libdeflate instead of being passed directly. In particular, HTSlib levels 8 and 9 now map to libdeflate levels 10 and 12, so it is possible to select the highest (but slowest) compression offered by libdeflate. (PR#1488, fixes #1477. Reported by Gert Hulselmans) * The VCF variant API has been extended so that it can return separate flags for INS and DEL variants as well as the existing INDEL one. These flags have not been added to the old bcf_get_variant_types() interface as it could break existing users. To access them, it is necessary to use new functions bcf_has_variant_type() and bcf_has_variant_types(). (PR#1467) * The missing, but trivial, `le_to_u8()` function has been added to hts_endian. 
(PR#1494, Thanks to John Marshall) * bcf_format_gt() now works properly on big-endian platforms. (PR#1495, Thanks to John Marshall) Build changes ------------- These are compiler, configuration and makefile based changes. * Update htscodecs to version 1.3.0 for new SIMD code + various fixes. Updates the htscodecs submodule and adds changes necessary to make HTSlib build the new SIMD codec implementations. (PR#1438, PR#1489, PR#1500) * Fix clang builds under mingw. Under mingw, clang requires dllexport to be applied to both function declarations and function definitions. (PR#1435, PR#1497, PR#1498 fixes #1433. Reported by teepean) * Fix curl type warning with gcc 12.1 on Windows. (PR#1443) * Detect ARM Neon support and only build appropriate SIMD object files. (PR#1451, fixes #1450. Thanks to John Marshall) * `make print-config` now reports extra CFLAGS that are needed to build the SIMD parts of htscodecs. These may be of use to third-party build systems that don't use HTSlib's or htscodecs' build infrastructure. (PR#1485. Thanks to John Marshall) * Fixed some Makefile dependency issues for the "check"/"test" targets and plugins. In particular, "make check" will now build the "all" target, if not done already, before running the tests. (PR#1496) Bug fixes --------- * Fix bug when reading position -1 in BCF (0 in VCF), which is used to indicate telomeric regions. The BCF reader was incorrectly assuming the value stored in the file was unsigned, so a VCF->BCF->VCF round-trip would change it from 0 to 4294967296. (PR#1476, fixes #1475 and bcftools#1753. Reported by Rodrigo Martin) * Various bugs and quirks have been fixed in the filter expression engine, mostly related to the handling of absent tags, and the is_true flag. Note that as a result of these fixes, some filter expressions may give different results: - Fixed and-expressions including aux tag values which could give an invalid true result depending on the order of terms. 
- The expression `![NM]` is now true if only `NM` does not exist. In earlier versions it would also report true for tags like `NM:i:0` which exist but have a value of zero. - The expression `[X1] != 0` is now false when `X1` does not exist. Earlier versions would return true for this comparison when the tag was missing. - NULL values due to missing tags now propagate through string, bitwise and mathematical operations. Logical operations always treat them as false. (PR#1463, fixes samtools#1670. Reported by Gert Hulselmans; PR#1478, fixes samtools#1677. Reported by johnsonzcode) * Fix buffer overrun in bam_plp_insertion_mod. Memory now grows to the proper size needed for base modification data. (PR#1430, fixes samtools#1652. Reported by hd2326) * Remove limit of returned size from fai_retrieve(). (PR#1446, fixes samtools#1660. Reported by Shane McCarthy) * Cap hts_getline() return value at INT_MAX. Prevents hts_getline() from returning a negative number (a fail) for very long string length values. (PR#1448. Thanks to John Marshall) * Fix breakend detection and test bcf_set_variant_type(). (PR#1456, fixes #1455. Thanks to Martin Pollard) * Prevent arrays of BCF_BT_NULL values found in BCF files from causing bcf_fmt_array() to call exit() as the type is unsupported. These are now tested for and caught by bcf_record_check(), which returns an error code instead. (PR#1486) * Improved detection of fasta and fastq files that have very long comments following identifiers. (PR#1491, thanks to John Marshall. Fixes samtools/samtools#1689, reported by cjw85) * Fixed a SEGV triggered by giving a SAM file to `samtools import`. (PR#1492) ------------------------------------------------------------------------------ samtools - changes v1.16 ------------------------------------------------------------------------------ New work and changes: * samtools reference command added. This subcommand extracts the embedded reference out of a CRAM file. (PR#1649, addresses #723. 
Requested by Torsten Seemann) * samtools import now adds grouped by query-name to the header. (PR#1633, thanks to Nils Homer) * Made samtools view read error messages more generic. Former error message would claim that there was a "truncated file or corrupt BAM index file" with no real justification. Also reset errno in stream_view which could lead to confusing error messages. (PR#1645, addresses some of the issues in #1640. Reported by Jian-Guo Zhou) * Make samtools view -p also clear mqual, tlen and cigar. (PR#1647, fixes #1606. Reported by eboyden) * Add bedcov option -c to report read count. (PR#1644, fixes #1629. Reported by Natchaphon Rajudom) * Add UMI/barcode handling to samtools markdup. (PR#1630, fixes #1358 and #1514. Reported by Gert Hulselmans and Poshi) * Add a new template coordinate sort order to samtools sort and samtools merge. This is useful when working with unique molecular identifiers (UMIs). (PR#1605, fixes #1591. Thanks to Nils Homer) * Rename mpileup --ignore-overlaps to --ignore-overlaps-removal or --disable-overlap-removal. The previous name was ambiguous and was often read as an option to enable removal of overlapping bases, while in reality this is on by default and the option turns off the ability to remove overlapping bases. (PR#1666, fixes #1663. Reported by yangdingyangding) * The dict command can now read BWA's .alt file and add AH:* tags indicating reference sequences that represent alternate loci. (PR#1676. Thanks to John Marshall) * The "samtools index" command can now accept multiple alignment filenames with the new -M option, and will index each of them separately. (Specifying the output index filename via out.index or the new -o option is currently only applicable when there is only one alignment file to be indexed.) (PR#1674. Reported by Abigail Ramsøe and Nicola Romanò. Thanks to John Marshall) * Allow samtools fastq -T "*". This allows all tags from SAM records to be written to fastq headers. 
This is a counterpart to samtools import -T "*". (PR#1679. Thanks to cjw85) Bug Fixes: * Re-enable --reference option for samtools depth. The reference is not used but this makes the command line usage compatible with older releases. (PR#1646, fixes #1643. Reported by Randy Harr) * Fix regex coordinate bug in samtools markdup. (PR#1657, fixes #1642. Reported by Randy Harr) * Fix divide by zero in plot-bamstats -m, on unmapped data. (PR#1678, fixes #1675. Thanks to Shane McCarthy) * Fix missing RG headers when using samtools merge -r. (PR#1683, addresses htslib#1479. Reported by Alex Leonard) * Fix a possible unaligned access in samtools reference. (PR#1696) Documentation: * Add documentation on CRAM compression profiles and some of the newer options that appear in CRAM 3.1 and above. (PR#1659, fixes #1656. Reported by Matthias De Smet) * Add "sclen" filter expression keyword documentation. (PR#1661, see also htslib#1441) * Extend FILTER EXPRESSION man page section to match the changes made in HTSlib. (PR#1687, samtools/htslib#1478) Non user-visible changes and build improvements: * Ensure generated test files are ignored (by git) and cleaned (by make testclean) (PR#1692, Thanks to John Marshall) ------------------------------------------------------------------------------ bcftools - changes v1.16 ------------------------------------------------------------------------------ * New plugin `bcftools +variant-distance` to annotate records with distance to the nearest variant (#1690) Changes affecting the whole of bcftools, or multiple commands: * The -i/-e filtering expressions - Added support for querying of multiple filters, for example `-i 'FILTER="A;B"'` can be used to select sites with two filters "A" and "B" set. See the documentation for more examples. 
- Added modulo arithmetic operator Changes affecting specific commands: * bcftools annotate - A bug introduced in 1.14 caused that records with INFO/END annotation would incorrectly trigger `-c ~INFO/END` mode of comparison even when not explicitly requested, which would result in not transferring the annotation from a tab-delimited file (#1733) * bcftools merge - New `-m snp-ins-del` switch to merge SNVs, insertions and deletions separately (#1704) * bcftools mpileup - New NMBZ annotation for Mann-Whitney U-z test on number of mismatches within supporting reads - Suppress the output of MQSBZ and FS annotations in absence of alternate allele * bcftools +scatter - Fix erroneous addition of duplicate PG lines * bcftools +setGT - Custom genotypes (e.g. `-n c:1/1`) now correctly override ploidy -- The Wellcome Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. |
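The BCF position bug fixed in HTSlib 1.16 (a VCF position of 0 turning into 4294967296 after a VCF->BCF->VCF round-trip) comes down to reading a signed 32-bit value as unsigned. The sketch below reproduces that arithmetic in Python under a simplified model: a 0-based position stored as a 4-byte little-endian integer, converted to a 1-based VCF position by adding one. The function name `vcf_pos_from_bcf` is hypothetical and this is not HTSlib code.

```python
import struct


def vcf_pos_from_bcf(raw: bytes, signed: bool) -> int:
    """Decode a 4-byte little-endian 0-based position and return the 1-based one.

    Simplified illustration only; `signed=False` models the old, buggy read.
    """
    fmt = "<i" if signed else "<I"
    (pos0,) = struct.unpack(fmt, raw)
    return pos0 + 1


# VCF position 0 (telomeric) is stored in BCF as the 0-based value -1.
raw = struct.pack("<i", -1)

print(vcf_pos_from_bcf(raw, signed=True))   # correct interpretation
print(vcf_pos_from_bcf(raw, signed=False))  # unsigned misinterpretation
```

The unsigned read sees 0xFFFFFFFF, i.e. 4294967295, and adding one for the 1-based VCF coordinate gives the 4294967296 quoted in the release notes.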
From: Robert D. <rm...@sa...> - 2022-04-07 16:51:37
|
Samtools (and HTSlib and BCFtools) version 1.15.1 is now available from GitHub and SourceForge. This fixes bugs found in the 1.15 release. https://sourceforge.net/projects/samtools/ https://github.com/samtools/htslib/releases/tag/1.15.1 https://github.com/samtools/samtools/releases/tag/1.15.1 https://github.com/samtools/bcftools/releases/tag/1.15.1 The main changes are listed below: ------------------------------------------------------------------------------ htslib - changes v1.15.1 ------------------------------------------------------------------------------ * Security fix: Fixed broken error reporting in the sam_cap_mapq() function, due to a missing hts_log() parameter. Prior to this fix it was possible to abuse the log message format string by passing a specially crafted alignment record to this function. (PR#1406) * HTSlib now uses libhtscodecs release 1.2.2. This fixes a number of bugs where invalid compressed data could trigger usage of uninitialised values. (PR#1416) * Fixed excessive memory used by multi-threaded SAM output on long reads. (Part of PR#1384) * Fixed a bug where tabix would misinterpret region specifiers starting at position 0. It will also now warn if the file being indexed is supposed to be 1-based but has positions less than or equal to 0. (PR#1411) * The VCF header parser will now issue a warning if it finds an INFO header with Type=Flag but Number not equal to 0. It will also ignore the incorrect Number so the flag can be used. (PR#1415) ------------------------------------------------------------------------------ samtools - changes v1.15.1 ------------------------------------------------------------------------------ Bug fixes: * A bug which prevented the samtools view --region-file (and the equivalent -M -L <file>) options from working in version 1.15 has been fixed. (#1617) * Fixed a crash triggered by using the samtools view -c/--count and --unmap options together. The --unmap option is now ignored in counting mode. 
(#1619) Documentation: * The consensus command was missing from the main samtools.1 manual page. It has now been added. (#1603) * Corrected instructions for reproducing the samtools stats "raw total sequences" count using samtools view -c. (#1620; reported by @krukanna) * Improved manual page formatting. (#1625; thanks to John Marshall) Non user-visible changes and build improvements: * Unnecessary #include lines have been removed from bam_plcmd.c. (#1607; thanks to John Marshall) ------------------------------------------------------------------------------ bcftools - changes v1.15.1 ------------------------------------------------------------------------------ * bcftools annotate - New `-H, --header-line` convenience option to pass a header line on command line, this complements the existing `-h, --header-lines` option which requires a file with header lines * bcftools csq - A list of consequence types supported by `bcftools csq` has been added to the manual page. (#1671) * bcftools +fill-tags - Extend generalized functions so that FORMAT tags can be filled as well, for example: bcftools +fill-tags in.bcf -o out.bcf -- \ -t 'FORMAT/DP:1=int(smpl_sum(FORMAT/AD))' - Allow multiple custom functions in a single run. Previously the program would silently go with the last one, assigning the same values to all (#1684) * bcftools norm - Fix an assertion failure triggered when a faulty VCF file with a '-' character in the REF allele was used with `bcftools norm --atomize`. This option now checks that the REF allele only includes the allowed characters A, C, G, T and N. (#1668) - Fix the loss of phasing in half-missing genotypes in variant atomization (#1689) * bcftools roh - Fix a bug that could result in an endless loop or incorrect AF estimate when missing genotypes are present and the `--estimate-AF -` option was used (#1687) * bcftools +split-vep - VEP fields with characters disallowed in VCF tag names by the specification (such as '-' in 'M-CAP') couldn't be queried. 
This has been fixed, the program now sanitizes the field names, replacing invalid characters with underscore (#1686) -- The Wellcome Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. |
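The +split-vep field-name sanitization described in the bcftools 1.15.1 notes above can be pictured with a small Python sketch. The character set used here (letters, digits, underscore and dot) is an assumption for illustration; the notes only say that invalid characters are replaced with an underscore, and the exact rule applied by bcftools may differ. The function name `sanitize_field_name` is hypothetical.

```python
import re

# Assumed set of characters allowed in a tag name; anything else is replaced.
_INVALID = re.compile(r"[^0-9A-Za-z_.]")


def sanitize_field_name(name: str) -> str:
    """Replace every character outside the assumed allowed set with '_'."""
    return _INVALID.sub("_", name)


print(sanitize_field_name("M-CAP"))
print(sanitize_field_name("LoF_info"))
```

With this rule the 'M-CAP' field from the notes becomes 'M_CAP', while names that are already valid pass through unchanged.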
From: Robert D. <rm...@sa...> - 2022-02-21 15:14:39
|
Samtools (and HTSlib and BCFtools) version 1.15 is now available from GitHub and SourceForge. https://sourceforge.net/projects/samtools/ https://github.com/samtools/htslib/releases/tag/1.15 https://github.com/samtools/samtools/releases/tag/1.15 https://github.com/samtools/bcftools/releases/tag/1.15 The main changes are listed below: ------------------------------------------------------------------------------ htslib - changes v1.15 ------------------------------------------------------------------------------ Features and Updates -------------------- * Bgzip now has a --keep option to not remove the input file after compressing. (PR#1331) * Improved file format detection so some BED files are no longer detected as FASTQ or FASTA. (PR#1350, thanks to John Marshall) * Added xz (lzma), zstd and D4 formats to the file type detection functions. We don't actively support reading these data types, but function calls and htsfile can detect them. (PR#1340, thanks to John Marshall) * CRAM now also uses libdeflate for read-names if the libdeflate version is new enough (1.9 onwards). Previously we used zlib for this due to poor performance of libdeflate. This gives a slight speed up and reduction in file size. (PR#1383) * The VCF and BCF readers will now issue a warning if contig, INFO or FORMAT IDs do not match the formats described in the VCFv4.3 specification. Note that while the invalid names will mostly still be accepted, future updates will convert the warnings to errors causing files including invalid names to be rejected. (PR#1389) Build changes ------------- These are compiler, configuration and makefile based changes. * HTSlib now uses libhtscodecs release 1.2.1. * Improved support for compiling and linking against HTSlib with Microsoft Visual Studio. (PR#1380, #1377, #1375. Thanks to Aidan Bickford and John Marshall) * Various internal CI improvements. Bug fixes --------- * Fixed CRAM index queries for HTSJDK output (PR#1388, reported by Chris Norman). 
Note this also fixes CRAM writing, to match the specification (and HTSJDK), from version 3.1 onwards. * Fixed CRAM index queries when required-fields settings are selected to ignore CIGARs (PR#1372, reported by Giulio Genovese). * Unmapped but placed reads (having chr/pos) are now included in the BAM indices. (PR#1352, thanks to John Marshall) * CRAM now honours the filename##idx##index nomenclature for specifying non-standard index locations. (PR#1360, reported by Michael Cariaso) * Minor CRAM v1.0 read-group fix (PR#1349, thanks to John Marshall) * Permit .fa and .fq file type detection as synonyms for FASTA and FASTQ. (PR#1386). * Empty VCF format fields are now output as ":.:" instead of "::". (PR#1370) * Repeated bcf_sr_seek calls now work. (PR#1363, reported by Giulio Genovese) * Bcf_remove_allele_set now works on unpacked BCF records. (PR#1358, reported by Brent Pedersen). * The hts_parse_decimal() function used to read numbers in region lists is now better at rejecting non-numeric values. In particular it now rejects a lone 'G' instead of interpreting it as '0G', i.e. zero. (PR#1396, PR#1400, reported by SSSimon Yang; thanks to John Marshall). * Improve support for GPU issues listed by -Wdouble-promotion. (PR#1365, reported by David Seisert) * Fix example code in header file documentation. (PR#1381, Thanks to Aidan Bickford) ------------------------------------------------------------------------------ samtools - changes v1.15 ------------------------------------------------------------------------------ Notice: * Samtools mpileup VCF and BCF output (deprecated in release 1.9) has been removed. Please use bcftools mpileup instead. New work and changes: * Added "--min-BQ" and "--min-MQ" options to "depth". These match the equivalent long options found in "samtools mpileup" and give a consistent way of specifying the base and mapping quality filters. (#1584; fixes #1580. 
Reported by Chang Y) * Improved automatic file type detection with "view -u" or "view -1". Setting either of these options would default to BAM format regardless of the usual automatic file type selection based on the file name. The defaults are now only used when the file name does not indicate otherwise. (#1582) * For "markdup" optical duplicate marking add regex options for custom coordinates. For the case of non standard read names (QNAME), add options to read the coordinates and, optionally, another part of the string to test for optical duplication. (#1558) * New "samtools consensus" subcommand for generating consensus from SAM, BAM or CRAM files based on the contents of the alignment records. The consensus is written as FASTA, FASTQ or as a pileup oriented format. The default FASTA/FASTQ output includes one base per non-gap consensus, with insertions with respect to the aligned reference being included and deletions removed. This could be used to compute a new reference from sequence assemblies to realign against. (#1557) * New "samtools view --fetch-pairs" option. This option retrieves pairs even when the mate is outside of the requested region. Using this option enables the multi-region iterator and a region to search must be specified. The input file must be an indexed regular file. (#1542) * Building on #1530 below, add a tview reflist for Goto. (#1539, thanks to Adam Blanchet) * Completion of references added to tview Goto. (#1530; thanks to Adam Blanchet) * New "samtools head" subcommand for conveniently displaying the headers of a SAM, BAM, or CRAM file. Without options, this is equivalent to `samtools view --header-only --no-PG` but more succinct and memorable. (#1517; thanks to John Marshall) Bug Fixes: * Free memory when stats fails to read the header of a file. (#1592; thanks to Mathias Schmitt) * Fixed empty field on unsupported aux tags in "mpileup --output-extra". Replaces the empty fields on unsupported aux tags with a '*'. 
(#1553; fixes #1544. Thanks to Adam Blanchet) * In mpileup, the --output-BP-5 and --output-BP options are no longer mutually exclusive. This fixes the problem of output columns being switched. (#1540; fixes 1534. Reported by Konstantin Riege) * Fix for hardclip bug in ampliconclip. Odd length sequences resulted in random characters appearing in sequence. (#1538; fixes #1527. Reported by Ivana Mihalek) Documentation: * Improved mpileup documentation. (#1566; fixes #1564. Reported by Chang Y) * Fixed "samtools depth -J" documentation, which was reversed. (#1552; fixes #1549. Reported by Stephan Hutter) * Numerous minor man page fixes. (#1528, #1536, #1579, #1590. Thanks to John Marshall for some of these) Non user-visible changes and build improvements: * Replace CentOS test build with Rocky Linux. The CentOS Docker images that our test build depended on have stopped working. Switched to Rocky Linux as the nearest available equivalent. (#1589) * Fix missing autotools on Appveyor. Newer versions of msys2 removed autotools from their base-devel package. This puts them back. (#1575) * Fixed bug detected by clang-13 with -Wformat-security. (#1553) * Switch to using splaysort in bam_lpileup. Improves speed and efficiency in "tview". (#1548; thanks to Adam Blanchet) ------------------------------------------------------------------------------ bcftools - changes v1.15 ------------------------------------------------------------------------------ * New `bcftools head` subcommand for conveniently displaying the headers of a VCF or BCF file. Without any options, this is equivalent to `bcftools view --header-only --no-version` but more succinct and memorable. 
* The `-T, --targets-file` option had the following bug originating in HTSlib code: when an uncompressed file with multiple columns CHR,POS,REF was provided, the REF would be interpreted as 0 gigabases (#1598) Changes affecting specific commands: * bcftools annotate - In addition to `--rename-annots`, which requires a file with name mappings, it is now possible to do the same on the command line `-c NEW_TAG:=OLD_TAG` - Add new option --min-overlap which allows to specify the minimum required overlap of intersecting regions - Allow to transfer ALT from VCF with or without replacement using bcftools annotate -a annots.vcf.gz -c ALT file.vcf.gz bcftools annotate -a annots.vcf.gz -c +ALT file.vcf.gz * bcftools convert - Revamp of `--gensample`, `--hapsample` and `--haplegendsample` family of options which includes the following changes: - New `--3N6` option to output/input the new version of the .gen file format, see https://www.cog-genomics.org/plink/2.0/formats#gen - Deprecate the `--chrom` option in favor of `--3N6`. A simple `cut` command can be used to convert from the new 3*M+6 column format to the format printed with `--chrom` (`cut -d' ' -f1,3-`). - The CHROM:POS_REF_ALT IDs which are used to detect strand swaps are required and must appear either in the "SNP ID" column or the "rsID" column. The column is autodetected for `--gensample2vcf`, can be the first or the second for `--hapsample2vcf` (depending on whether the `--vcf-ids` option is given), must be the first for `--haplegendsample2vcf`. * bcftools csq - Allow GFF files with phase column unset * bcftools filter - New `--mask`, `--mask-file` and `--mask-overlap` options to soft filter variants in regions (#1635) * bcftools +fixref - The `-m id` option now works also for non-dbSNP ids, i.e. 
not just `rsINT` - New `-m flip-all` mode for flipping all sites, including ambiguous A/T and C/G sites * bcftools isec - Prevent segfault on sites filtered with -i/-e in all files (#1632) * bcftools mpileup - More flexible read filtering using the options: --ls, --skip-all-set .. skip reads with all of the FLAG bits set --ns, --skip-any-set .. skip reads with any of the FLAG bits set --lu, --skip-all-unset .. skip reads with all of the FLAG bits unset --nu, --skip-any-unset .. skip reads with any of the FLAG bits unset The existing synonymous options will continue to function but their use is discouraged: --rf, --incl-flags Required flags: skip reads with mask bits unset --ff, --excl-flags Filter flags: skip reads with mask bits set * bcftools query - Make the `--samples` and `--samples-file` options work also in the `--list-samples` mode. Add a new `--force-samples` option which allows to proceed even when some of the requested samples are not present in the VCF (#1631) * bcftools +setGT - Fix a bug in `-t q -e EXPR` logic applied on FORMAT fields, sites with all samples failing the expression EXPR were incorrectly skipped. This problem affected only the use of `-e` logic, not the `-i` expressions (#1607) * bcftools sort - make use of the TMPDIR environment variable when defined * bcftools +trio-dnm2 - The --use-NAIVE mode now also adds the de novo allele in FORMAT/VA -- The Wellcome Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. |
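The four `bcftools mpileup` read-filtering modes listed in the 1.15 notes above differ only in how a bit mask is compared with a read's FLAG. The following is a minimal Python sketch of that comparison, not the bcftools implementation: the function name, the keyword names and the convention that a mask of 0 means "option not given" are all assumptions made for illustration.

```python
# SAM FLAG bits used in the example (values from the SAM specification).
PAIRED, PROPER_PAIR = 0x1, 0x2
UNMAP, SECONDARY, QCFAIL, DUP = 0x4, 0x100, 0x200, 0x400


def should_skip(flag, skip_all_set=0, skip_any_set=0,
                skip_all_unset=0, skip_any_unset=0):
    """Return True if a read with this FLAG would be skipped.

    Hypothetical helper: a mask of 0 is treated as "option not given".
    """
    # Skip when every bit of the mask is set in the FLAG.
    if skip_all_set and (flag & skip_all_set) == skip_all_set:
        return True
    # Skip when at least one bit of the mask is set.
    if skip_any_set and (flag & skip_any_set) != 0:
        return True
    # Skip when none of the mask bits are set.
    if skip_all_unset and (flag & skip_all_unset) == 0:
        return True
    # Skip when at least one bit of the mask is not set.
    if skip_any_unset and (flag & skip_any_unset) != skip_any_unset:
        return True
    return False


# A paired, duplicate read that is not flagged as properly paired.
flag = PAIRED | DUP
print(should_skip(flag, skip_any_set=UNMAP | SECONDARY | QCFAIL | DUP))
print(should_skip(flag, skip_all_set=QCFAIL | DUP))
print(should_skip(flag, skip_any_unset=PAIRED | PROPER_PAIR))
```

The "any" and "all" variants only differ for masks with more than one bit; for a single-bit mask each pair collapses to the same test.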
From: Robert D. <rm...@sa...> - 2021-10-22 15:08:27
|
Samtools (and HTSlib and BCFtools) version 1.14 is now available from GitHub and SourceForge. https://sourceforge.net/projects/samtools/ https://github.com/samtools/htslib/releases/tag/1.14 https://github.com/samtools/samtools/releases/tag/1.14 https://github.com/samtools/bcftools/releases/tag/1.14 The main changes are listed below: ------------------------------------------------------------------------------ htslib - changes v1.14 ------------------------------------------------------------------------------ Features and Updates -------------------- * Added a keep option to bgzip to leave the original file untouched. This brings bgzip into line with gzip. (PR #1331, thanks to Alex Petty) * "endpos" has been added to the filter language, giving the position of the rightmost mapped base as measured by the CIGAR string. For unmapped reads it is the same as "pos". (PR #1307, thanks to John Marshall) * Interfaces have been added to interpret the new base modification tags added to the SAMtags document in samtools/hts-specs#418. (PR #1132) * New API functions hts_flush()/sam_flush()/bcf_flush() for flushing output htsFile/samFile/vcfFile streams. (PR #1326, thanks to John Marshall) * The synced_bcf_reader now sorts lines with symbolic alleles by END tag as well as POS. (PR #1321) * Added synced_bcf_reader options BCF_SR_REGIONS_OVERLAP and BCF_SR_TARGETS_OVERLAP for better control of how records that start outside the desired region but overlap it are handled. Fixes samtools/bcftools#1420 and samtools/bcftools#1421 raised by John Marshall. (PR #1327) * HTSlib will now accept long-cigar CG:B: tags made by htsjdk which don't quite follow the specification properly (using signed values instead of unsigned). Thanks to Colin Diesh for reporting an example file. (PR #1317) * The warning printed when the BGZF reader finds a file with no EOF block has been changed to be less alarming. Unfortunately some third-party BGZF encoders don't write EOF blocks at the end of files. 
Thanks to Keiran Raine for reporting an example file. (PR #1323) * The FASTA and FASTQ readers get an option to skip over the first item on the header line, and use the second as the read name. It allows the original name to be restored on some of the fastq files served from the European Nucleotide Archive (ENA). (PR #1325) * HTSlib is now more strict when parsing the VCF samples line (beginning #CHROM). It will only accept tabs between the mandatory field names and sample names must be separated with tabs. (PR #1328) * HTSlib will now warn if it looks like the header has been corrupted by diagnostic messages from the program that made it. This can happen when using `nohup`, which by default mixes stdout and stderr into the same stream. (PR#1339, thanks to John Marshall) * File format detection will now recognise signatures for XZ, Zstd and D4 files (note that HTSlib will not read them yet). (PR #1340, thanks to John Marshall) Build changes ------------- These are compiler, configuration and makefile based changes. * Some redundant tests have been removed from the test harness, speeding it up. (PR #1308) * The version.sh script now works better on shallow checkouts. (PR #1324) * A check-untracked Makefile target has been added to catch untracked files (mostly) left by the test harness. (PR #1324) Bug fixes --------- * Fixed a case where flushing the thread pool could very occasionally cause a deadlock. (PR #1309) * Fixed a bug where some CRAM files could fail to decode if the required_fields option was in use. Thanks to Matt Sexton for reporting the issue. (PR #1314, fixes samtools/samtools#1475) * Fixed a regression where the S3 plugin could not read public files unless you supplied some Amazon credentials. Thanks to Chris Saunders for reporting. (PR #1332, fixes samtools/samtools#1491) * Fixed a possible CRAM thread deadlock discovered by @ryancaicse. (PR #1330, fixes #1329) * Some set-but-unused variables have been removed. 
(PR #1334) * Fixed a bug which prevented "flag.read2" from working in the filter language unless it was at the end of the expression. Thanks to Vamsi Kodali for reporting the issue. (PR #1342) * Fixed a memory leak that could happen if CRAM fails to inflate a LZMA block. (PR #1340, thanks to John Marshall) ------------------------------------------------------------------------------ samtools - changes v1.14 ------------------------------------------------------------------------------ Notice: * Samtools mpileup VCF and BCF output (deprecated in release 1.9) will be removed in the next release. Please use bcftools mpileup instead. New work and changes: * The legacy samtools API (libbam.a, bam_endian.h, sam.h and most of bam.h) has been removed. We recommend coding against the HTSlib API directly. The legacy API had not been actively maintained since 2015. (#1483) * New "samtools samples" command to list the samples used in a SAM/BAM/CRAM file. (#1432; thanks to Pierre Lindenbaum) * "mpileup" now supports base modifications via the SAM Mm/MM auxiliary tag. Please see the "--output-mods" option. (#1311) * Added "mpileup --output-BP-5" option to output the BP field in 5' to 3' order instead of left to right. (#1484; fixes #1481) * Added "samtools view --rf" option as an additional FLAG filtering method. This keeps records only if (FLAG & N) != 0. (#1508; fixes #1470) * New "samtools import -N" option to use the second word on a FASTQ header line, matching the SRA/ENA FASTQ variant. (#1485) * Improve "view -x" option to simplify specifying multiple tags, and added the reverse "--keep-tag" option to include rather than exclude. (#516) * Switched the processing order of "view" -x (tag filtering) and -e (expression) handling. Expressions now happen first so we can filter on tags which are about to be deleted. This is now consistent with the "view -d" behaviour too. (#1480; fixes #1476. Reported by William Rowell) * Added filter expression "endpos" keyword. (#1464. 
Thanks to John Marshall) * "samtools view" errors now appear after any SAM output, improving their visibility. (#1490. Thanks to John Marshall) * Improved "samtools sort" use of temporary files, both tidying up if it fails and recovery when facing pre-existing temporary files. (#1510; fixes #1035, #1503. Reported by Vivek Rai and Maarten Kooyman) * Filtering in "samtools markdup" now sets the UNMAP BAM flag when given the "-p" option. (#1512; fixes #1469) * Make CRAM references shared during "samtools merge" so merging many files has a lower memory usage. (#471) Bug fixes: * Prevent "samtools depth" from closing stdout when outputting to terminal, avoiding a bad interaction with PySam. (#1465. Thanks to John Marshall) * In-place "samtools reheader" now works on CRAMs produced using a higher than default compression level. (#1479) * Fix setting of the dt tag in "markdup". Optical duplicates were being marked too early, negating the tagging and counting elsewhere. (#1487; fixes #1486. Reported by Kevin Lewis) * Reinstate the "samtools stats -I" option to filter by sample. (#1496; fixes #1489. Reported by Matthias Bernt) * Fix "samtools fastq" handling of dual index tags on single-ended input. (#1474) * Improve "samtools coverage" documentation. (#1521; fixes #1504. Reported by Peter Menzel) Non user-visible changes and build improvements: * Replace Curses mvprintw() with va_list-based equivalent. (#1509. Thanks to John Marshall and Andreas Tille) * Fixed some clang-13 warning messages. (#1506) * Improve quoting of options in "samtools import" tests. (#1466. Thanks to John Marshall) * Fixed a faulty test which caused test harness failures on NetBSD. 
(#1520) ------------------------------------------------------------------------------ bcftools - changes v1.14 ------------------------------------------------------------------------------ Changes affecting the whole of bcftools, or multiple commands: * New `--regions-overlap` and `--targets-overlap` options which address a long-standing design problem with subsetting VCF files by region. BCFtools recognizes two sets of options, one for streaming (`-t/-T`) and one for index-jumping (`-r/-R`). They behave differently, the first includes only records with POS coordinate within the regions, the other includes overlapping regions. The two new options allow modifying the default behaviour, see the man page for more details. * The `--output-type` option can be used to override the default compression level Changes affecting specific commands: * bcftools annotate - when `--set-id` and `--remove` are combined, `--set-id` cannot use tags deleted by `--remove`. This is now detected and the program exits with an informative error message instead of segfaulting (#1540) - while non-symbolic variants are uniquely identified by POS,REF,ALT, symbolic alleles starting at the same position were indistinguishable. This prevented correct matching of records with the same positions and variant type but different length given by INFO/END (samtools/htslib@60977f2). When annotating from a VCF/BCF, the matching is done automatically. When annotating from a tab-delimited text file, this feature can be invoked by using `-c INFO/END`. - add a new '.' modifier to control whether missing values should be carried over from a tab-delimited file or not. For example: -c TAG .. adds TAG if the source value is not missing. If TAG exists in the target file, it will be overwritten -c .TAG .. adds TAG even if the source value is missing. 
This can overwrite non-missing values with a missing value and can create empty VCF fields (`TAG=.`) * bcftools +check-ploidy - by default missing genotypes are not used when determining ploidy. With the new option `-m, --use-missing` it is possible to use the information carried in the missing and half-missing genotypes (e.g. ".", "./." or "./1") * bcftools concat - new `--ligate-force` and `--ligate-warn` options for finer control of `-l, --ligate` behaviour in imperfect overlaps. The new default is to throw an error when sites present in one chunk but absent in the other are encountered. To drop such sites and proceed, use the new `--ligate-warn` option (previously this was the default). To keep such sites, use the new `--ligate-force` option (#1567). * bcftools consensus: - Apply mask even when the VCF has no notion about the chromosome. It was possible to encounter this problem when `contig` lines were not present in the VCF header and no variants were called on that chromosome (#1592) * bcftools +contrast: - support for chunking within map/reduce framework allowing to collect NASSOC counts even for empty case/control sample sets (#1566) * bcftools csq: - bug fix, compound indels were not recognised in some cases (#1536) - compound variants were incorrectly marked as 'inframe' even when stop codon would occur before the frame was restored (#1551) - bug fix, FORMAT/BCSQ bitmasks could have been assigned incorrectly to some samples at multiallelic sites, a superset of the correct consequences would have been set (#1539) - bug fix, the upstream stop could be falsely assigned to all samples in a multi-sample VCF even if the stop was relevant for a single sample only (#1578) - further improve the detection of mismatching chromosome naming (e.g. 
"chrX" vs "X") in the GFF, VCF and fasta files * bcftools merge: - keep (sum) INFO/AN,AC values when merging VCFs with no samples (#1394) * bcftools mpileup: - new --indel-size option which allows to increase the maximum considered indel size considered, large deletions in long read data are otherwise lost. * bcftools norm: - atomization now supports Number=A,R string annotations (#1503) - assign as many alternate alleles to genotypes at multiallelic sites in the`-m +` mode, disregarding the phase. Previously the program assumed to be executed as an inverse operation of `-m -`, but when that was not the case, reference alleles would have been filled instead of multiple alternate alleles (#1542) * bcftools sort: - increase accuracy of the --max-mem option limit, previously the limit could be exceeded by more than 20% (#1576) * bcftools +trio-dnm: - new `--with-pAD` option to allow processing of VCFs without FORMAT/QS. The existing `--ppl` option was changed to the analogous `--with-pPL` * bcftools view: - the functionality of the option --compression-level lost in 1.12 has been restored -- The Wellcome Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. |
From: Anil K. <ani...@gs...> - 2021-09-17 15:50:05
|
Hello everyone,
I have a situation where samtools flagstat, run on a BAM file whose duplicates were already marked with Picard, produces the following:
253552402 + 0 in total (QC-passed reads + QC-failed reads)
132897348 + 0 secondary
0 + 0 supplementary
71809672 + 0 duplicates
247864536 + 0 mapped (97.76% : N/A)
120655054 + 0 paired in sequencing
60327527 + 0 read1
60327527 + 0 read2
114967188 + 0 properly paired (95.29% : N/A)
114967188 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
To determine the PCR duplication rate from the above values, I have two options:
PCR duplication = 4th row / 1st row = 71809672 / 253552402 = 0.28
PCR duplication = 4th row / 9th row = 71809672 / 114967188 = 0.62
The second calculation produces a duplication rate very close to what is reported in Picard's report *.est_lib_complex_metrics.txt. Makes sense to me! However, I wanted to understand whether the first calculation has any meaning or is an entirely wrong way of determining PCR duplication. Please advise me.
Thanks!
Anil
GSK monitors email communications sent to and from GSK in order to protect GSK, our employees, customers, suppliers and business partners, from cyber threats and loss of GSK Information. GSK monitoring is conducted with appropriate confidentiality controls and in accordance with local laws and after appropriate consultation. |
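The arithmetic in the question above can be reproduced with a short Python sketch. It only recomputes the ratios from the posted flagstat numbers and does not say which denominator is appropriate. The parsing helper and its name are assumptions for illustration, and the third figure (duplicates over non-secondary, non-supplementary records) is added purely for comparison.

```python
import re

FLAGSTAT = """\
253552402 + 0 in total (QC-passed reads + QC-failed reads)
132897348 + 0 secondary
0 + 0 supplementary
71809672 + 0 duplicates
247864536 + 0 mapped (97.76% : N/A)
120655054 + 0 paired in sequencing
60327527 + 0 read1
60327527 + 0 read2
114967188 + 0 properly paired (95.29% : N/A)
114967188 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
"""


def parse_flagstat(text):
    """Map each flagstat label to its QC-passed count (hypothetical helper)."""
    counts = {}
    for line in text.splitlines():
        m = re.match(r"(\d+) \+ \d+ (.+)", line)
        if m:
            # Drop trailing "(...)" notes, except the mapQ qualifier that
            # distinguishes the last two rows.
            label = re.sub(r"\s*\((?!mapQ).*\)$", "", m.group(2))
            counts[label] = int(m.group(1))
    return counts


counts = parse_flagstat(FLAGSTAT)
dup = counts["duplicates"]
ratio_total = dup / counts["in total"]
ratio_proper = dup / counts["properly paired"]
# Records that are neither secondary nor supplementary.
primary = counts["in total"] - counts["secondary"] - counts["supplementary"]

print(f"duplicates / total           = {ratio_total:.2f}")
print(f"duplicates / properly paired = {ratio_proper:.2f}")
print(f"duplicates / primary         = {dup / primary:.2f}")
```

In these numbers the "in total" row counts 132897348 secondary records, so the first ratio divides by alignment records rather than by reads; the remainder (120655054) matches the "paired in sequencing" row. The 1.13 notes elsewhere in this archive mention that flagstat gained separate primary, mapped primary and duplicate primary counts.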
From: Robert D. <rm...@sa...> - 2021-07-09 11:23:14
|
Samtools (and HTSlib and BCFtools) version 1.13 is now available from GitHub and SourceForge. https://sourceforge.net/projects/samtools/ https://github.com/samtools/htslib/releases/tag/1.13 https://github.com/samtools/samtools/releases/tag/1.13 https://github.com/samtools/bcftools/releases/tag/1.13 The main changes are listed below: ------------------------------------------------------------------------------ htslib - changes v1.13 ------------------------------------------------------------------------------ Features and Updates -------------------- * In case a PG header line has multiple ID tags supplied by other applications, the header API now selects the first one encountered as the identifying tag and issues a warning when detecting subsequent ID tags. (#1256; fixed samtools/samtools#1393) * VCF header reading function (vcf_hdr_read) no longer tries to download a remote index file by default. (#1266; fixes #380) * Support reading and writing FASTQ format in the same way as SAM, BAM or CRAM. Records read from a FASTQ file will be treated as unmapped data. (#1156) * Added GCP requester pays bucket access. Thanks to @indraniel. (#1255) * Made mpileup's overlap removal choose which copy to remove at random instead of always removing the second one. This avoids strand bias in experiments where the +ve and -ve strand reads always appear in the same order. (#1273; fixes samtools/bcftools#1459) * It is now possible to use platform specific BAQ parameters. This also selects long-read parameters for read lengths bigger than 1kb, which helps bcftools mpileup call SNPs on PacBio CCS reads. (#1275) * Improved bcf_remove_allele_set. This fixes a bug that stopped iteration over alleles prematurely, marks removed alleles as 'missing' and does automatic lazy unpacking. (#1288; fixes #1259) * Improved compression metrics for unsorted CRAM files. This improves the choice of codecs when handling unsorted data. 
(#1291) * Linear index entries for empty intervals are now initialised with the file offset in the next non-empty interval instead of the previous one. This may reduce the amount of data iterators have to discard before reaching the desired region, when the starting location is in a sequence gap. Thanks to @carsonh for reporting the issue. (#1286; fixes #486) * A new hts_bin_level API function has been added, to compute the level of a given bin in the binning index. (#1286) * Related to the above, a new API method, hts_idx_nseq, now returns the total number of contigs from an index. (#1295 and #1299) * Added bracket handling to bcf_hdr_parse_line, for use with ##META lines. Thanks to Alberto Casas Ortiz. (#1240) Build changes ------------- These are compiler, configuration and makefile based changes. * HTSlib now uses libhtscodecs release 1.1.1. * Added a curl/curl.h check to configure and improved INSTALL documentation on build options. Thanks to Melanie Kirsche and John Marshall. (#1265; fixes #1261) * Some fixes to address GCC 11.1 warnings. (#1280, #1284, #1285; fixes #1283) * Supports building HTSlib in a separate directory. Thanks to John Marshall. (#1277; fixes #231) * Supports building HTSlib on MinGW 32-bit environments. Thanks to John Marshall. (#1301) Bug fixes --------- * Fixed hts_itr_query() et al region queries: fixed bug introduced in HTSlib 1.12, which led to iterators producing very few reads for some queries (especially for larger target regions) when unmapped reads were present. HTSlib 1.11 had a related problem in which iterators would omit a few unmapped reads that should have been produced; cf #1142. Thanks to Daniel Cooke for reporting the issue. (#1281; fixes #1279) * Removed compressBound assertions on opening bgzf files. Thanks to Gert Hulselmans for reporting the issue. (#1258; fixed #1257) * Duplicate sample name error message for a VCF file now only displays the duplicated name rather than the entire sample name list. 
(#1262; fixes samtools/bcftools#1451) * Fix to make samtools cat work on CRAMs again. (#1276; fixes samtools/samtools#1420) * Fix for a double memory free in SAM header creation. Thanks to @ihsinme. (#1274) * Prevent assert in bcf_sr_set_regions. Thanks to Dr K D Murray. (#1270) * Fixed crash in knet_open() etc stubs. Thanks to John Marshall. (#1289) * Fixed filter expression "cigar" on unmapped reads. Stop treating an empty CIGAR string as an error. Thanks to Chang Y for reporting the issue. (#1298, fixes samtools/samtools#1445) * Bug fixes in the bundled copy of htscodecs: - Fixed an uninitialized access in the name tokeniser decoder. (samtools/htscodecs#23) - Fixed a bug with name tokeniser and variable number of names per slice, causing it to incorrectly report an error on certain valid inputs. (samtools/htscodecs#24) ------------------------------------------------------------------------------ samtools - changes v1.13 ------------------------------------------------------------------------------ * Fixed samtools view FILE REGION, mpileup -r REGION, coverage -r REGION and other region queries: fixed bug introduced in 1.12, which led to region queries producing very few reads for some queries (especially for larger target regions) when unmapped reads were present. Thanks to @vinimfava (#1451), @JingGuo1997 (#1457) and Ramprasad Neethiraj (#1460) for reporting the respective issues. * Added options to set and clear flags to samtools view. Along with the existing remove aux tags this gives the ability to remove mark duplicate changes (part of #1358) (#1441) * samtools view now has long option equivalents for most of its single-letter options. Thanks to John Marshall. (#1442) * A new tool, samtools import, has been added. It reads one or more FASTQ files and converts them into unmapped SAM, BAM or CRAM. (#1323) * Fixed samtools coverage error message when the target region name is not present in the file header. Thanks to @Lyn16 for reporting it. 
(#1462; fixes #1461) * Made samtools coverage ASCII mode produce true ASCII output. Previously it would produce UTF-8 characters. (#1423; fixes #1419) * samtools coverage now allows setting the maximum depth, using the -d/--depth option. Also, the default maximum depth has been set to 1000000. (#1415; fixes #1395) * Complete rewrite of samtools depth. This means it is now considerably faster and does not need a depth limit to avoid high memory usage. Results should mostly be the same as the old command with the potential exception of overlap removal. (#1428; fixes #889, helps ameliorate #1411) * samtools flags now accepts any number of command line arguments, allowing multiple SAM flag combinations to be converted at once. Thanks to John Marshall. (#1401, fixes #749) * samtools ampliconclip, ampliconstats and plot-ampliconstats now support inputs that list more than one reference. (#1410 and #1417; fixes #1396 and #1418) * samtools ampliconclip now accepts the --tolerance option, which allows the user to set the number of bases within which a region is matched. The default is 5. (#1456) * Updated the documentation on samtools ampliconclip to be clearer about what it does. From a suggestion by Nathan S Watson-Haigh. (#1448) * Fixed negative depth values in ampliconstats output. (#1400) * samtools addreplacerg now allows for updating (replacing) an existing `@RG` line in the output header, if a new `@RG` line is provided in the command line, via the -r argument. The update still requires the user's approval, which can be given with the new -w option. Thanks to Chuang Yu. (#1404) * Stopped samtools cat from outputting multiple CRAM EOF markers. (#1422) * Three new counts have been added to samtools flagstat: primary, mapped primary and duplicate primary. (#1431; fixes #1382) * samtools merge now accepts a `-o FILE` option specifying the output file, similarly to most other subcommands. 
The existing way of specifying it (as the first non-option argument, alongside the input file arguments) remains supported. Thanks to David McGaughey and John Marshall. (#1434) * The way samtools merge checks for existing files has been changed so that it does not hang when used on a named pipe. (#1438; fixes #1437) * Updated documentation on mpileup to highlight the fact that the filtering options on FLAGs work with ANY rules. (#1447; fixes #1435) * samtools can now be configured to use a copy of HTSlib that has been set up with separate build and source trees. When this is the case, the `--with-htslib` configure option should be given the location of the HTSlib build tree. (Note that samtools itself does not yet support out-of-tree builds). Thanks to John Marshall. (#1427; companion change to samtools/htslib#1277) ------------------------------------------------------------------------------ bcftools - changes v1.13 ------------------------------------------------------------------------------ This release brings new options and significant changes in BAQ parametrization in `bcftools mpileup`. The previous behaviour can be triggered by providing the `--config 1.12` option. Please see PR #1474 for details. Changes affecting the whole of bcftools, or multiple commands: * Improved build system Changes affecting specific commands: * bcftools annotate: - Fix a rare bug: when INFO/END is present, all INFO fields are removed with `bcftools annotate -x INFO` and BCF output is produced, the removed INFO/END continued to inform the end coordinate and caused incorrect retrieval of records with the -r option (#1483) - Support for matching annotation line by ID, in addition to CHROM,POS,REF, and ALT (#1461) bcftools annotate -a annots.tab.gz -c CHROM,POS,~ID,REF,ALT,INFO/END input.vcf * bcftools csq: - When GFF and VCF/fasta use a different chromosome naming convention (e.g. chrX vs X), no consequences would be added. 
Newly the program attempts to detect these differences and remove/add the "chr" prefix to chromosome name to match the GFF and VCF/fasta (#1507) - Parametrize brief-predictions parameter to allow explicit number of amino acids to be printed. Note that the `-b, --brief-predictions` option is being replaced with `-B, --trim-protein-seq INT` * bcftools +fill-tags: - Generalization and better support for custom functions that allow adding new INFO tags based on arbitrary `-i, --include` type of expressions. For example, to calculate a missing INFO/DP annotation from FORMAT/AD, it is possible to use: -t 'DP:1=int(sum(FORMAT/AD))' Here the optional ":1" part specifies that a single value will be added (by default Number=. is used) and the optional int(...) adds an integer value (by default Type=Float is used). - When FORMAT/GT is not present, the INFO/AF tag will be newly calculated from INFO/AC and INFO/AN. * bcftools gtcheck: - Switch between FORMAT/GT or FORMAT/PL when one is (implicitly) requested but only the other is available - Improve diagnostics, printing warnings when a line cannot be matched and the number of lines skipped for various reasons (#1444) - Minor bug fix, with PLs being the default, the `--distinctive-sites` option started to require explicit `--error-probability 0` * bcftools index: - The program now accepts both data file name and the index file name. This adds to user convenience when running index statistics (-n, -s) * bcftools isec: - Always generate sites.txt with isec -p (#1462) * bcftools +mendelian: - Consider only complete trios, do not crash on sample name typos (#1520) * bcftools mpileup: - New `--seed` option for reproducibility of subsampling code in HTSlib - The SCR annotation which shows the number of soft-clipped reads now correctly pools reads together regardless of the variant type. Previously only reads with indels were included at indel sites. - Major revamp of BAQ. 
Please see https://github.com/samtools/bcftools/pull/1474 for details. The previous behaviour can be triggered by providing the `--config 1.12` option. - Thanks to improvements in HTSlib, the removal of overlapping reads (which can be disabled with the `-x, --ignore-overlaps` option) is not systematically biased anymore (https://github.com/samtools/htslib/pull/1273) - Modified scale of Mann-Whitney U tests. Newly INFO/*Z annotations will be printed, for example MQBZ replaces MQB. * bcftools norm: - Fix Type=Flag output in `norm --atomize` (#1472) - Atomization must not discard ALT=. records - Atomization of AD and QS tags now correctly updates occurrences of duplicate alleles within different haplotypes - Fix a bug in atomization of Number=A,R tags * bcftools reheader: - Add `-T, --temp-prefix` option * bcftools +setGT: - A wider range of genotypes can be set by the plugin by allowing specifying custom genotypes. For example, to force a heterozygous genotype it is now possible to use expressions like: c:'m|M' c:0/1 c:0 * bcftools +split-vep: - New `-u, --allow-undef-tags` option - Better handling of ambiguous keys such as INFO/AF and CSQ/AD. The `-p, --annot-prefix` option is now applied before doing anything else which allows its use with `-f, --format` and `-c, --columns` options. - Some consequence field names may not constitute a valid tag name, such as "pos(1-based)". Newly field names are trimmed to exclude brackets. * bcftools +tag2tag: - New --QR-QA-to-QS option to convert annotations generated by FreeBayes to QS used by BCFtools * bcftools +trio-dnm: - Add support for sites with more than four alleles. Note that only the four most frequent alleles are considered, the model remains unchanged. Previously such sites were skipped. - New --use-NAIVE option for a naive DNM calling based solely on FORMAT/GT and expected Mendelian inheritance. This option is suitable for prefiltering. 
  - Fix behaviour to match the documentation, the `--dnm-tag DNG` option now correctly outputs log scaled values by default, not phred scaled.
  - Fix bug in VAF calculation, homozygous de novo variants were incorrectly reported as having VAF=50%
  - Fix arithmetic underflow which could lead to imprecise scores and improve sensitivity in high coverage regions
  - Allow combining --pn and --pns to set the noise thresholds independently
-- The Wellcome Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. |
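The `+fill-tags` expression quoted in these notes, -t 'DP:1=int(sum(FORMAT/AD))', can be pictured with a small Python sketch. This is only an illustration of the arithmetic, not BCFtools code: the function name, the sample data and the choice to skip missing values (represented here as None) are all assumptions made for the example.

```python
def info_dp_from_ad(format_ad):
    """Sum all per-sample, per-allele AD values at one site into a single integer.

    format_ad is a hypothetical list with one entry per sample, each entry being
    a list of allele depths. None stands in for a missing value (".") and is
    skipped, which is an assumption of this sketch.
    """
    total = 0
    for sample_ad in format_ad:
        for depth in sample_ad:
            if depth is not None:
                total += depth
    # int(...) mirrors the optional int(...) in the expression, which requests
    # an integer tag instead of the default Type=Float.
    return int(total)


# Three made-up samples at a biallelic site; the third has no AD values.
site_ad = [[10, 2], [7, 0], [None, None]]
print(info_dp_from_ad(site_ad))
```

The single returned value corresponds to the ":1" part of the expression, which declares that one value is added per site.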
From: Anas J. <ana...@gm...> - 2021-06-29 20:39:06
|
I want to convert multiple CRAM files to FASTQ files at once. How can I do it with samtools? Can I use samtools on Windows? |
From: Robert D. <rm...@sa...> - 2021-03-17 16:29:43
|
Samtools (and HTSlib and BCFtools) version 1.12 is now available from GitHub and SourceForge. https://sourceforge.net/projects/samtools/ https://github.com/samtools/htslib/releases/tag/1.12 https://github.com/samtools/samtools/releases/tag/1.12 https://github.com/samtools/bcftools/releases/tag/1.12 The main changes are listed below: ------------------------------------------------------------------------------ htslib - changes v1.12 ------------------------------------------------------------------------------ Features and Updates -------------------- * Added experimental CRAM 3.1 and 4.0 support. (#929) These should not be used for long term data storage as the specification still needs to be ratified by GA4GH and may be subject to changes in format. (This is highly likely for 4.0). However it may be tested using: test/test_view -t ref.fa -C -o version=3.1 in.bam -p out31.cram For smaller but slower files, try varying the compression profile with an additional "-o small". Profile choices are fast, normal, small and archive, and can be applied to all CRAM versions. * Added a general filtering syntax for alignment records in SAM/BAM/CRAM readers. (#1181, #1203) An example to find chromosome spanning read-pairs with high mapping quality: 'mqual >= 30 && mrname != rname' To find significant sized deletions: 'cigar =~ "[0-9]{2}D"' or 'rlen - qlen > 10'. To report duplicates that aren't part of a "proper pair": 'flag.dup && !flag.proper_pair' More details are in the samtools.1 man page under "FILTER EXPRESSIONS". * The knet networking code has been removed. It only supported the http and ftp protocols, and a better and safer alternative using libcurl has been available since release 1.3. If you need access to ftp:// and http:// URLs, HTSlib should be built with libcurl support. (#1200) * The old htslib/knetfile.h interfaces have been marked as deprecated. Any code still using them should be updated to use hFILE instead. 
(#1200) * Added an introspection API for checking some of the capabilities provided by HTSlib. (#1170) Thanks also to John Marshall for contributions. (#1222) - `hfile_list_schemes`: returns the number of schemes found - `hfile_list_plugins`: returns the number of plugins found - `hfile_has_plugin`: checks if a specific plugin is available - `hts_features`: returns a bit mask with all available features - `hts_test_feature`: test if a feature is available - `hts_feature_string`: return a string summary of enabled features * Made performance improvements to `probaln_glocal` method, which speeds up mpileup BAQ calculations. (#1188) - Caching of reused loop variables and removal of loop invariants - Code reordering to remove instruction latency. - Other refactoring and tidyups. * Added a public method for constructing a BAM record from the component pieces. Thanks to Anders Kaplan. (#1159, #1164) * Added two public methods, `sam_parse_cigar` and `bam_parse_cigar`, as part of a small CIGAR API (#1169, #1182). Thanks to Daniel Cameron for input. (#1147) * HTSlib, and the included htsfile program, will now recognise the old RAZF compressed file format. Note that while the format is detected, HTSlib is unable to read it. It is recommended that RAZF files are uncompressed with `gunzip` before using them with HTSlib. Thanks to John Marshall (#1244); and Matthew J. Oldach who reported problems with uncompressing some RAZF files (samtools/samtools#1387). * The S3 plugin now has options to force the address style. It will recognise the addressing_style and host_bucket entries in the respective aws .credentials and s3cmd .s3cfg files. There is also a new HTS_S3_ADDRESS_STYLE environment variable. Details are in the htslib-s3-plugin.7 man file (#1249). Build changes ------------- These are compiler, configuration and makefile based changes. * Added new Makefile targets for the applications that embed HTSlib and want to run its test suite or clean its generated artefacts. 
(#1230, #1238) * The CRAM codecs are now obtained via the htscodecs submodule, hence when cloning it is now best to use "git clone --recursive". In an existing clone, you may use "git submodule update --init" to obtain the htscodecs submodule checkout. * Updated CI test configuration to recurse HTSlib submodules. (#1359) * Added Cirrus-CI integration as a replacement for Travis, which was phased out. (#1175; #1212) * Updated the Windows image used by Appveyor to 'Visual Studio 2019'. (#1172; fixed #1166) * Fixed a buglet in configure.ac, exposed by the release 2.70 of autoconf. Thanks to John Marshall. (#1198) * Fixed plugin linking on macOS, to prevent symbol conflict when linking with a static HTSlib. Thanks to John Marshall. (#1184) * Fixed a clang++9 error in `cram_io.h`. Thanks to Pjotr Prins. (#1190) * Introduced $(ALL_CPPFLAGS) to allow for more flexibility in setting the compiler flags. Thanks to John Marshall. (#1187) * Added 'fall through' comments to prevent warnings issued by Clang on intentional fall through case statements, when building with `-Wextra flag`. Thanks to John Marshall. (#1163) * Non-configure builds now define _XOPEN_SOURCE=600 to allow them to work when the `gcc -std=c99` option is used. Thanks to John Marshall. (#1246) Bug fixes --------- * Fixed VCF `#CHROM` header parsing to only separate columns at tab characters. Thanks to Sam Morris for reporting the issue. (#1237; fixed samtools/bcftools#1408) * Fixed a crash reported in `bcf_sr_sort_set`, which expects REF to be present. (#1204; fixed samtools/bcftools#1361) * Fixed a bcf synced reader bug when filtering with a region list, and the first record for a chromosome had the same position as the last record for the previous chromosome. (#1254; fixed samtools/bcftools#1441) * Fixed a bug in the overlapping logic of mpileup, dealing with iterating over CIGAR segments. Thanks to `@wulj2` for the analysis. 
(#1202; fixed #1196) * Fixed a tabix bug that prevented setting the correct number of lines to be skipped in a region file. Thanks to Jim Robinson for reporting it. (#1189; fixed #1186) * Made `bam_itr_next` an alias for `sam_itr_next`, to prevent it from crashing when working with htsFile pointers. Thanks to Torbjörn Klatt for reporting it. (#1180; fixed #1179) * Fixed once per outgoing multi-threaded block `bgzf_idx_flush` assertion, to accommodate situations when a single record could span multiple blocks. Thanks to `@lacek`. (#1168; fixed samtools/samtools#1328) * Fixed assumption of pthread_t being a non-structure, as permitted by POSIX. Thanks also to John Marshall and Anders Kaplan. (#1167, #1153, #1153) * Fixed the minimum offset of a BAI index bin, to account for unmapped reads. Thanks to John Marshall for spotting the issue. (#1158; fixed #1142) * Fixed the CRLF handling in `sam_parse_worker` method. Thanks to Anders Kaplan. (#1149; fixed #1148) * Included unistd.h and errno.h directly in HTSlib files, as opposed to including them indirectly, via third party code. Thanks to Andrew Patterson (#1143) and John Marshall (#1145). ------------------------------------------------------------------------------ samtools - changes v1.12 ------------------------------------------------------------------------------ * The legacy samtools API (libbam.a, bam.h, sam.h, etc) has not been actively maintained since 2015. It is deprecated and will be removed entirely in a future SAMtools release. We recommend coding against the HTSlib API directly. * I/O errors and record parsing errors during the reading of SAM/BAM/CRAM files are now always detected. Thanks to John Marshall (#1379; fixed #101) * New make targets have been added: check-all, test-all, distclean-all, mostlyclean-all, testclean-all, which allow SAMtools installations to call corresponding Makefile targets from embedded HTSlib installations. 
* samtools --version now displays a summary of the compilation details and available features, including flags, used libraries and enabled plugins from HTSlib. As an alias, `samtools version` can also be used. (#1371) * samtools stats now displays the number of supplementary reads in the SN section. Also, supplementary reads are no longer considered when splitting read pairs by orientation (inward, outward, other). (#1363) * samtools stats now counts only the filtered alignments that overlap target regions, if any are specified. (#1363) * samtools view now accepts option -N, which takes a file containing read names of interest. This allows the output of only the reads with names contained in the given file. Thanks to Daniel Cameron. (#1324) * samtools view -d option now works without a tag associated value, which allows it to output all the reads with the given tag. (#1339; fixed #1317) * samtools view -d and -D options now accept integer and single character values associated with tags, not just strings. Thanks to `@dariome` and Keiran Raine for the suggestions. (#1357, #1392) * samtools view now works with the filtering expressions introduced by HTSlib. The filtering expression is passed to the program using the specific option -e or the global long option --input-fmt-option. E.g. samtools view -e 'qname =~ "#49$" && mrefid != refid && refid != -1 && mrefid != -1' align.bam looks for records with query-name ending in `#49` that have their mate aligned in a different chromosome. More details can be found in the FILTER EXPRESSIONS section of the main man page. (#1346) * samtools markdup now benefits from an increase in performance in the situation when a single read has tens or hundreds of thousands of duplicates. Thanks to `@denriquez` for reporting the issue. (#1345; fixed #1325) * The documentation for samtools ampliconstats has been added to the samtools man page. 
(#1351) * A new FASTA/FASTQ sanitizer script (`fasta-sanitize.pl`) was added, which corrects the invalid characters in the reference names. (#1314) Thanks to John Marshall for the installation fix. (#1353) * The CI scripts have been updated to recurse the HTSlib submodules when cloning HTSlib, to accommodate for the CRAM codecs, which now reside in the htscodecs submodule. (#1359) * The CI integrations now include Cirrus-CI rather than Travis. (#1335; #1365) * Updated the Windows image used by Appveyor to 'Visual Studio 2019'. (#1333; fixed #1332) * Fixed a bug in samtools cat, which prevented the command from running in multi-threaded mode. Thanks to Alex Leonard for reporting the issue. (#1337; fixed #1336) * A couple of invalid CIGAR strings have been corrected in the test data. (#1343) * The documentation for `samtools depth -s` has been improved. Thanks to `@wulj2`. (#1355) * Fixed a `samtools merge` segmentation fault when it failed to merge header `@PG` records. Thanks to John Marshall. (#1394; reported by Kemin Zhou in #1393) * Ampliconclip and ampliconstats now guard against the BED file containing more than one reference (chromosome) and fail when found. Adding proper support for multiple references will appear later. (#1398) ------------------------------------------------------------------------------ bcftools - changes v1.12 ------------------------------------------------------------------------------ Changes affecting the whole of bcftools, or multiple commands: * The output file type is determined from the output file name suffix, where available, so the -O/--output-type option is often no longer necessary. * Make F_MISSING in filtering expressions work for sites with multiple ALT alleles (#1343) * Fix N_PASS and F_PASS to behave according to expectation when reverse logic is used (#1397). 
This fix has the side effect of `query` (or programs like `+trio-stats`) behaving differently with these expressions, operating now in site-oriented rather than sample-oriented mode. For example, the new behavior could be:
    bcftools query -f'[%POS %SAMPLE %GT\n]' -i'N_PASS(GT="alt")==1'
    11 A 0/0
    11 B 0/0
    11 C 1/1
while previously the same expression would return:
    11 C 1/1
The original mode can be mimicked by splitting the filtering into two steps:
    bcftools view -i'N_PASS(GT="alt")==1' | \
    bcftools query -f'[%POS %SAMPLE %GT\n]' -i'GT="alt"'
Changes affecting specific commands:
* bcftools annotate:
  - New `--rename-annots` option to help fix broken VCFs (#1335)
  - New -C option allows a long list of options to be read from a file to prevent very long command lines.
  - New `append-missing` logic allows annotations to be added for each ALT allele in the same order as they appear in the VCF. Note that this is not bulletproof. In order for this to work:
    - the annotation file must have one line per ALT allele
    - fields must contain a single value as multiple values are appended as they are and would break the correspondence between the alleles and values
* bcftools concat:
  - Do not phase genotypes by mistake if they are not already phased with `-l` (#1346)
* bcftools consensus:
  - New `--mask-with`, `--mark-del`, `--mark-ins`, `--mark-snv` options (#1382, #1381, #1170)
  - Symbolic <DEL> should have only one REF base. If there are multiple, take POS+1 as the first deleted base.
  - Make consensus work when the first base of the reference genome is deleted. In this situation the VCF record has POS=1 and the first REF base cannot precede the event. (#1330)
* bcftools +contrast:
  - The NOVELGT annotation was previously not added when requested.
* bcftools convert:
  - Make the --hapsample and --hapsample2vcf options consistent with each other and with the documentation.
* bcftools call:
  - Revamp of `call -G`, previously sample grouping by population was not truly independent and could still be influenced by the presence of other sample groups.
  - Optional addition of INFO/PV4 annotation with `call -a INFO/PV4`
  - Remove generation of useless HOB and ICB annotation; use `+fill-tags -- -t HWE,ExcHet` instead
  - The `call -f` option was renamed to `-a` to (1) make it consistent with `mpileup` and (2) to indicate that it includes both INFO and FORMAT annotations, not just FORMAT as previously
  - Any sensible Number=R,Type=Integer annotation can be used with -G, such as AD or QS
  - Don't trim QUAL; although usefulness of this change is questionable for true probabilistic interpretation (such high precision is unrealistic), using QUAL as a score rather than probability is helpful and permits more fine-grained filtering
  - Fix a suspected bug in `call -F` in the worst case, for certain improve readability
  - `call -C trio` is temporarily disabled
* bcftools csq:
  - Fix a bug which caused incorrect FORMAT/BCSQ formatting at sites with too many per-sample consequences
  - Fix a bug which incorrectly handled the --ncsq parameter and could clash with reserved BCF values, consequently producing truncated or even incorrect output of the %TBCSQ formatting expression in `bcftools query`. To account for the reserved values, the new default value is --ncsq 15 (#1428)
* bcftools +fill-tags:
  - MAF definition revised for multiallelic sites, the second most common allele is considered to be the minor allele (#1313)
  - New FORMAT/VAF, VAF1 annotations to set the fraction of alternate reads provided FORMAT/AD is present
* bcftools gtcheck:
  - support matching of a single sample against all other samples in the file with `-s qry:sample -s gt:-`.
This was previously not possible, either full cross-check mode had to be run or a list of pairs/samples had to be created explicitly * bcftools merge: - Make `merge -R` behavior consistent with other commands and pull in overlapping records with POS outside of the regions (#1374) - Bug fix (#1353) * bcftools mpileup: - Add new optional tag `mpileup -a FORMAT/QS` * bcftools norm: - New `-a, --atomize` functionality to decompose complex variants, for example MNVs into consecutive SNVs - New option `--old-rec-tag` to indicate the original variant * bcftools query: - Incorrect fields were printed in the per-sample output when subset of samples was requested via -s/-S and the order of samples in the header was different from the requested -s/-S order (#1435) * bcftools +prune: - New options --random-seed and --nsites-per-win-mode (#1050) * bcftools +split-vep: - Transcript selection now works also on the raw CSQ/BCSQ annotation. - Bug fix, samples were dropped on VCF input and VCF/BCF output (#1349) * bcftools stats: - Changes to QUAL and ts/tv plotting stats: avoid capping QUAL to predefined bins, use an open-range logarithmic binning instead - plot dual ts/tv stats: per quality bin and cumulative as if threshold applied on the whole dataset * bcftools +trio-dnm2: - Major revamp of +trio-dnm plugin, which is now deprecated and replaced by +trio-dnm2. The original trio-dnm calling model used genotype likelihoods (PLs) as the input for calling. However, that is flawed because PLs make assumptions which are unsuitable for de novo calling: PL(RR) can become bigger than PL(RA) even when the ALT allele is present in the parents. Note that this is true also for other programs such as DeNovoGear which rely on the same samtools calculation. 
The new recommended workflow is:
    bcftools mpileup -a AD,QS -f ref.fa -Ou \
        proband.bam father.bam mother.bam | \
        bcftools call -mv -Ou | \
        bcftools +trio-dnm -p proband,father,mother -Oz -o output.vcf.gz
This new version also implements the DeNovoGear model. The original behavior of trio-dnm is no longer supported. For more details see http://samtools.github.io/bcftools/trio-dnm.pdf
-- The Wellcome Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. |
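The change to N_PASS described in the bcftools 1.12 notes above (site-oriented rather than sample-oriented filtering) may be easier to follow as a small Python sketch. It is a toy model, not BCFtools code: the helper names and the record layout are hypothetical, and `is_alt` is a simplified stand-in for the GT="alt" test that treats any allele other than "0" or "." as alternate.

```python
def is_alt(gt):
    """Simplified GT="alt" test: True if any allele is non-reference."""
    alleles = gt.replace("|", "/").split("/")
    return any(a not in ("0", ".") for a in alleles)


def site_oriented(records, n_pass):
    """New behaviour: the expression selects whole sites, and every sample
    at a passing site is printed."""
    rows = []
    for pos, samples in records:
        if sum(is_alt(gt) for _, gt in samples) == n_pass:
            rows.extend((pos, name, gt) for name, gt in samples)
    return rows


def two_step(records, n_pass):
    """Mimics the previous output: select sites first, then keep only the
    samples that themselves match GT="alt"."""
    return [row for row in site_oriented(records, n_pass) if is_alt(row[2])]


# The example site from the notes: position 11 with samples A, B and C.
records = [(11, [("A", "0/0"), ("B", "0/0"), ("C", "1/1")])]

for row in site_oriented(records, 1):
    print(*row)
print("--")
for row in two_step(records, 1):
    print(*row)
```

The first loop prints all three samples for site 11, while the two-step variant prints only sample C, matching the two outputs quoted in the notes.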
From: Robert D. <rm...@sa...> - 2020-09-22 13:35:18
|
Samtools (and HTSlib and BCFtools) version 1.11 is now available from GitHub and SourceForge https://sourceforge.net/projects/samtools/ https://github.com/samtools/htslib/releases/tag/1.11 https://github.com/samtools/samtools/releases/tag/1.11 https://github.com/samtools/bcftools/releases/tag/1.11 The main changes are listed below: ------------------------------------------------------------------------ htslib - changes v1.11 ------------------------------------------------------------------------ Features and Updates -------------------- * Support added for remote reference files. fai_path() can take a remote reference file and will return the corresponding index file. Remote indexes can be handled by refs_load_fai(). UR tags in @SQ lines can now be set to remote URIs. (#1017) * Added tabix --separate-regions option, which adds header comment lines separating different regions' output records when multiple target regions are supplied on the command line. (#1108) * Added tabix --cache option to set a BGZF block cache size. Most beneficial when the -R option is used and the same blocks need to be re-read multiple times. (#1053) * Improved error checking in tabix and added a --verbosity option so it is possible to change the amount of logging when it runs. (#1040) * A note about the maximum chromosome length usable with TBI indexes has been added to the tabix manual page. Thanks to John Marshall. (#1070) * New method vcf_open_mode() changes the opening mode of a variant file based on its file extension. Similar to sam_open_mode(). (#1096) * The VCF parser has been made faster and easier to maintain. (#1057) * bcf_record_check() has been made faster, giving a 15% speed increase when reading an uncompressed BCF file. (#1130) * The VCF parser now recognises the "<NON_REF>" symbolic allele produced by GATK. (#1045) * Support has been added for simultaneous reading of unindexed VCF/BCF files when using the synced_bcf_reader interface. 
Input files must have the chromosomes in the same order as each other and be consistent with the order of sequences in the header. (#1089) * The VCF and BCF readers will now attempt to fix up invalid INFO/END tags where the stored END value is less than POS, resulting in an apparently negative record length. Such files have been generated by programs which used END incorrectly, and by broken lift-over processes that failed to update any END tags present. (#1021; fixed samtools/bcftools#1154) * The htsFile interface can now detect the crypt4gh encrypted format (see https://samtools.github.io/hts-specs/crypt4gh.pdf). If HTSlib is built with external plug-in support, and the hfile_crypt4gh plug-in is present, the file will be passed to it for decryption. The plug-in can be obtained from https://github.com/samtools/htslib-crypt4gh. (#1046) * hts_srand48() now seeds the same POSIX-standard sequences of pseudo-random numbers regardless of platform, including on OpenBSD where plain srand48() produces a different cryptographically-strong non-deterministic sequence. Thanks to John Marshall. (#1002) * Iterators now work with 64 bit positions. (#1018) * Improved the speed of range queries when using BAI indexes by making better use of the linear index data included in the file. The best improvement is on low-coverage data. (#1031) * Alignments which consume no reference bases are now considered to have length 1. This would make such alignments cover 1 reference position in the same manner as alignments that are unmapped or have no CIGAR strings. These alignments can now be returned by iterator-based queries. Thanks to John Marshall. (#1063; fixed samtools/samtools#1240, see also samtools/hts-specs#521). * A bam_set_seqi() function to modify a single base in the BAM structure has been added. This is a companion function to bam_seqi(). (#1022) * Writing SAM format is around 30% faster. (#1035) * Added sam_format_aux1() which converts a BAM aux tag to a SAM format string. 
(#1134) * bam_aux_update_str() no longer requires NUL-terminated strings. It is also now possible to create tags containing part of a longer string. (#1088) * It is now possible to use external plug-ins in language bindings that dynamically load HTSlib. Note that a side-effect of this change is that some plug-ins now link against libhts.so, which means that they have to be able to find the shared library when they are started up. Thanks to John Marshall. (#1072) * bgzf_close(), and therefore hts_close(), will now return non-zero when closing a BGZF handle on which errors have been detected. (Part of #1117) * Added a special case to the kt_fisher_exact() test for when the table probability is too small to be represented in a double. This fixes a bug where it would, for some inputs, fail to correctly determine which side of the distribution the table was on resulting in swapped p-values being returned for the left- and right-tailed tests. The two-tailed test value was not affected by this problem. (#1126) * Improved error diagnostics in the CRAM decoder (#1042), BGZF (#1049), the VCF and BCF readers (#1059), and the SAM parser (#1073). * ks_resize() now allocates 1.5 times the requested size when it needs to expand a kstring instead of rounding up to the next power of two. This has been done mainly to make the inlined function smaller, but it also reduces the overhead of storing data in kstrings at the expense of possibly needing a few more reallocations. (#1129) CRAM improvements ----------------- * Delay CRAM crc32 checks until the data actually needs to be used. With other changes this leads to a 20x speed up in indexing and other sub-query based actions. (#988) * CRAM now handles the transition from mapped to unmapped data in a better way, improving compression of the unmapped data. (#961) * CRAM can now use libdeflate. (#961) * Fixed bug in MD tag generation with "b" read feature codes, causing the numbers in the tag to be too large. 
Note that HTSlib never uses this feature code so it is unlikely that this bug would be seen on real data. The problem was found when testing against hand-crafted CRAM files. (#1086) * Fixed a regression where the CRAM multi-region iterator became much less efficient when using threads. It now works more like the single iterator and does not preemptively decode the next container unless it will be used. (#1061) * Set CRAM default quality in lossy quality modes. If lossy quality is enabled and 'B', 'q' or 'Q' features are used, CRAM starts off with QUAL being all 255 (as per BAM spec and "*" quality) and then modifies individual qualities as dictated by the specific features. However that then produces ASCII quality " " (space, q=-1) for the unmodified bases. Instead ASCII quality "?" (q=30) is used, as per HTSJDK. Quality 255 is still used for sequences with no modifications at all. (#1094) Build changes ------------- These are compiler, configuration and makefile based changes. * `make all` now also builds htslib_static.mk and htslib-uninstalled.pc. Thanks to John Marshall. (#1011) * Various cppcheck-1.90 warnings have been fixed. (#995, #1011) * HTSlib now prefers its own headers when being compiled, fixing build failures on machines that already had a system-installed HTSlib. Thanks to John Marshall. (#1078; fixed #347) * Define HTSLIB_EXPORT without using a helper macro to reduce the length of compiler diagnostics that mention exported functions. Thanks to John Marshall. (#1029) * Fix dirty default build by including latest pkg.m4 instead of using aclocal.m4. Thanks to Damien Zammit. (#1091) * Struct tags have been added to htslib/*.h public typedefs. This makes it possible to forward declare htsFile without including htslib/hts.h. Thanks to Lucas Czech and John Marshall. (#1115; fixed #1106) * Fixed compiler warnings emitted by the latest gcc and clang releases when compiling HTSlib, along with some -Wextra warnings in the public include files. 
Thanks to John Marshall. (#1066, #1063, #1083) Bug fixes --------- * Fixed hfile_libcurl breakage when using libcurl 7.69.1 or later. Thanks to John Marshall for tracking down the exact libcurl change that caused the incompatibility. (#1105; fixed samtools/samtools#1254 and samtools/samtools#1284) * Fixed overflows kroundup32() and kroundup_size_t() which caused them to return zero when rounding up values where the most significant bit was set. When this happens they now return the highest value that can be stored (#1044). All of the kroundup macro definitions have also been gathered together into a unified implementation (#1051). * Fixed missing return parameter value in idx_test_and_fetch(). Thanks to Lilian Janin. (#1014) * Fixed crashes due to inconsistent selection between BGZF and plain (hFILE) interfaces when reading files. [fuzz] (#1019) * Added and/or fixed byte swapping code for big-endian platforms. Thanks to Jun Aruga, John Marshall, Michael R Crusoe and Gianfranco Costamagna for their help. (#1023; fixed #119 and #355) * Fixed a problem with multi-threaded on-the-fly indexes which would occasionally write virtual offsets pointing at the end of a BGZF block. Attempting to read from such an offset caused EOF to be incorrectly reported. These offsets are now handled correctly, and the indexer has been updated to avoid generating them. (#1028; fixed samtools/samtools#1197) * In sam_hdr_create(), free newly allocated SN strings when encountering an error. [fuzz] (#1034) * Prevent double free in case of idx_test_and_fetch() failure. Thanks to @fanwayne for the bug report. (#1047; fixed #1033) * In the header, link a new PG line only to valid chains. Prevents an explosive growth of PG lines on headers where PG lines are already present but not linked together correctly. (#1062; fixed samtools/samtools#1235) * Also in the header, when calling sam_hdr_update_line(), update target arrays only when the name or length is changed. 
(#1007) * Fixed buffer overflows in CRAM MD5 calculation triggered by files with invalid compression headers, or files with embedded references that were one byte too short. [fuzz] (#1024, #1068) * Fix mpileup regression between 1.9 and 1.10 where overlap detection was incorrectly skipped on reads where RNEXT, PNEXT and TLEN were set to the "unavailable" values ("*", 0, 0 in SAM). (#1097) * kputs() now checks for null pointer in source string. [fuzz] (#1087) * Fix potential bcf_update_alleles() crash on 0 alleles. Thanks to John Marshall. (#994) * Added bcf_unpack() calls to some bcf_update functions to fix a bug where updates made after a call to bcf_dup() could be lost. (#1032; fixed #1030) * Error message typo "Number=R" instead of "Number=G" fixed in bcf_remove_allele_set(). Thanks to Ilya Vorontsov. (#1100) * Fixed crashes that could occur in BCF files that use IDX= header annotations to create a sparse set of CHROM, FILTER or FORMAT indexes, and include records that use one of the missing index values. [fuzz] (#1092) * Fixed potential integer overflows in the VCF parser and ensured that the total length of FORMAT fields cannot go over 2Gbytes. [fuzz] (#1044, #1104) * Download index files atomically in idx_test_and_fetch(). This prevents corruption when running parallel jobs on S3 files. Thanks to John Marshall. (#1112; samtools/samtools#1242). * The pileup constructor callback is now given the copy of the bam1_t struct made by pileup instead of the original one passed to bam_plp_push(). This makes it the same as the one passed to the destructor and ensures that cached data, for example the location of an aux tag, will remain valid. (#1127) * Fixed possible error in code_sort() on negative CRAM Huffman code length. (#1008) * Fixed possible undefined shift in cram_byte_array_stop_decode_init(). (#1009) * Fixed a bug where range queries to the end of a given reference would return incorrect results on CRAM files. 
(#1016; fixed samtools/samtools#1173) * Fixed an integer overflow in cram_read_slice(). [fuzz] (#1026) * Fixed a memory leak on failure in cram_decode_slice(). [fuzz] (#1054) * Fixed a regression which caused cram_transcode_rg() to fail, resulting in a crash in "samtools cat" on CRAM files. (#1093; fixed samtools/samtools#1276) * Fixed an undersized string reallocation in the threaded SAM reader which caused it to crash when reading SAM files with very long lines. Numerous memory allocation checks have also been added. (#1117) ------------------------------------------------------------------------ samtools - changes v1.11 ------------------------------------------------------------------------ * New samtools ampliconclip sub-command for removing primers from amplicon-based sequencing experiments, including the current COVID-19 projects. The primers are listed in a BED file and can be either soft-clipped or hard-clipped. (#1219) * New samtools ampliconstats sub-command to produce a textual summary of primer and amplicon usage, in a similar style to "samtools stats". The misc/plot-ampliconstats script can generate PNG images based on this text report. (#1227) * Samtools fixmate, addreplacerg, markdup, ampliconclip and sort now accept a -u option to enable uncompressed output, which is useful when sending data over a pipe to another process. Other subcommands which already support this option for the same purpose are calmd, collate, merge, view and depad. (#1265) * samtools stats has a new GCT section, where it reports ACGT content percentages, similar to GCC but taking into account the read orientation. (#1274) * Samtools split now supports splitting by tag content with the -d option (#1211) * samtools merge now accepts a BED file as a command line argument (-L) and does the merging only with reads overlapping the specified regions (#1156) * Samtools sort now has a minhash collation (-M) to group unmapped reads with similar sequence together. 
This can sometimes significantly reduce the file size. (#1093) * Samtools bedcov now has -g and -G options to filter-in and filter-out based on the FLAG field. Also the new -d option adds an extra column per file counting the number of bases with a depth greater than or equal to a given threshold. (#1214) * Fixed samtools bedcov -j option (discard deletions and ref-skips) with multiple input files (#1212) * samtools bedcov will now accept BED files with columns separated by spaces as well as tabs (#1246; #1188 reported by Mary Carmack) * samtools depth can now include deletions (D) when computing the base coverage depth, if the user adds the -J option to the command line (#1163). * samtools depth will count only the bases of one read, for the overlapping section of a read pair, if the -s option is used in the command line (#1241, thanks to Teng Li). * samtools depth will now write zeros for the entire reference length, when "samtools depth -aa" is run on a file with no alignments. (#1252; #1249 reported by Paul Donovan) * Stopped depth from closing stdout, which triggered test fails in pysam (#1208, thanks to John Marshall). * samtools view now accepts remote URIs for FASTA and FAI files. Furthermore, the reference and index file can be provided in a single argument, such as samtools view -T ftp://x.com/ref.fa##idx##ftp://y.com/index.fa.fai a.cram (#1176; samtools/htslib#933 reported by @uitde007) * samtools faidx gets new options --fai-idx and --gzi-idx to allow specification of the locations of the .fai and (if needed) .gzi index files. (#1283) * The samtools fasta/fastq '-T' option can now add SAM array (type 'B') tags to the output header lines. (#1301) * samtools mpileup can now display MAPQ either as ASCII characters (with -s/--output-MQ; column now restored to its documented order as in 1.9 and previous versions) or comma-separated numbers (with --output-extra MAPQ; in SAM column order alongside other selected --output-extra columns). 
When both -s/--output-MQ and -O/--output-BP are used, samtools 1.10 printed the extra columns in the opposite order. This changes the format produced by 1.10's --output-extra MAPQ. (#1281, thanks to John Marshall; reported by Christoffer Flensburg) * samtools tview now accepts a -w option to set the output width in text mode (-d T). (#1280) * The dict command can now add AN tags containing alternative names with "chr" prefixes added to or removed from each sequence name as appropriate and listing both "M" and "MT" alternatives for mitochondria. (#1164, thanks to John Marshall) * The samtools import command, labelled as obsolete in May 2009 and removed from all help and documentation later that year, has finally been removed. Use samtools view instead. (#1185) * Replaced the remaining usage of the Samtools 0.1 legacy API with htslib calls. (#1187, thanks to John Marshall) * Documentation / help improvements (#1154; #1168; #1191; #1199; #1204; #1313): - Fixed a few man-page table layout issues - Added <file>##idx##<index> filename documentation - Fixed usage statement for samtools addreplacerg - Miscellaneous spelling and grammar fixes - Note fixmate/markdup name collated rather than name sorted input - Note that fastq and fasta inputs should also be name collated - Reshuffled order of main man-page and added -@ to more sub-pages - The misc/seq_cache_populate.pl script now gives REF_CACHE guidance * Additional documentation improvements, thanks to John Marshall (#1181; #1224; #1248; #1262; #1300) - Emphasise that samtools index requires a position-sorted file - Document 2^29 chromosome length limit in BAI indexes - Numerous typing, spelling and formatting fixes * Improved the message printed when samtools view fails to read its input (#1296) * Added build support for the OpenIndiana OS (#1165, thanks to John Marshall) * Fixed failing tests on OpenBSD (#1151, thanks to John Marshall) * The samtools sort tests now use less memory so the test suite works better on small 
virtual machines. (#1159) * Improved markdup's calculation of insert sizes (#1161) Also improved tests (#1150) and made it run faster when not checking for optical duplicates or adding 'do' tags (#1308) * Fixed samtools coverage minor inconsistency vs idxstats (#1205; #1203 reported by @calliza) * Fixed samtools coverage quality thresholding options which were the wrong way round compared to mpileup (-q is the mapping quality threshold and -Q is base quality). (#1279; #1278 reported by @kaspernie) * Fixed bug where `samtools fastq -i` would add two copies of the barcode in the fastq header if both reads in a pair had a "BC:Z" tag (#1309; #1307 reported by @mattsoup) * Samtools calmd no longer errors with a SEQ of "*" (#1230; #1229 reported by Bob Harris) * Samtools tview now honours $COLUMNS, fixing some CI tests (#1171; #1162 reported by @cljacobs) * Fixed a samtools depad overflow condition (#1200) * Improved curses detection in configure script (#1170, #577, #940) * Fixed samtools stats integer overflows and added support for long references (#1174; #1173) * Fixed a 1-byte undersized memory allocation in samtools merge. (#1302) ------------------------------------------------------------------------ bcftools - changes v1.11 ------------------------------------------------------------------------ Changes affecting the whole of bcftools, or multiple commands: * Filtering -i/-e expressions - Breaking change in -i/-e expressions on the FILTER column. Originally it was possible to query only a subset of filters, but not an exact match. The new behavior is: FILTER="A" .. exact match, for example "A;B" does not pass FILTER!="A" .. exact match, for example "A;B" does pass FILTER~"A" .. both "A" and "A;B" pass FILTER!~"A" .. 
neither "A" nor "A;B" pass - Fix in commutative comparison operators; in some cases reversing sides would produce incorrect results (#1224; #1266) - Better support for filtering on sample subsets - Add SMPL_*/S* family of functions that evaluate within rather than across all samples. (#1180) * Improvements in the build system Changes affecting specific commands: * bcftools annotate: - Previously it was not possible to use `--columns =TAG` with INFO tags and the `--merge-logic` feature was restricted to tab files with BEG,END columns, now extended to work also with REF,ALT. - Make `annotate -TAG/+TAG` work also with FORMAT fields. (#1259) - ID and FILTER can be transferred to INFO and ID can be populated from INFO. However, the FILTER column still cannot be populated from an INFO tag because all possible FILTER values must be known at the time of writing the header (#947; #1187) * bcftools consensus: - Fix in handling symbolic deletions and overlapping variants. (#1149; #1155; #1295) - Fix `--iupac-codes` crash on REF-only positions with `ALT="."`. (#1273) - Fix `--chain` crash. (#1245) - Preserve the case of the genome reference. (#1150) - Add new `-a, --absent` option which allows positions with no supporting evidence to be set to "N" (or any other character). (#848; #940) * bcftools convert: - The option `--vcf-ids` now works also with `--haplegendsample2vcf`. (#1217) - New option `--keep-duplicates` * bcftools csq: - Add `misc/gff2gff.py` script for conversion between various flavors of GFF files. The initial commit supports only one type and was contributed by @flashton2003. (#530) - Add missing consequence types. (PR #1203; #1292) - Allow overlapping CDS to support ribosomal slippage. (#1208) * bcftools +fill-tags: - Added new annotations: INFO/END, TYPE, F_MISSING. * bcftools filter: - Make `--SnpGap` optionally filter also SNPs close to other variant types. (#1126) * bcftools gtcheck: - Complete revamp of the command.
The new version is faster and allows N:M sample comparisons, not just 1:N or NxN comparisons. Some functionality was lost (plotting and clustering) but may be added back by popular demand. * bcftools +mendelian: - Revamp of user options, output VCFs with mendelian errors annotation, read PED files (thanks to Giulio Genovese). * bcftools merge: - Update headers when appropriate with the '--info-rules *:join' INFO rule. (#1282) - Local alleles merging that produces LAA and LPL when requested, a draft implementation of https://github.com/samtools/hts-specs/pull/434 (#1138) - New `--no-index` which allows merging of unindexed files. Requires the input files to have chromosomes in the same order and consistent with the order of sequences in the header. (PR #1253; samtools/htslib#1089) - Fixes in gVCF merging. (#1127; #1164) * bcftools norm: - Fixes in `--check-ref s` reference setting features with non-ACGT bases. (#473; #1300) - New `--keep-sum` switch to keep vector sum constant when splitting multiallelics. (#360) * bcftools +prune: - Extend to allow annotating with various LD metrics: r^2, Lewontin's D' (PMID:19433632), or Ragsdale's D (PMID:31697386). * bcftools query: - New `%N_PASS()` formatting expression to output the number of samples that pass the filtering expression. * bcftools reheader: - Improved error reporting to prevent user mistakes. (#1288) * bcftools roh: - Several fixes and improvements - the `--AF-file` description incorrectly suggested "REF\tALT" instead of the correct "REF,ALT". (#1142) - RG lines could have negative length. (#1144) - new `--include-noalt` option to allow also ALT=. records. (#1137) * bcftools scatter: - New plugin intended as a convenient inverse to `concat` (thanks to Giulio Genovese, PR #1249) * bcftools +split: - New `--groups-file` option for more flexibility in defining the desired output.
(#1240) - New `--hts-opts` option to reduce required memory by reusing one output header and allow overriding the default hFile's block size with `--hts-opts block_size=XXX`. On some file systems (lustre) the default size can be 4M which becomes a problem when splitting files with 10+ samples. - Add support for multisample output and sample renaming * bcftools +split-vep: - Add default types (Integer, Float, String) for VEP subfields and make `--columns -` extract all subfields into INFO tags in one go. -- The Wellcome Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. |
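As an illustration of the FILTER matching rules listed under the bcftools 1.11 filtering changes above, the following Python sketch models the four operators for a single queried filter name. It is not bcftools code: the function name `filter_passes` and the set-based comparison are assumptions made for this example, and only the single-name cases spelled out in the notes are covered.

```python
def filter_passes(filter_column, op, name):
    """Model the four FILTER operators for one queried filter name.

    filter_column: FILTER value of a record, e.g. "A" or "A;B"
    op:            one of "=", "!=", "~", "!~"
    name:          the single filter name being queried, e.g. "A"

    Hypothetical helper for illustration only; not part of bcftools.
    """
    present = set(filter_column.split(";"))
    if op == "=":    # exact match: the record carries this filter and no other
        return present == {name}
    if op == "!=":   # negated exact match
        return present != {name}
    if op == "~":    # the filter is among those set on the record
        return name in present
    if op == "!~":   # the filter is not set on the record
        return name not in present
    raise ValueError(f"unsupported operator: {op!r}")


# Print the full truth table for the two FILTER values used in the notes.
for value in ("A", "A;B"):
    for op in ("=", "!=", "~", "!~"):
        print(f'FILTER{op}"A" on {value!r}: {filter_passes(value, op, "A")}')
```

With these rules, a record whose FILTER column is "A;B" fails FILTER="A" but passes both FILTER!="A" and FILTER~"A", matching the examples given in the announcement.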
From: Robert D. <rm...@sa...> - 2019-12-06 17:23:39
|
Samtools (and HTSlib and BCFtools) version 1.10 is now available from GitHub and SourceForge https://sourceforge.net/projects/samtools/ https://github.com/samtools/htslib/releases/tag/1.10 https://github.com/samtools/samtools/releases/tag/1.10 https://github.com/samtools/bcftools/releases/tag/1.10 The main changes are listed below: ------------------------------------------------------------------------------ htslib - changes v1.10 ------------------------------------------------------------------------------ Brief summary ------------- There are many changes in this release, so the executive summary is: * Addition of support for references longer than 2Gb (NB: SAM and VCF formats only, not their binary counterparts). This may need changes in code using HTSlib. See README.large_positions.md for more information. * Added a SAM header API. * Major speed up to SAM reading and writing. This also now supports multi-threading. * We can now auto-index on-the-fly while writing a file. This also applies to bgzipped SAM.gz. * Overhaul of the S3 interface, which now supports version 4 signatures. This also makes writing to S3 work. These also required some ABI changes. See below for full details. Features / updates ------------------ * A new SAM/BAM/CRAM header API has been added to HTSlib, allowing header data to be updated without having to parse or rewrite large parts of the header text. See htslib/sam.h for function definitions and documentation. (#812) The header typedef and several pre-existing functions have been renamed to have a sam_hdr_ prefix: sam_hdr_t, sam_hdr_init(), sam_hdr_destroy(), and sam_hdr_dup(). (The existing bam_hdr_-prefixed names are still provided for compatibility with existing code.) (#887, thanks to John Marshall) * Changes to hfile_s3, which provides support for the AWS S3 API. (#839) - hfile_s3 now uses version 4 signatures by default. Attempting to write to an S3 bucket will also now work correctly.
It is possible to force version 2 signatures by creating environment variable HTS_S3_V2 (the exact value does not matter, it just has to exist). Note that writing depends on features that need version 4 signatures, so forcing version 2 will disable writes. - hfile_s3 will automatically retry requests where the region endpoint was not specified correctly, either by following the 301 redirect (when using path-style requests) or reading the 400 response (when using virtual-hosted style requests and version 4 signatures). The first region to try can be set by using the AWS_DEFAULT_REGION environment variable, by setting "region" in ".aws/credentials" or by setting "bucket_location" in ".s3cfg". - hfile_s3 now percent-escapes the path component of s3:// URLs. For backwards-compatibility it will ignore any paths that have already been escaped (detected by looking for '%' followed by two hexadecimal digits.) - New environment variables HTS_S3_V2, HTS_S3_HOST, HTS_S3_S3CFG and HTS_S3_PART_SIZE to force version-2 signatures, control the S3 server hostname, the configuration file and upload chunk sizes respectively. * Numerous SAM format improvements. - Bgzipped SAM files can now be indexed and queried. The library now recognises sam.gz as a format name to ease this usage. (#718, #916) - The SAM reader and writer now supports multi-threading via the thread-pool. (#916) Note that the multi-threaded SAM reader does not currently support seek operations. Trying to do this (for example with an iterator range request) will result in the SAM readers dropping back to single-threaded mode. - Major speed up of SAM decoding and encoding, by around 2x. (#722) - SAM format can now handle 64-bit coordinates and references. This has implications for the ABI too (see below). Note BAM and CRAM currently cannot handle references longer than 2Gb, however given the speed and threading improvements SAM.gz is a viable workaround. 
(#709) * We can now automatically build indices on-the-fly while writing SAM, BAM, CRAM, VCF and BCF files. (Note for SAM and VCF this only works when bgzipped.) (#718) * HTSlib now supports the @SQ-AN header field, which lists alternative names for reference sequences. This means given "@SQ SN:1 AN:chr1", tools like samtools can accept requests for "1" or "chr1" equivalently. (#931) * Zero-length files are no longer considered to be valid SAM files (with no header and no alignments). This has been changed so that pipelines such as `somecmd | samtools ...` with `somecmd` aborting before outputting anything will now propagate the error to the second command. (#721, thanks to John Marshall; #261 reported by Adrian Tan) * Added support for use of non-standard index names by pasting the data filename and index filename with ##idx##. For example "/path1/my_data.bam##idx##/path2/my_index.csi" will open bam file "/path1/my_data.bam" and index file "/path2/my_index.csi". (#884) This affects hts_idx_load() and hts_open() functions. * Improved the region parsing code to handle colons in reference names. Strings can be disambiguated by the use of braces, so for example when reference sequences called "chr1" and "chr1:100-200" are both present, the regions "{chr1}:100-200" and "{chr1:100-200}" unambiguously indicate which reference is being used. (#708) A new function hts_parse_region() has been added along with specialisations for sam_parse_region() and fai_parse_region(). * CRAM encoding now has additional checks for MD/NM validity. If they are incorrect, it stores the (incorrect copy) verbatim so round-trips "work". (#792) * Sped up decoding of CRAM by around 10% when the MD tag is being generated. (#874) * CRAM REF_PATH now supports %Ns (where N is a single digit) expansion in http URLs, similar to how it already supported this for directories. (#791) * BGZF now permits indexing and seeking using virtual offsets in completely uncompressed streams. 
(#904, thanks to Adam Novak) * bgzip now asks for extra confirmation before decompressing files that don't have a known compression extension (e.g. .gz). This avoids `bgzip -d foo.bam.bai` producing a foo.bam file that is very much not a BAM-formatted file. (#927, thanks to John Marshall) * The htsfile utility can now copy files (including to/from URLs using HTSlib's remote access facilities) with the --copy option, in addition to its existing uses of identifying file formats and displaying sequence or variant data. (#756, thanks to John Marshall) * Added tabix --min-shift option. (#752, thanks to Garrett Stevens) * Tabix now has an -D option to disable storing a local copy of a remote index. (#870) * Improved support for MSYS Windows compiler environment. (#966) * External htslib plugins are now supported on Windows. (#966) API additions and improvements ------------------------------ * New API functions bam_set_mempolicy() and bam_get_mempolicy() have been added. These allow more control over the ownership of bam1_t alignment record data; see documentation in htslib/sam.h for more information. (#922) * Added more HTS_RESULT_USED checks, this time for VCF I/O. (#805) * khash can now hash kstrings. This makes it easier to hash non-NUL-terminated strings. (#713) * New haddextension() filename extension API function. (#788, thanks to John Marshall) * New hts_resize() macro, designed to replace uses of hts_expand() and hts_expand0(). (#805) * Added way of cleaning up unused jobs in the thread pool via the new hts_tpool_dispatch3() function. (#830) * New API functions hts_reglist_create() and sam_itr_regarray() are added to create hts_reglist_t region lists from `chr:<from>-<to>` type region specifiers. (#836) * Ksort has been improved to facilitate library use. See KSORT_INIT2 (adds scope / namespace capabilities) and KSORT_INIT_STATIC interfaces. 
(#851, thanks to John Marshall) * New kstring functions (#879): KS_INITIALIZE - Initializer for structure assignment ks_initialize() - Initializer for pointed-to kstrings ks_expand() - Increase kstring capacity by a given amount ks_clear() - Set kstring length to zero ks_free() - Free the underlying buffer ks_c_str() - Returns the kstring buffer as a const char *, or an empty string if the length is zero. * New API functions hts_idx_load3(), sam_index_load3(), tbx_index_load3() and bcf_index_load3() have been added. These allow control of whether remote indexes should be cached locally, and allow the error message printed when the index does not exist to be suppressed. (#870) * Improved hts_detect_format() so it no longer assumes all text is SAM unless positively identified otherwise. It also makes a stab at detecting bzip2 format and identifying BED, FASTA and FASTQ files. (#721, thanks to John Marshall; #200, #719 both reported by Torsten Seemann) * File format errors now set errno to EFTYPE (BSD, MacOS) when available instead of ENOEXEC. (#721) * New API function bam_set_qname (#942) * In addition to the existing hts_version() function, which reflects the HTSlib version being used at runtime, <htslib/hts.h> now also provides HTS_VERSION, a preprocessor macro reflecting the HTSlib version that a program is being compiled against. (#951, thanks to John Marshall; #794) ABI changes ----------- This release contains a number of things which change the Application Binary Interface (ABI). This means code compiled against an earlier library will require recompiling. The shared library soversion has been bumped. * On systems that support it, the default symbol visibility has been changed to hidden and the only exported symbols are ones that form part of the officially supported ABI. This is to make clear exactly which symbols are considered parts of the library interface. It also helps packagers who want to check compatibility between HTSlib versions.
(#946; see for example issues #311, #616, and #695) * HTSlib now supports 64 bit reference positions. This means several structures, function parameters, and return values have been made bigger to allow larger values to be stored. While most code that uses HTSlib interfaces should still build after this change, some alterations may be needed - notably to printf() formats where the values of structure members are being printed. (#709) Due to file format limitations, large positions are only supported when reading and writing SAM and VCF files. See README.large_positions.md for more information. * An extra field has been added to the kbitset_t struct so bitsets can be made smaller (and later enlarged) without involving memory allocation. (#710, thanks to John Marshall) * A new field has been added to the bam_pileup1_t structure to keep track of which CIGAR operator is being processed. This is used by a new bam_plp_insertion() function which can be used to return the sequence of any inserted bases at a given pileup location. If the alignment includes CIGAR P operators, the returned sequence will include pads. (#699) * The hts_itr_t and hts_itr_multi_t structures have been merged and can be used interchangeably. Extra fields have been added to hts_itr_t to support this. hts_itr_multi_t is now a typedef for hts_itr_t; sam_itr_multi_next() is now an alias for sam_itr_next() and hts_itr_multi_destroy() is an alias for hts_itr_destroy(). (#836) * An improved regidx interface has been added. To allow this, struct reg_t has been removed, regitr_t has been modified and various new API functions have been added to htslib/regidx.h. While parts of the old regidx API have been retained for backwards compatibility, it is recommended that all code using regidx should be changed to use the new interface. (#761) * Elements in the hts_reglist_t structure have been reordered slightly so that they pack together better. 
(#761) * bgzf_utell() and bgzf_useek() now use type off_t instead of long for the offset. This allows them to work correctly on files longer than 2G bytes on Windows and 32-bit Linux. (#868) * A number of functions that used to return void now return int so that they can report problems like memory allocation failures. Callers should take care to check the return values from these functions. (#834) The affected functions are: ksort.h: ks_introsort(), ks_mergesort() sam.h: bam_mplp_init_overlaps() synced_bcf_reader.h: bcf_sr_regions_flush() vcf.h: bcf_format_gt(), bcf_fmt_array(), bcf_enc_int1(), bcf_enc_size(), bcf_enc_vchar(), bcf_enc_vfloat(), bcf_enc_vint(), bcf_hdr_set_version(), bcf_hrec_format() vcfutils.h: bcf_remove_alleles() * bcf_set_variant_type() now outputs VCF_OVERLAP for spanning deletions (ALT=*). (#726) * A new field (hrecs) has been added to the bam_hdr_t structure for use by the new header API. The old sdict field is now not used and marked as deprecated. The l_text field has been changed from uint32_t to size_t, to allow for very large headers in SAM files. The text and l_text fields have been left for backwards compatibility, but should not be accessed directly in code that uses the new header API. To access the header text, the new functions sam_hdr_length() and sam_hdr_str() should be used instead. (#812) * The old cigar_tab field is now marked as deprecated; use the new bam_cigar_table[] instead. (#891, thanks to John Marshall) * The bam1_core_t structure's l_qname and l_extranul fields have been rearranged and enlarged; l_qname still includes the extra NULs. (Almost all code should use bam_get_qname(), bam_get_cigar(), etc, and has no need to use these fields directly.) HTSlib now supports the SAM specification's full 254 QNAME length again. (#900, thanks to John Marshall; #520) * bcf_index_load() no longer tries the '.tbi' suffix when looking for BCF index files (.tbi indexes are for text files, not binary BCF). 
(#870) * htsFile has a new 'state' member to support SAM multi-threading. (#916) * A new field has been added to the bam1_t structure, and others have been rearranged to remove structure holes. (#709; #922) Bug fixes --------- * Several BGZF format fixes: - Support for multi-member gzip files. (#744, thanks to Adam Novak; #742) - Fixed error handling code for native gzip formatted files. (64c4927) - CRCs checked when threading too (previously only when non-threaded). (#745) - Made bgzf_useek function work with threads. (#818) - Fixed rare threading deadlocks. (#831) - Reading of very short files (<28 bytes) that do not contain an EOF block. (#910) * Fixed some thread pool deadlocks caused by race conditions. (#746, #906) * Many additional memory allocation checks in VCF, BCF, SAM and CRAM code. This also changes the return type of some functions. See ABI changes above. (#920 amongst others) * Replace some sam parsing abort() calls with proper errors. (#721, thanks to John Marshall; #576) * Fixed to permit SAM read names of length 252 to 254 (the maximum specified by the SAM specification). (#900, thanks to John Marshall) * Fixed mpileup overlap detection heuristic to work with BAMs having long CIGARs (more than 65536 operations). (#802) * Security fix: CIGAR strings starting with the "N" operation can no longer cause underflow on the bam CIGAR structure. Similarly CIGAR strings that are entirely "D" ops could leak the contents of uninitialised variables. (#699) * Fixed bug where alignments starting 0M could cause an invalid memory access in sam_prob_realn(). (#699) * Fixed out of bounds memory access in mpileup when given a reference with binary characters (top-bit set). (#808, thanks to John Marshall) * Fixed crash in mpileup overlap_push() function. (#882; #852 reported by Pierre Lindenbaum) * Fixed various potential CRAM memory leaks when recovering from error cases. 
* Fixed CRAM index queries for unmapped reads (#911; samtools/samtools#958 reported by @acorvelo) * Fixed the combination of CRAM embedded references and multiple slices per container. This was incorrectly setting the header MD5sum. (No impact on default CRAM behaviour.) (b2552fd) * Removed unwanted explicit data flushing in CRAM writing, which on some OSes caused major slowdowns. (#883) * Fixed inefficiencies in CRAM encoding when many small references occur within the middle of large chromosomes. Previously it switched into multi-ref mode, but not back out of it which caused the read POS field to be stored poorly. (#896) * Fixed CRAM handling of references when the order of sequences in a supplied fasta file differs to the order of the @SQ headers. (#935) * Fixed BAM and CRAM multi-threaded decoding when used in conjunction with the multi-region iterator. (#830; #577, #822, #926 all reported by Brent Pedersen) * Removed some unaligned memory accesses in CRAM encoder and undefined behaviour in BCF reading (#867, thanks to David Seifert) * Repeated calling of bcf_empty() no longer crashes. (#741) * Fixed bug where some 8 or 16-bit negative integers were stored using values reserved by the BCF specification. These numbers are now promoted to the next size up, so -121 to -128 are stored using at least 16 bits, and -32761 to -32768 are stored using 32 bits. Note that while BCF files affected by this bug are technically incorrect, it is still possible to read them. When converting to VCF format, HTSlib (and therefore bcftools) will interpret the values as intended and write out the correct negative numbers. (#766, thanks to John Marshall; samtools/bcftools#874) * Allow repeated invocations of bcf_update_info() and bcf_update_format_*() functions. (#856, thanks to John Marshall; #813 reported by Steffen Möller) * Memory leak removed in knetfile's kftp_parse_url() function. 
(#759, thanks to David Alexander) * Fixed various crashes found by libfuzzer (invalid data leading to errors), mostly but not exclusively in CRAM, VCF and BCF decoding. (#805) * Improved robustness of BAI and CSI index creation and loading. (#870; #967) * Prevent (invalid) creation of TBI indices for BCF files. (#837; samtools/bcftools#707) * Better parsing and handling of remote URLs with ?param=val components and their interaction with remote index URLs. (#790; #784 reported by Mark Ebbert) * hts_idx_load() now checks locally for all possible index names before attempting to download a remote index. It also checks that the remote file it downloads is actually an index before trying to save and use it. (#870; samtools/samtools#1045 reported by Albert Vilella) * hts_open_format() now honours the compression field, no longer also requiring an explicit "z" in the mode string. Also fixed a 1 byte buffer overrun. (#880) * Removed duplicate hts_tpool_process_flush prototype. (#816, reported by James S Blachly) * Deleted defunct cram_tell declaration. (66c41e2; #915 reported by Martin Morgan) * Fixed overly aggressive filename suffix checking in bgzip. (#927, thanks to John Marshall; #129, reported by @hguturu) * Tabix and bgzip --help output now goes to standard output. (#754, thanks to John Marshall) * Fixed bgzip index creation when using multiple threads. (#817) * Made bgzip -b option honour -I (index filename). (#817) * Bgzip -d no longer attempts to unlink(NULL) when decompressing stdin. (#718) Miscellaneous other changes --------------------------- * Integration with Google OSS fuzzing for automatic detection of more bugs. (Thanks to Google for their assistance and the bugs it has found.) (#796, thanks to Markus Kusano) * aclocal.m4 now has the pkg-config macros. (6ec3b94d; #733 reported by Thomas Hickman) * Improved C++ compatibility of some header files. (#772; #771 reported by @cwrussell) * Improved strict C99 compatibility.
(#860, thanks to John Marshall) * Travis and AppVeyor improvements to aid testing. (#747; #773 thanks to Lennard Berger; #781; #809; #804; #860; #909) * Various minor compiler warnings fixed. (#708; #765; #846, #860, thanks to John Marshall; #865; #966; #973) * Various new and improved error messages. * Documentation updates (mostly in the header files). * Even more testing with "make check". * Corrected many copyright dates. (#979) * The default non-configure Makefile now uses libcurl instead of knet, so it can support https. (#895) ------------------------------------------------------------------------------ samtools - changes v1.10 ------------------------------------------------------------------------------ Changes affecting the whole of samtools, or multiple sub-commands: * Samtools now uses the new HTSlib header API. As this adds more checks for invalid headers, it is possible that some illegal files will now be rejected when they would have been allowed by earlier versions. (#998) Examples of problems that will now be rejected include @SQ lines with no SN: tag, and @RG or @PG lines with no ID: tag. * samtools sub-commands will now add '@PG' header lines to output sam/bam/cram files. To disable this, use the '--no-PG' option. (#1087; #1097) * samtools now supports alignment records with reference positions greater than 2 gigabases. This allows samtools to process alignments for species which have large chromosomes, like axolotl and lungfish. Note that due to file format limitations, data with large reference positions must use the SAM format. (#1107; #1117) * Improved the efficiency of reading and writing SAM format data by 2 fold (single thread). This is further improved by the ability to use multiple threads, as previously done with BAM and CRAM. * samtools can now write BGZF-compressed SAM format. To enable this, either save files with a '.sam.gz' suffix, or use '--output-fmt sam.gz'. * samtools can now index BGZF-compressed SAM files. 
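The stricter header checks mentioned above (@SQ lines need an SN: tag, @RG and @PG lines need an ID: tag) can be approximated outside of samtools with a short Python sketch, for example to screen files before upgrading. The function name `header_problems` is hypothetical, and the check is limited to the three examples given in these notes; HTSlib's own validation covers more than this.

```python
# Required tag per header record type, taken from the examples in the notes.
REQUIRED_TAG = {"@SQ": "SN", "@RG": "ID", "@PG": "ID"}


def header_problems(header_text):
    """Return a list of (line_number, message) for header lines missing a required tag.

    Hypothetical helper for illustration; it does not reproduce HTSlib's checks.
    """
    problems = []
    for lineno, line in enumerate(header_text.splitlines(), start=1):
        fields = line.split("\t")
        required = REQUIRED_TAG.get(fields[0])
        if required is None:
            continue  # @HD, @CO and non-header lines are not checked here
        # Header fields have the form "TAG:value" with a two-character tag.
        tags = {f[:2] for f in fields[1:] if len(f) >= 3 and f[2] == ":"}
        if required not in tags:
            problems.append((lineno, f"{fields[0]} line has no {required}: tag"))
    return problems


example = "\n".join([
    "@HD\tVN:1.6\tSO:coordinate",
    "@SQ\tSN:chr1\tLN:1000",
    "@SQ\tLN:2000",               # missing SN:
    "@RG\tSM:sample1",            # missing ID:
    "@PG\tID:bwa\tPN:bwa",
])
for lineno, message in header_problems(example):
    print(f"line {lineno}: {message}")
```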
* The region parsing code has been improved to handle colons in reference names. Strings can be disambiguated by the use of braces, so for example when reference sequences called "chr1" and "chr1:100-200" are both present, the regions "{chr1}:100-200" and "{chr1:100-200}" unambiguously indicate which reference is being used. (#864) * samtools flags, flagstats, idxstats and stats now have aliases flag, flagstat, idxstat and stat. (#934) * A new global '--write-index' option has been added. This allows output sam.gz/bam/cram files to be indexed while they are being written out. This should work with addreplacerg, depad, markdup, merge, sort, split, and view. (#1062) * A global '--verbosity' option has been added to enable/disable debugging output. (#1124, thanks to John Marshall) * It is now possible to have data and index files stored in different locations. There are two ways to tell samtools where to find the index: 1. Samtools bedcov, depth, merge, mpileup, stats, tview, and view accept a new option (-X). When this is used, each input sam/bam/cram listed on the command line should have a corresponding index file. Note that all the data files should be listed first, followed by all the index files. (#978, thanks to Mingfei Shao) 2. A delimiter '##idx##' can be appended to the data file name followed by the index file name. This can be used both for input files and outputs when indexing on-the-fly. * HTSlib (and therefore SAMtools) now uses version 4 signatures by default for its s3:// plug-in. It can also write to S3 buckets, as long as version 4 signatures are in use. See HTSlib's NEWS file and htslib-s3-plugin manual page for more information. * HTSlib (and therefore SAMtools) no longer considers a zero-length file to be a valid SAM file. This has been changed so that pipelines such as `somecmd | samtools ...` with `somecmd` aborting before outputting anything will now propagate the error to the second command. 
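The brace syntax for region strings described in these notes can be sketched in Python as follows. This is a simplified model rather than HTSlib's parser: the function name `parse_region`, the choice to raise an error for an ambiguous unbraced string, and the handling of ranges are assumptions made for the example.

```python
def parse_region(region, known_refs):
    """Split a region string into (reference, start, end).

    start and end are None when the whole reference is requested.
    Hypothetical helper; a simplified model, not HTSlib's implementation.
    """
    if region.startswith("{"):
        close = region.find("}")
        if close < 0:
            raise ValueError("unterminated '{' in region")
        name, rest = region[1:close], region[close + 1:]
        if rest and not rest.startswith(":"):
            raise ValueError("expected ':' after '}'")
        span = rest[1:]
    else:
        head, sep, tail = region.rpartition(":")
        whole_is_ref = region in known_refs
        head_is_ref = bool(sep) and head in known_refs
        if whole_is_ref and head_is_ref:
            # Both readings are possible, so ask for braces (assumed behaviour).
            raise ValueError(f"ambiguous region {region!r}; use braces")
        if whole_is_ref or not sep:
            name, span = region, ""
        else:
            name, span = head, tail
    if name not in known_refs:
        raise ValueError(f"unknown reference {name!r}")
    if not span:
        return name, None, None
    start, _, end = span.partition("-")
    return name, int(start), int(end) if end else None


refs = {"chr1", "chr1:100-200"}
print(parse_region("{chr1}:100-200", refs))
print(parse_region("{chr1:100-200}", refs))
```

In this model the unbraced string "chr1:100-200" is rejected when both "chr1" and "chr1:100-200" exist as reference names, which is the situation the braces are meant to resolve.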
* The samtools manual page has been split up into one for each sub-command. The main samtools.1 manual page now lists the sub-commands and describes the common global options. (#894) * The meaning of decode_md, store_md and store_nm in the fmt-option section of the samtools.1 man page has been clarified. (#898, thanks to Evan Benn) * Fixed numerous memory leaks. (#892) * Fixed incorrect macro definition on Windows. (#950) * bedcov, phase, misc/ace2sam and misc/wgsim now check for failure to open files. (#1013, thanks to Julie Blommaert and John Marshall) Changes affecting specific sub-commands: * A new "coverage" sub-command has been added. This prints a tabular format of the average coverage and percent coverage for each reference sequence, as well as number of aligned reads, average mapping quality and base quality. It can also (with the '-m' option) plot a histogram of coverage across the genome. (#992, thanks to Florian Breitwieser) * samtools calmd: - Reference bases in MD: tags are now converted to upper case. (#981, #988) * samtools depth: - Add new options to write a header to the output (-H) and to direct the output to a file (-o). (#937, thanks to Pierre Lindenbaum) - New options '-g' and '-G' can be used to filter reads. (#953) - Fix memory leak when failing to set CRAM options. (#985, thanks to Florian Breitwieser) - Fix bug when using region filters where the '-a' option did not work for regions with no coverage. (#1113; #1112 reported by Paweł Sztromwasser) * samtools fasta and fastq: - '-1 FILE -2 FILE' with the same filename now works properly. (#1042) - '-o FILE' is added as a synonym for '-1 FILE -2 FILE'. (#1042) - The '-F' option now defaults to 0x900 (SECONDARY,SUPPLEMENTARY). Previously secondary and supplementary records were filtered internally in a way that could not be turned off. (#1042; #939 reported by @finswimmer) - Allow reading from a pipe without an explicit '-' on the command line. 
(#1042; #874 reported by John Marshall) - Turn on multi-threading for bgzf compressed output files. (#908) - Fixed bug where samtools fastq -i would output incorrect information in the Casava tags for dual-index reads. It also now prints the tags for dual indices in the same way as bcl2fastq, using a '+' sign between the two parts of the index. (#1059; #1047 reported by Denis Loginov) * samtools flagstat: - Samtools flagstat can now optionally write its output in JSON format or as a tab-separated values file. (#1106, thanks to Vivek Rai) * samtools markdup: - It can optionally tag optical duplicates (reads following Illumina naming conventions only). This is enabled with the '-d' option, which sets the distance for duplicates to be considered as optical. (#1091; #1103; #1121; #1128; #1134) - The report stats (-s) option now outputs counts for optical and non-primary (supplementary / secondary) duplicates. It also reports the Picard "estimate library size" statistic. A new '-f' option can be used to save the statistics in a given file. (#1091) - The rules for calling duplicates can be changed using the new --mode option. This mainly changes the position associated with each read in a pair. '--mode t' (the default) is the existing behaviour where the position used is that of the outermost template base associated with the read. Alternatively '--mode s' always uses the first unclipped sequence base. In practice, this only makes a difference for read pairs where the two reads are aligned in the same direction. (#1091) - A new '-c' option can be used to clear any existing duplicate tags. (#1091) - A new '--include-fails' option makes markdup include QC-failed reads. (#1091) - Fixed buffer overflow in temporary file writer when writing a mixture of long and short alignment records. (#911; #909) * samtools mpileup: - mpileup can now process alignments including CIGAR P (pad) operators correctly. 
They will now also produce the correct output for alignments where insertions are immediately followed by deletions, or deletions by insertions. Note that due to limitations in HTSlib, they are still unable to output sequences that have been inserted before the first aligned base of a read. (#847; #842 reported by Tiffany Delhomme. See also htslib issue #59 and pull request #699). - In samtools mpileup, a deletion or pad on the reverse strand is now marked with a different character ('#') than the one used on a forward strand ('*'), if the '--reverse-del' option is used. (#1070) - New option '--output-extra' can be used to add columns for user selected alignment fields or aux tags. (#1073) - Fixed double-counting of overlapping bases in alignment records with deletions or reference skips longer than twice the insert size. (#989; #987 reported by @dariomel) - Improved manual page with documentation about what each output column means. (#1055, thanks to John Marshall) * samtools quickcheck: - Add unmapped (-u) option, which disables the check for @SQ lines in the header. (#920, thanks to Shane McCarthy) * samtools reheader: - A new option '-c' allows the input header to be passed to a given command. Samtools then takes the output of this command and uses it as the replacement header. (#1007) - Make it clear in help message that reheader --in-place only works on CRAM files. (#921, thanks to Julian Gehring) - Refuse to in-place reheader BAM files, instead of unexpectedly writing a BAM file to stdout. (#935) * samtools split: - In samtools split, the '-u' option no longer accepts an extra file name from which a replacement header was read. The two file names were separated using a colon, which caused problems on Windows and prevented the use of URLs. A new '-h' option has been added to allow the replacement header file to be specified in its own option. (#961) - Fixed bug where samtools split would crash if it read a SAM header that contained an @RG line with no ID tag. 
(#954, reported by @blue-bird1) * samtools stats: - stats will now compute base compositions for BC, CR, OX and RX tags, and quality histograms for QT, CY, BZ and QX tags. (#904) - New stats FTC and LTC showing total number of nucleotides for first and last fragments. (#946) - The rules for classifying reads as "first" or "last" fragment have been tightened up. (#949) - Fixed bug where stats could over-estimate coverage when using the target-regions option or when a region was specified on the command-line. (#1027; #1025, reported by Miguel Machado; #1029, reported by Jody Phelan). - Fixed error in stats GCD percentile depth calculation when the depth to be reported fell between two bins. It would report the depth entirely from the lower bin instead of taking a weighted average of the two. (#1048) - Better catching and reporting of out of memory conditions. (#984; #982, reported by Jukka Matilainen) - Improved manual page. (#927) * samtools tview: - tview can now display alignments including CIGAR P operators, D followed by I and I followed by D correctly. See mpileup above for more information. (#847; #734, reported by Ryan Lorig-Roach) - The "go to position" text entry box has been made wider. (#968, thanks to John Marshall) - Fixed samtools tview -s option which was not filtering reads correctly. It now only shows reads from the requested sample or read group. (#1089) * samtools view: - New options '-d' and '-D' to only output alignments which have a tag with a given type and value. (#1001, thanks to Gert Hulselmans) * misc/plot-bamstats script: - Fixed merge (-m) option. (#923, #924 both thanks to Marcus D Sherman) - Made the quality heatmap work with gnuplot version 5.2.7 and later. (#1068; #1065 reported by Martin Mokrejš) - Fixed --do-ref-stats bug where fasta header lines would be counted as part of the sequence when the --targets option was used. 
(#1120, thanks to Neil Goodgame) * Removed the misc/varfilter.py Python script, as it takes consensus-pileup as input, which was removed from samtools in release 0.1.17 in 2011. (#1125) ------------------------------------------------------------------------------ bcftools - changes v1.10 ------------------------------------------------------------------------------ * Numerous bug fixes, usability improvements and sanity checks were added to prevent common user errors. * The -r, --regions (and -R, --regions-file) option should never create unsorted VCFs or duplicate records again. This also fixes rare cases where a spanning deletion makes a subsequent record invisible to `bcftools isec` and other commands. * Additions to filtering and formatting expressions - support for the spanning deletion alternate allele (ALT=*) - new ILEN filtering expression to be able to filter by indel length - new MEAN, MEDIAN, MODE, STDEV, phred filtering functions - new formatting expression %PBINOM (phred-scaled binomial probability), %INFO (the whole INFO column), %FORMAT (the whole FORMAT column), %END (end position of the REF allele), %END0 (0-based end position of the REF allele), %MASK (with multiple files indicates the presence of the site in other files) * New plugins - `+gvcfz`: compress gVCF file by resizing gVCF blocks according to specified criteria - `+indel-stats`: collect various indel-specific statistics - `+parental-origin`: determine parental origin of a CNV region - `+remove-overlaps`: remove overlapping variants. - `+split-vep`: query structured annotations such as INFO/CSQ created by bcftools/csq or VEP - `+trio-dnm`: screen variants for possible de-novo mutations in trios * `annotate` - new -l, --merge-logic option for combining multiple overlapping regions * `call` - new `bcftools call -G, --group-samples` option which allows grouping samples into populations and applying the HWE assumption within but not across the groups. 
* `csq` - significant reduction of memory usage in the local -l mode for VCFs with thousands of samples and 20% reduction in the non-local haplotype-aware mode. - fixes a small memory leak and formatting issue in FORMAT/BCSQ at sites with many consequences - do not print protein sequence of start_lost events - support for "start_retained" consequence - support for symbolic insertions (ALT="<INS...>"), "feature_elongation" consequence - new -b, --brief-predictions option to output abbreviated protein predictions. * `concat` - the `--naive` command now checks header compatibility when concatenating multiple files. * `consensus` - add a new `-H, --haplotype 1pIu/2pIu` feature to output first/second allele for phased genotypes and the IUPAC code for unphased genotypes - new -p, --prefix option to add a prefix to sequence names on output * `+contrast` - added support for Fisher's test probability and other annotations * `+fill-from-fasta` - new -N, --replace-non-ACGTN option * `+dosage` - fix some serious bugs in dosage calculation * `+fill-tags` - extended to perform simple on-the-fly calculations such as calculating INFO/DP from FORMAT/DP. * `merge` - add support for merging FORMAT strings - bug fixed in gVCF merging * `mpileup` - a new optional SCR annotation for the number of soft-clipped reads * `reheader` - new -f, --fai option for updating contig lines in the VCF header * `+trio-stats` - extend output to include DNM homs and recurrent DNMs * VariantKey support -- The Wellcome Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. |
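The brace syntax for region strings described in the samtools 1.10 notes above can be illustrated with a short Python sketch. This is not the HTSlib parser. The function names `parse_region` and `_parse_range` are hypothetical, and the choice to raise an error when an unbraced string matches two references is an assumption made here for illustration; the announcement only says that braces remove the ambiguity.

```python
import re


def _parse_range(text):
    """Parse 'START' or 'START-END'; return (start, end) or None if malformed."""
    m = re.fullmatch(r"(\d+)(?:-(\d+))?", text)
    if m is None:
        return None
    start = int(m.group(1))
    end = int(m.group(2)) if m.group(2) else None
    return start, end


def parse_region(region, known_refs):
    """Return (ref_name, start, end); start and end are None when absent.

    Hypothetical helper: braces mark exactly where the reference name ends,
    so names that contain colons can be told apart from 'name:range'.
    """
    if region.startswith("{"):
        close = region.find("}")
        if close == -1:
            raise ValueError("missing '}' in region: " + region)
        name, rest = region[1:close], region[close + 1:]
        if name not in known_refs:
            raise KeyError(name)
        if rest == "":
            return name, None, None
        rng = _parse_range(rest[1:]) if rest.startswith(":") else None
        if rng is None:
            raise ValueError("bad range in region: " + region)
        return (name, *rng)

    # Unbraced: collect every interpretation that matches a known reference.
    candidates = []
    if region in known_refs:
        candidates.append((region, None, None))
    name, sep, tail = region.rpartition(":")
    if sep and name in known_refs:
        rng = _parse_range(tail)
        if rng is not None:
            candidates.append((name, *rng))
    if len(candidates) > 1:
        # Assumption: treat the ambiguous case as an error.
        raise ValueError("ambiguous region, use braces: " + region)
    if not candidates:
        raise KeyError(region)
    return candidates[0]
```

With references named "chr1" and "chr1:100-200" both present, `parse_region("{chr1}:100-200", refs)` selects positions 100 to 200 on "chr1", while `parse_region("{chr1:100-200}", refs)` selects the whole of the second reference.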
From: Yossi F. <fa...@br...> - 2018-12-20 20:38:21
|
Thanks, I tested this since the sam-spec references it...and I wanted to check that it works! On Thu, Dec 20, 2018 at 10:51 AM Peter Cock <p.j...@go...> wrote: > Well the email reached at least one subscriber, but I am only > an occasional minor contributor - not one of the core team. > > You may find the GitHub repositories more lively? > > https://github.com/samtools/ > > Peter > > On Thu, Dec 20, 2018 at 3:41 PM Yossi Farjoun > <fa...@br...> wrote: > > > > This is a test. > > _______________________________________________ > > Samtools-devel mailing list > > Sam...@li... > > https://lists.sourceforge.net/lists/listinfo/samtools-devel > |
From: Peter C. <p.j...@go...> - 2018-12-20 16:21:05
|
Well the email reached at least one subscriber, but I am only an occasional minor contributor - not one of the core team. You may find the GitHub repositories more lively? https://github.com/samtools/ Peter On Thu, Dec 20, 2018 at 3:41 PM Yossi Farjoun <fa...@br...> wrote: > > This is a test. > _______________________________________________ > Samtools-devel mailing list > Sam...@li... > https://lists.sourceforge.net/lists/listinfo/samtools-devel |
From: Yossi F. <fa...@br...> - 2018-12-20 15:39:20
|
This is a test. |
From: Robert D. <rm...@sa...> - 2018-07-18 16:08:19
|
Samtools (and HTSlib and BCFtools) version 1.9 is now available from GitHub and SourceForge https://sourceforge.net/projects/samtools/ https://github.com/samtools/htslib/releases/tag/1.9 https://github.com/samtools/samtools/releases/tag/1.9 https://github.com/samtools/bcftools/releases/tag/1.9 The main changes are listed below: ------------------------------------------------------------------------------ htslib - changes v1.9 ------------------------------------------------------------------------------ * If `./configure` fails, `make` will stop working until either configure is re-run successfully, or `make distclean` is used. This makes configuration failures more obvious. (#711, thanks to John Marshall) * The default SAM version has been changed to 1.6. This is in line with the latest version specification and indicates that HTSlib supports the CG tag used to store long CIGAR data in BAM format. * bgzip integrity check option '--test' (#682, thanks to @sd4B75bJ, @jrayner) * Faidx can now index fastq files as well as fasta. The fastq index adds an extra column to the `.fai` index which gives the offset to the quality values. New interfaces have been added to `htslib/faidx.h` to read the fastq index and retrieve the quality values. It is possible to open a fastq index as if fasta (only sequences will be returned), but not the other way round. (#701) * New API interfaces to add or update integer, float and array aux tags. (#694) * Add `level=<number>` option to `hts_set_opt()` to allow the compression level to be set. Setting `level=0` enables uncompressed output. (#715) * Improved bgzip error reporting. * Better error reporting when CRAM reference files can't be opened. (#706) * Fixes to make tests work properly on Windows/MinGW - mainly to handle line ending differences. (#716) * Efficiency improvements: - Small speed-up for CRAM indexing. - Reduce the number of unnecessary wake-ups in the thread pool. 
(#703) - Avoid some memory copies when writing data, notably for uncompressed BGZF output. (#703) * Bug fixes: - Fix multi-region iterator bugs on CRAM files. (#684) - Fixed multi-region iterator bug that caused some reads to be skipped incorrectly when reading BAM files. (#687) - Fixed synced_bcf_reader() bug when reading contigs multiple times. (#691, reported by @freeseek) - Fixed bug where bcf_hdr_set_samples() did not update the sample dictionary when removing samples. (#692, reported by @freeseek) - Fixed bug where the VCF record ref length was calculated incorrectly if an INFO END tag was present. (71b00a) - Fixed warnings found when compiling with gcc 8.1.0. (#700) - sam_hdr_read() and sam_hdr_write() will now return an error code if passed a NULL file pointer, instead of crashing. - Fixed possible negative array look-up in sam_parse1() that somehow escaped previous fuzz testing. (#731, reported by @fCorleone) - Fixed bug where cram range queries could incorrectly report an error when using multiple threads. (#734, reported by Brent Pedersen) - Fixed very rare rANS normalisation bug that could cause an assertion failure when writing CRAM files. (#739, reported by @carsonhh) ------------------------------------------------------------------------------ samtools - changes v1.9 ------------------------------------------------------------------------------ * Samtools mpileup VCF and BCF output is now deprecated. It is still functional, but will warn. Please use bcftools mpileup instead. (#884) * Samtools mpileup now handles the '-d' max_depth option differently. There is no longer an enforced minimum, and '-d 0' is interpreted as limitless (no maximum - warning this may be slow). The default per-file depth is now 8000, which matches the value mpileup used to use when processing a single sample. To get the previous default behaviour use the higher of 8000 divided by the number of samples across all input files, or 250. 
(#859) * Samtools stats new features: - The '--remove-overlaps' option discounts overlapping portions of templates when computing coverage and mapped base counting. (#855) - When a target file is in use, the number of bases inside the target is printed and the percentage of target bases with coverage above a given threshold specified by the '--cov-threshold' option. (#855) - Split base composition and length statistics by first and last reads. (#814, #816) * Samtools faidx new features: - Now takes long options. (#509, thanks to Pierre Lindenbaum) - Now warns about zero-length and truncated sequences due to the requested range being beyond the end of the sequence. (#834) - Gets a new option (--continue) that allows it to carry on when a requested sequence was not in the index. (#834) - It is now possible to supply the list of regions to output in a text file using the new '--region-file' option. (#840) - New '-i' option to make faidx return the reverse complement of the regions requested. (#878) - faidx now works on FASTQ (returning FASTA) and added a new fqidx command to index and return FASTQ. (#852) * Samtools collate now has a fast option '-f' that only operates on primary pairs, dropping secondary and supplementary. It tries to write pairs to the final output file as soon as both reads have been found. (#818) * Samtools bedcov gets a new '-j' option to make it ignore deletions (D) and reference skips (N) when computing coverage. (#843) * Small speed up to samtools coordinate sort, by converting it to use radix sort. (#835, thanks to Zhuravleva Aleksandra) * Samtools idxstats now works on SAM and CRAM files, however this isn't fast due to some information lacking from indices. (#832) * Compression levels may now be specified with the level=N output-fmt-option. E.g. with -O bam,level=3. * Various documentation improvements. * Bug-fixes: - Improved error reporting in several places. (#827, #834, #877, cd7197) - Various test improvements. 
- Fixed failures in the multi-region iterator (view -M) when regions provided via BED files include overlaps (#819, reported by Dave Larson). - Samtools stats now counts '=' and 'X' CIGAR operators when counting mapped bases. (#855) - Samtools stats has fixes for insert size filtering (-m, -i). (#845; #697 reported by Soumitra Pal) - Samtools stats -F no longer negates an earlier -d option. (#830) - Fix samtools stats crash when using a target region. (#875, reported by John Marshall) - Samtools sort now keeps to a single thread when the -@ option is absent. Previously it would spawn a writer thread, which could cause the CPU usage to go slightly over 100%. (#833, reported by Matthias Bernt) - Fixed samtools phase '-A' option which was incorrectly defined to take a parameter. (#850; #846 reported by Dianne Velasco) - Fixed compilation problems when using C_INCLUDE_PATH. (#870; #817 reported by Robert Boissy) - Fixed --version when built from a Git repository. (#844, thanks to John Marshall) - Use noenhanced mode for title in plot-bamstats. Prevents unwanted interpretation of characters like underscore in gnuplot version 5. (#829, thanks to M. Zapukhlyak) - blast2sam.pl now reports perfect match hits (no indels or mismatches). (#873, thanks to Nils Homer) - Fixed bug in fasta and fastq subcommands where stdout would not be flushed correctly if the -0 option was used. - Fixed invalid memory access in mpileup and depth on alignment records where the sequence is absent. ------------------------------------------------------------------------------ bcftools - changes v1.9 ------------------------------------------------------------------------------ * `annotate` - REF and ALT columns can now be transferred from the annotation file. - fixed bug when setting vector_end values. * `consensus` - new -M option to control output at missing genotypes - variants immediately following insertions should not be skipped. 
Note however, that the current fix requires normalized VCF and may still falsely skip variants adjacent to multiallelic indels. - bug fixed in -H selection handling * `convert` - the --tsv2vcf option now makes the missing genotypes diploid, "./." instead of "." - the behavior of -i/-e with --gvcf2vcf changed. Previously only sites with FILTER set to "PASS" or "." were expanded and the -i/-e options dropped sites completely. The new behavior is to let the -i/-e options control which records will be expanded. In order to drop records completely, one can stream through "bcftools view" first. * `csq` - since the real consequence of start/splice events are not known, the aminoacid positions at subsequent variants should stay unchanged - add `--force` option to skip malformatted transcripts in GFFs with out-of-phase CDS exons. * `+dosage`: output all alleles and all their dosages at multiallelic sites * `+fixref`: fix serious bug in -m top conversion * `-i/-e` filtering expressions: - add two-tailed binomial test - add functions N_PASS() and F_PASS() - add support for lists of samples in filtering expressions, with many samples it was impractical to list them all on the command line. Samples can be now in a file as, e.g., GT[@samples.txt]="het" - allow multiple perl functions in the expressions and some bug fixes - fix a parsing problem, '@' was not removed from '@filename' expressions * `mpileup`: fixed bug where, if samples were renamed using the `-G` (`--read-groups`) option, some samples could be omitted from the output file. * `norm`: update INFO/END when normalizing indels * `+split`: new -S option to subset samples and to use custom file names instead of the defaults * `+smpl-stats`: new plugin * `+trio-stats`: new plugin * Fixed build problems with non-functional configure script produced on some platforms Rob Davies rm...@sa... The Sanger Institute http://www.sanger.ac.uk/ Hinxton, Cambs., Tel. +44 (1223) 834244 CB10 1SA, U.K. Fax. 
+44 (1223) 494919 -- The Wellcome Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. |
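The samtools 1.9 note on mpileup '-d' above says the previous default behaviour can be recovered by using the higher of 8000 divided by the number of samples, or 250. A minimal Python sketch of that arithmetic follows. The function name `legacy_per_file_depth` is hypothetical, and the use of integer division is an assumption, since the announcement does not say how a fractional result should be rounded.

```python
def legacy_per_file_depth(n_samples):
    """Depth to pass to 'samtools mpileup -d' to mimic the pre-1.9 default.

    n_samples is the number of samples across all input files.
    Assumption: 8000 / n_samples is truncated to an integer.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    return max(8000 // n_samples, 250)


if __name__ == "__main__":
    for n in (1, 4, 100):
        print(n, legacy_per_file_depth(n))
```

For a single sample this gives 8000, which matches the new per-file default mentioned in the announcement; once there are 32 or more samples the floor of 250 applies.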
From: Marcus D. S. <md...@um...> - 2018-06-15 19:50:00
|
I didn’t know where to really introduce this, so I figure devel was best. SAMtools and pysam are terrific. Hands down, they are what I cut my teeth on. However, we know that doing the things that SAMtools and pysam do in a Windows environment is complicated (if not impossible) for the uninitiated. In making a platform agnostic desktop app, I came right up against this problem. To overcome this issue, I stuck to the current specs and made a BAM (and only BAM) file parser written in pure Python from the ground up. The python package, called BAMnostic can perform both serial processing and random access on BAM files. It also preserves much of the pysam API namespaces for easy transition when working in an environment that requires the use of BAMnostic. Special care was taken to make sure that BAMnostic will work in any version of Python from version 2.7 onward. Additionally, it has no dependencies, outside of the Python standard library, which makes it fully compatible with all stable versions of PyPy. For those interested in playing around with it, or wholly using it in your pipelines, please do. I need all the feedback I can get right now. For those that know Python pretty well or have a spare moment, feel free to code review to your heart’s content. Nonetheless, I hope that BAMnostic can help someone out there. BAMnostic is available through any of the links associated with this email or at the following locations: GitHub: https://github.com/betteridiot/bamnostic Conda: conda install -c conda-forge bamnostic PyPI: pip install bamnostic Documentation can be found at https://bamnostic.readthedocs.io/en/latest/ Thank you for your time. Marcus D. Sherman DCMB PhD Candidate Mills Lab https://betteridiot.tech |
From: Robert D. <rm...@sa...> - 2018-05-10 14:27:14
|
On Thu, 10 May 2018, Alex Hodgkins wrote: > I would like to call bam_fillmd1_core from bam_md.c in my application – >however, I see that there’s no header file for it. Is there any plan to >expose the method (and its flags) publicly? It seems fairly standalone >due to the simple flag configuration There's no plan to expose it at present - especially as it's part of samtools which doesn't officially export anything in library form. If we were to make it available, it may be moved to htslib first. Of course it's MIT/Expat licensed code, so there's nothing to stop you from copying the bits of source you want into your project as long as you include the copyright notice as well. Rob Davies rm...@sa... The Sanger Institute http://www.sanger.ac.uk/ Hinxton, Cambs., Tel. +44 (1223) 834244 CB10 1SA, U.K. Fax. +44 (1223) 494919 -- The Wellcome Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. |
From: Alex H. <ale...@co...> - 2018-05-10 11:03:42
|
I would like to call bam_fillmd1_core from bam_md.c in my application – however, I see that there’s no header file for it. Is there any plan to expose the method (and its flags) publicly? It seems fairly standalone due to the simple flag configuration Thanks, Alex The information in this e-mail and any attachments is confidential and is intended for the legitimate addressee only. If you receive this e-mail in error, please contact the sender forthwith and then delete the e-mail. We have taken all reasonable precautions to check this e-mail and any attachments for viruses, but we cannot accept any liability for any damage sustained as a result of any virus, worm or other malicious software. Congenica Ltd is registered in England and Wales No.08273616. |