Picard release 1.47
7 June 2011
- DownsampleSam.java: simple utility to randomly downsample a SAM or BAM file.
- IntervalListTools.java: tools to sort, merge, unique, pad and report on what's in an interval list.
- CollectRnaSeqMetrics.java: Improve error message when sequence dictionaries differ.
- We are experimenting with a library called Snappy, which we plan to use to compress temporary files created when merge-sorting. We have discovered a couple of problems, however, so this code is checked in but disabled currently.
- EstimateLibraryComplexity.java: Fixed an index out of bounds error in passesQualityCheck.
- MarkDuplicates.java: Add option SORTING_COLLECTION_SIZE_RATIO to deal with out-of-memory issues.
- CollectAlignmentSummaryMetrics.java: Added a PF_INDEL_RATE metric, calculated as the total number of short insertions and deletions seen in reads divided by the total number of aligned PF bases.
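As a rough illustration of the ratio described above, here is a minimal sketch. The class and method names are hypothetical and are not taken from Picard, and returning 0 when no bases are aligned is an assumption.

```java
final class IndelRateSketch {
    /**
     * Indel events per aligned PF base.
     * Assumption: returns 0 when no PF bases were aligned, to avoid dividing by zero.
     */
    static double pfIndelRate(long indelEvents, long pfAlignedBases) {
        if (pfAlignedBases <= 0) {
            return 0.0;
        }
        return (double) indelEvents / pfAlignedBases;
    }

    public static void main(String[] args) {
        // Hypothetical counts: 25 indel events across 100,000 aligned PF bases.
        System.out.println(pfIndelRate(25, 100_000));
    }
}
```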
- Histogram.java: Added methods for:
1) Calculating the geometric mean
2) Calculating the median absolute deviation
3) Estimating the SD of a quasi-normal distribution via the median absolute deviation
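The three statistics listed above can be sketched as follows. This works on a plain array rather than on Picard's Histogram class, and all names are hypothetical. The factor 1.4826 is the usual consistency constant relating the median absolute deviation to the SD of a normal distribution. Whether Histogram.java uses exactly this constant is an assumption.

```java
import java.util.Arrays;

final class RobustStatsSketch {
    /** Geometric mean of strictly positive values: exp(mean(ln x)). */
    static double geometricMean(double[] xs) {
        double logSum = 0.0;
        for (double x : xs) {
            logSum += Math.log(x);
        }
        return Math.exp(logSum / xs.length);
    }

    /** Median of a non-empty array; averages the two middle values for even length. */
    static double median(double[] xs) {
        double[] sorted = xs.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    /** Median absolute deviation: median(|x - median(x)|). */
    static double medianAbsoluteDeviation(double[] xs) {
        double m = median(xs);
        double[] deviations = new double[xs.length];
        for (int i = 0; i < xs.length; i++) {
            deviations[i] = Math.abs(xs[i] - m);
        }
        return median(deviations);
    }

    /** SD estimate for roughly normal data (assumed constant: 1.4826). */
    static double sdFromMad(double[] xs) {
        return 1.4826 * medianAbsoluteDeviation(xs);
    }
}
```

Because the median absolute deviation ignores how far the extreme values lie from the centre, a single outlier barely moves the SD estimate, unlike the ordinary sample SD.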
- CollectInsertSizeMetrics.java: Changed to use a more sensible approach to trimming the distribution of insert sizes before calculating the mean and SD, and for sizing the plot. The new method simply trims the distribution to [0..(median + 10 * median_absolute_deviation)]. This works well when the distribution is mostly normal (approximating mean + 6.7 SD), and is much more robust to bimodality and other strange distributions.
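The trimming rule can be sketched as below. For a normal distribution the median absolute deviation is about 0.6745 SD, so 10 of them is roughly 6.7 SD, which is where the approximation in the note comes from. The class and method names are hypothetical, and the median and median absolute deviation are passed in rather than computed, to keep the sketch short.

```java
import java.util.Arrays;

final class InsertSizeTrimSketch {
    /** Width of the retained range, in median absolute deviations, as stated in the note. */
    static final double MAD_MULTIPLIER = 10.0;

    /** Upper end of the retained range: median + 10 * MAD. */
    static double upperBound(double median, double mad) {
        return median + MAD_MULTIPLIER * mad;
    }

    /** Keeps only insert sizes within [0, median + 10 * MAD]. */
    static int[] trim(int[] insertSizes, double median, double mad) {
        double max = upperBound(median, mad);
        return Arrays.stream(insertSizes)
                .filter(size -> size >= 0 && size <= max)
                .toArray();
    }
}
```

With a hypothetical median of 200 and a median absolute deviation of 10, the retained range is [0, 300], so a chimeric pair with an apparent insert size of 5000 no longer inflates the mean and SD.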
- Histogram.java: More robust implementation of getMedian that correctly handles a histogram containing an even number of entries.
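A minimal sketch of a median over binned counts, assuming the intended behaviour for an even total is to average the two middle values (the usual convention; the note does not spell this out). The histogram is modelled here as a TreeMap from value to count, which is not Picard's Histogram class, and all names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

final class HistogramMedianSketch {
    /** Value found at the given 1-based rank when all counted values are laid out in order. */
    static int valueAtRank(TreeMap<Integer, Long> hist, long rank) {
        long seen = 0;
        for (Map.Entry<Integer, Long> bin : hist.entrySet()) {
            seen += bin.getValue();
            if (seen >= rank) {
                return bin.getKey();
            }
        }
        throw new IllegalArgumentException("rank exceeds total count");
    }

    /** Median of the histogram; for an even total, the mean of the two middle values. */
    static double median(TreeMap<Integer, Long> hist) {
        long total = 0;
        for (long count : hist.values()) {
            total += count;
        }
        if (total == 0) {
            throw new IllegalArgumentException("empty histogram");
        }
        if (total % 2 == 1) {
            return valueAtRank(hist, (total + 1) / 2);
        }
        return (valueAtRank(hist, total / 2) + valueAtRank(hist, total / 2 + 1)) / 2.0;
    }
}
```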
- SequenceUtil.java: Added a utility method for calculating GC% for a byte array of bases.
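A GC% calculation of this kind might look like the sketch below. The names are hypothetical and the signature is not taken from SequenceUtil.java. Counting lower-case bases and including every base (N among them) in the denominator are both assumptions.

```java
final class GcContentSketch {
    /**
     * Percentage of G or C bases in an ASCII-encoded array of bases.
     * Assumptions: case-insensitive; every base counts toward the denominator;
     * an empty array yields 0.
     */
    static double gcPercent(byte[] bases) {
        if (bases.length == 0) {
            return 0.0;
        }
        int gc = 0;
        for (byte b : bases) {
            if (b == 'G' || b == 'C' || b == 'g' || b == 'c') {
                gc++;
            }
        }
        return 100.0 * gc / bases.length;
    }
}
```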
- CalculateHsMetrics.java: Added two new features: 1) HS AT and GC dropout metrics, which measure GC bias in a way similar to whole genome data, but just for the target regions. 2) A PER_TARGET_COVERAGE option, which outputs detailed per-target metrics for ad-hoc analysis.