From: George G. <gg...@br...> - 2014-10-08 15:29:36
Picard Release 1.122
8 October 2014

- New Command Line Program "GenotypeConcordance"
  -- Calculates the concordance between genotype data for two samples in two different VCFs, one considered the truth (or reference) and the other considered the call. The concordance is broken into separate results sections for SNPs and indels. Summary and detailed statistics are reported. Note that for any pair of variants to compare, only the alleles for the samples under interrogation are considered, and MNP, symbolic, and mixed classes of variants are not included.
- New Command Line Program "UpdateVcfDictionary"
  -- Updates the sequence dictionary of a VCF from another file (SAM, BAM, VCF, dictionary, interval_list, fasta, etc.).
- New Command Line Program "VcfToIntervalList"
  -- Creates an interval list from a VCF.
- New Command Line Program "MarkDuplicatesWithMateCigar"
  -- A new tool with which to mark duplicates. This tool can replace MarkDuplicates if the input SAM/BAM has Mate CIGAR (MC) optional tags pre-computed (see the tools RevertOriginalBaseQualitiesAndAddMateCigar and FixMateInformation). This allows the new tool to perform streaming (i.e. single-pass) duplicate marking. This tool cannot be used with alignments that have large gaps or reference skips, which occur frequently in RNA-seq data.
  -- The existing MarkDuplicates and the new MarkDuplicatesWithMateCigar were refactored extensively, since they share common code. EstimateLibraryComplexity was caught up in this too.
  -- Many, many, many unit tests were added to prove the equivalence of MarkDuplicatesWithMateCigar and MarkDuplicates. These also exposed a few one-in-a-million corner cases in MarkDuplicates, in both duplicate marking and optical duplicate detection. As a result, MarkDuplicates needs to write slightly larger temporary files when running.
  -- SamFileTester was also improved to handle the various test cases for duplicate-marking tests.
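The GenotypeConcordance entry above describes a specific procedure. It compares a truth genotype against a call genotype at each site and reports SNPs and indels separately. Only the two samples' own alleles are used, and MNP, symbolic, and mixed sites are skipped.

The Python sketch below shows that bookkeeping. It is a simplified illustration, not Picard's implementation. The following choices are assumptions made for the example:

- the input layout, a dict of (contig, position) to (ref, truth genotype, call genotype);
- the four genotype states;
- the length-based rule for telling SNPs, indels, MNPs, and mixed sites apart;
- the summary figure, which is simply the fraction of exact genotype matches among sites called in both samples.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical, simplified input: a genotype is a tuple of allele strings,
# e.g. ("A", "G"); None means the sample has no call at that site.
Genotype = Optional[Tuple[str, ...]]
Record = Tuple[str, Genotype, Genotype]  # (ref allele, truth GT, call GT)


def classify(ref: str, alleles: Iterable[str]) -> Optional[str]:
    """Return "SNP" or "INDEL" from the ref and the samples' own alleles.

    Returns None for skipped sites: no alternate allele in either sample,
    symbolic alleles, MNPs, and mixed SNP/indel sites.
    """
    alts = {a for a in alleles if a != ref}
    if not alts:
        return None
    if any(a.startswith("<") or "[" in a or "]" in a for a in alts):
        return None  # symbolic or breakend allele
    if all(len(a) == len(ref) for a in alts):
        # Equal-length substitutions: one base is a SNP, longer is an MNP.
        return "SNP" if len(ref) == 1 else None
    if all(len(a) != len(ref) for a in alts):
        return "INDEL"
    return None  # mixed: some alts are substitutions, some are indels


def state(ref: str, gt: Genotype) -> str:
    """Collapse a genotype into a simplified state label (hypothetical)."""
    if gt is None:
        return "NO_CALL"
    if all(a == ref for a in gt):
        return "HOM_REF"
    return "HOM_VAR" if len(set(gt)) == 1 else "HET"


def concordance(records: Dict[Tuple[str, int], Record]):
    """Tally truth/call genotype states separately for SNPs and indels."""
    tables: Dict[str, Counter] = defaultdict(Counter)  # detailed statistics
    compared: Counter = Counter()
    matched: Counter = Counter()
    for ref, truth, call in records.values():
        # Only the alleles carried by the two samples are considered.
        alleles = [a for gt in (truth, call) if gt for a in gt]
        vtype = classify(ref, alleles)
        if vtype is None:
            continue
        tables[vtype][(state(ref, truth), state(ref, call))] += 1
        if truth is not None and call is not None:
            compared[vtype] += 1
            if sorted(truth) == sorted(call):
                matched[vtype] += 1
    # Summary statistic: exact genotype matches among sites called in both.
    summary = {t: matched[t] / compared[t] for t in compared}
    return summary, tables
```

Two sites show how the classification behaves:

- A site with ref `A`, truth (`A`, `G`) and call (`A`, `G`) lands in the SNP table as a concordant HET/HET pair.
- A site with ref `AC` where both samples are (`GT`, `GT`) is an MNP under this rule, so it is left out of both sections.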
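The MarkDuplicatesWithMateCigar entry above depends on the Mate CIGAR (MC) tag, the SAM optional field that stores the mate's CIGAR string. Duplicate marking in the MarkDuplicates style keys each read pair on the unclipped 5' positions of both reads. The mate's leftmost mapped position (PNEXT) is on every record. The mate's unclipped 5' end, however, also depends on its clipping and, for a reverse-strand mate, on how much reference it spans, and only the CIGAR supplies that.

The Python sketch below shows just that coordinate arithmetic. The function names are hypothetical, and this is not Picard's or htsjdk's code.

```python
import re
from typing import List, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR ops that advance along the reference
CLIPS = set("SH")             # soft and hard clips


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Split a CIGAR string into (length, op) pairs; reject anything else."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed or unavailable CIGAR: {cigar!r}")
    return ops


def unclipped_start(pos: int, cigar: str) -> int:
    """1-based leftmost mapped position minus any leading clips."""
    clipped = 0
    for n, op in parse_cigar(cigar):
        if op not in CLIPS:
            break
        clipped += n
    return pos - clipped


def unclipped_end(pos: int, cigar: str) -> int:
    """Last reference base covered, plus any trailing clips."""
    ops = parse_cigar(cigar)
    ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
    trailing = 0
    for n, op in reversed(ops):
        if op not in CLIPS:
            break
        trailing += n
    return pos + ref_len - 1 + trailing


def five_prime_unclipped(pos: int, cigar: str, reverse_strand: bool) -> int:
    """Unclipped 5' coordinate: the end for reverse-strand reads."""
    if reverse_strand:
        return unclipped_end(pos, cigar)
    return unclipped_start(pos, cigar)
```

Consider a record whose mate has PNEXT 1000, `MC:Z:5S95M`, and the mate-reverse flag (0x20) set. For that record, `five_prime_unclipped(1000, "5S95M", True)` gives 1094. If the mate were on the forward strand, the same call with `False` would give 995.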
- Updates to IntervalList:
  -- Added the capacity to create a simple interval list from a string (the name of the contig).
  -- Added the capacity to subtract one interval list from another (until now, this only worked if both were wrapped inside a container).
- Updates to SamLocusIterator:
  -- Performance optimizations giving a speed-up of about 35%.
- Updates to MarkDuplicates:
  -- Removed unnecessary storage of a string in the Read Ends in Mark
  -- Clarified the size of ReadEndsForMarkDuplicates.
- Updated the minimum number of times that BAIT_INTERVALS (in CalculateHsMetrics) and TARGET_INTERVALS (in CollectTargetedMetrics) must be set to one, i.e. each must be provided at least once.
- Moved CollectHiSeqPfFailMetrics into picard public.
- Updates to documentation generation (internal):
  -- Changed the link to the IntervalList.java documentation.
  -- Updated how _includes/command-line-usage.html is generated.
- Moved SAMSequenceDictionaryExtractor and its tests from picard to htsjdk.

- George