From: Alec W. <al...@br...> - 2010-05-05 12:08:43
|
Picard Release 1.19 5 May 2010 - SAMFileHeader: Changes to only store a copy of the full SAM header as a String if it is < 2MB in memory or if there are errors. This should dramatically reduce the memory needed to merge files with large sequence dictionaries or other large headers. -Alec |
From: Alec W. <al...@br...> - 2010-05-17 14:34:55
|
Picard release 1.21 17 May 2010 Note that there is no release 1.20. That version number was skipped to avoid ambiguity with release 1.2. - ValidateSamFile: Don't complain about no sequence dictionary if no reads are aligned. - MergeSamFiles: Try to reduce the number of totally identical sequence dictionaries kept in memory during a merge operation. - MergeSamFiles: Implemented a simple threaded version of the algorithm that provides a roughly 20% performance increase in exchange for roughly 20% more CPU consumption. - SAMUtils.java: Made a few quality/fastq conversion methods public. - Add SAMFileReader ctor that takes a SeekableStream, so BAM files can be read from arbitrary sources like FTP or Aspera. - Eliminate dependencies from net.sf.samtools on net.sf.picard that were inadvertently introduced. - Enforce requirement that net.sf.samtools be Java 1.5-compatible. In order for this check to work, Java 1.5 must be installed and JAVA_HOME_1_5 environment variable must point to this installation. - Check that SAMFileReader passed to SamLocusIterator is coordinate sorted. - EstimateLibraryComplexity: Implementation of an algorithm to estimate library complexity without alignment data, and refactoring of common code out of MarkDuplicates. - Java implementation of BuildBamIndex. -Alec |
From: Alec W. <al...@br...> - 2010-06-08 17:43:31
|
Picard Release 1.22 8 June 2010 - In MetricsFile, fix bug in which comparator for Histogram was being lost when merging histogram keysets in printHistogram. - Fixed an infinite recursion bug in SequenceUtil.java. - Moved CigarUtilTest to Picard-public, and refactored it. - Added a flush() to SamFileValidator when running in VERBOSE mode, to ensure each error is written to the output immediately. - In CoordMath.java, modified and added a few methods to make them take ints instead of longs since we use ints everywhere. -Alec |
From: Alec W. <al...@br...> - 2010-06-21 15:29:55
|
Picard Release 1.23 21 June 2010 - Published the following tools to the public Picard repository that were previously maintained by Broad developers internally: - BamToBfq - Create one or more sets of BFQ files for input to MAQ from a Sam or Bam file. - CollectAlignmentSummaryMetrics - Generate a set of summary alignment-related metrics about a Sam or Bam file. - CollectGcBiasMetrics - Compute a set of "GC bias" metrics about a Sam or Bam file based on the relative distribution of aligned reads to windows in the genome with differing GC content. - CollectInsertSizeMetrics - Generate tabular data and a plot of insert size distributions for each pair orientation (FR, RF, FF+RR) - CalculateHsMetrics - Compute a set of metrics relevant for capture or other genome sub-setting methods - MeanQualityByCycle - Compute mean quality by instrument cycle, reported as text and plotted - QualityScoreDistribution - Produce a table of the quality score distribution in a Sam or Bam file and a plot of the same - NormalizeFasta - Take a FASTA file and "normalize" it so that all lines are of the same length (except the last line for each sequence) - FixMateInformation - Picard version of "samtools fixmate" that will also pre-sort the file into queryname order, apply mate pair information fixing and then optionally re-sort the file into the output order of the user's choice - MergeBamAlignment - Tool to take a Sam or Bam file of unmapped reads and merge it with a Sam or Bam file that contains alignment information for a subset of those reads, retaining all metadata from the unmapped file. - RevertSam - Tool to take a Sam or Bam file and "revert" it by removing various "processed" information such as calibrated qualities, duplicate marks, and alignment information Plus the following miscellaneous changes... - Caching implementation of BAMFileIndex, and API to browse BAM index independently of querying BAM file, in order to better divide work among processors. 
- MarkDuplicates -- Handle case in which all reads for a library are unmapped. Patch courtesy of Tom Mooney. - CigarUtil -- Fix so that if soft clipping results in a cigar with no aligned bases that the read be converted to an unaligned read. - BAMFileReader -- Fix NPE when doing an index query in which there are no overlapping reads. Change courtesy of Matt Hanna. - Add support for reading fasta with companion sequence dictionary and faidx, for random access and ability to read portion of a sequence. - Add more info to exception when MD string does not work with CIGAR. - BlockCompressedInputStream -- Improve exception message (slightly) for invalid uncompressed length. -Alec |
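The NormalizeFasta entry in the 1.23 notes above describes rewriting a FASTA file so that every sequence line has the same length, except the last line of each sequence. A minimal sketch of that idea follows. It is not Picard's implementation: the class name `FastaNormalizerSketch` and the method `normalize` are hypothetical, and it buffers each sequence in memory for simplicity.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: re-wraps FASTA sequence lines to a fixed width. */
final class FastaNormalizerSketch {

    /** Returns the input lines with every sequence re-wrapped to lineLength bases per line. */
    static List<String> normalize(List<String> lines, int lineLength) {
        if (lineLength <= 0) {
            throw new IllegalArgumentException("lineLength must be positive");
        }
        List<String> out = new ArrayList<>();
        StringBuilder sequence = new StringBuilder();
        for (String line : lines) {
            if (line.startsWith(">")) {
                // A header starts a new record: emit the bases collected so far.
                flush(sequence, lineLength, out);
                out.add(line);
            } else {
                sequence.append(line.trim());
            }
        }
        flush(sequence, lineLength, out);
        return out;
    }

    /** Writes the buffered bases in chunks of lineLength; only the final chunk may be shorter. */
    private static void flush(StringBuilder sequence, int lineLength, List<String> out) {
        for (int i = 0; i < sequence.length(); i += lineLength) {
            out.add(sequence.substring(i, Math.min(sequence.length(), i + lineLength)));
        }
        sequence.setLength(0);
    }
}
```

A production tool would stream the bases instead of buffering a whole chromosome, but the wrapping rule is the same.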
From: Alec W. <al...@br...> - 2010-06-21 19:57:38
|
Hi Folks, Please note that the BAM index-writing code in this release is beta-level. The API may change and it is not ready for general use. -Alec Alec Wysoker wrote: > Picard Release 1.23 > 21 June 2010 |
From: Alec W. <al...@br...> - 2010-06-22 19:32:12
|
Picard Release 1.24 22 June 2010 Releasing ahead of schedule to fix new executables that were migrated to Picard. - Put R scripts used to generate graphs into executable jars -Alec |
From: Alec W. <al...@br...> - 2010-07-06 13:58:49
|
Picard Release 1.25 6 July 2010 - Speed up SAMFileHeader.hashCode - In SamFileHeaderMerger, change HashMap to IdentityHashMap to improve performance. - Updated FixMateInformation so that it can handle multiple input files and merge them dynamically and fix mate pair information at the same time. - When closing BlockCompressedOutputStream, do not verify file terminator if file is not a regular file. - Removed version option from command line program classes as it is not currently maintained and so not meaningful. - Added method for creating file system safe names to IoUtil. - Added getComparatorInstance() method to SAMFileHeader.SortOrder - Ignore SyncFailedException when closing BGZF so that it can write to unusual devices. - Handled skipped region (CIGAR operator N) in SequenceUtil.makeReferenceFromAlignment. - Generate metrics documentation and push up to Sourceforge. - Reduce memory footprint of IndexedFastaSequenceFile. - Remove requirement of IndexedFastaSequenceFile that dict file is present. - Fix error in ClipPairFixer when merging only aligned reads - Only use CloserUtil on files opened for read. -Alec |
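The 1.25 notes above mention switching a HashMap to an IdentityHashMap in SamFileHeaderMerger for performance. The sketch below shows the general effect using only standard JDK classes: an IdentityHashMap never calls the key's own hashCode() or equals(), which matters when those methods are expensive. The `Header` class here is a hypothetical stand-in, not the real SAMFileHeader, and the reasoning about why Picard made the change is an assumption.

```java
import java.util.Arrays;
import java.util.Map;

/** Illustrative only: counts how often a map calls the key's hashCode(). */
final class IdentityMapSketch {

    /** Hypothetical stand-in for a header object with a costly hashCode(). */
    static final class Header {
        final String[] sequenceNames;
        int hashCodeCalls = 0;

        Header(String... sequenceNames) {
            this.sequenceNames = sequenceNames;
        }

        @Override
        public int hashCode() {
            hashCodeCalls++; // in a real header this could walk a large sequence dictionary
            return Arrays.hashCode(sequenceNames);
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof Header
                    && Arrays.equals(sequenceNames, ((Header) other).sequenceNames);
        }
    }

    /** Does one put and one get, then reports how many times hashCode() was invoked. */
    static int hashCodeCallsFor(Map<Header, String> map) {
        Header header = new Header("chr1", "chr2");
        map.put(header, "reader-1");
        map.get(header);
        return header.hashCodeCalls;
    }
}
```

The trade-off is that two distinct but equal key objects become two separate entries in an IdentityHashMap, so it is only suitable when lookups always use the very same object instances.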
From: Alec W. <al...@br...> - 2010-07-21 19:24:10
|
Release notes for Picard 1.26 21 July 2010 - Bam index building is still pre-release. - CollectInsertSizeMetrics: Added a check that the read being examined must be mapped. - IoUtil: Added utility method to get the full canonical path for a file, resolving all symbolic links. - BlockCompressedInputStream, BAMFileReader, SAMFileReader, SAMTextReader, BlockGunzipper: Change to allow users to decide whether or not to compute and validate CRCs when decompressing BAMs and other block compressed GZIP data. Default is to not validate CRCs since it's an expensive operation. - Modified ValidateSamFile to turn on CRC checking, which is off by default, since this is probably one place that it should be done. - Modified MergeBamAlignment to proceed on the assumption that the records being merged are query-name sorted and to only sort if that assumption fails. - RevertSam: Added ability to overwrite SAMPLE_ALIAS and LIBRARY_NAME when reverting a BAM file. - SAMRecord: Throw more informative exception if SAMRecord.setReferenceIndex or SAMRecord.setMateReferenceIndex is passed an invalid index. - AbstractAlignmentMerger: Throw explanatory exception when no sequence dictionary available for reference when merging alignments. -Alec |
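The 1.26 notes above make CRC validation of block-compressed data optional and off by default. A minimal sketch of such a switch, using the JDK's `java.util.zip.CRC32`, might look like the following. The class `CrcCheckSketch` and its methods are hypothetical names for illustration and do not reflect the actual Picard reader API.

```java
import java.util.zip.CRC32;

/** Illustrative only: optional CRC-32 validation of a decompressed block. */
final class CrcCheckSketch {

    /** Computes the CRC-32 of the given bytes as an unsigned 32-bit value in a long. */
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    /**
     * Returns the block unchanged. When validateCrc is true, first verifies that the
     * block matches the CRC stored alongside it and throws if it does not.
     */
    static byte[] checkBlock(byte[] uncompressed, long expectedCrc, boolean validateCrc) {
        if (validateCrc && crc32(uncompressed) != expectedCrc) {
            throw new IllegalStateException("CRC mismatch in decompressed block");
        }
        return uncompressed;
    }
}
```

Skipping the check saves one pass over every decompressed block, which is the cost the release note refers to; a validation tool would pass `true`.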
From: Alec W. <al...@br...> - 2010-08-02 12:35:24
|
Picard release 1.27 2 August 2010 - In BlockCompressedOutputStream, when compressed block is too big, downshift to NO_COMPRESSION mode instead of pushing some input into the next block. Reduce max uncompressed bytes per GZIP block so that in NO_COMPRESSION mode these will always fit in the GZIP block. - Modified SamFileHeaderMerger to work with SAMFileHeaders (as its name implies) instead of SAMFileReaders. All constructors and methods that involved the use of readers have been deprecated and replaced with ones using SAMFileHeaders. This also involved changes to MergingSamRecordIterator, whose old constructor has been deprecated and replaced with one that requires that the readers be handed in separately. - Added the option to add comment(s) to the header when merging SAM files. - Added a toString() to SamLocusIterator.LocusInfo - Minor improvements to help text. - BAM indexing code still under construction. -Alec |
From: Alec W. <al...@br...> - 2010-08-18 14:17:12
|
Picard Release 1.28 18 August 2010 - In various programs, write messages to stderr rather than stdout, so that BAM output can be sent to stdout. - BlockCompressedFilePointerUtil: encapsulate code for manipulating BGZF virtual file pointers. Make sure it works for a virtual file pointer that is so large as to be negative as a signed long. - BamToBfq: Added option to write only the first N bases of the read to the bfq file. - CigarUtil: Added method to add soft-clipped bases to either end of a cigar (used when reads have been trimmed before alignment and the cigar needs to be adjusted to reflect this). - AlignmentMerger: Added option to specify reserved tags from alignment bam to bring over when merging. - SamRecord: Change mAttributes from ArrayList to custom linked list to reduce memory footprint. - SamFileValidator: Added some progress logging. - DuplicationMetrics: Change to allow calculation of estimated library size in heavily duplicated (as in 99%+ duplication) data. - MergeBamAlignment: Clarified documentation to make clear that a sequence dictionary is expected in the same directory as the reference fasta and that its extension should be .dict. -Alec |
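The BlockCompressedFilePointerUtil entry above concerns BGZF virtual file pointers, which pack the compressed block's file offset into the upper 48 bits of a 64-bit value and the offset within the uncompressed block into the lower 16 bits. The sketch below shows why a pointer can look negative in a signed Java long and how unsigned operations avoid the problem. The class and method names are illustrative, not the Picard API.

```java
/** Illustrative only: packing and unpacking a BGZF-style virtual file pointer. */
final class VirtualFilePointerSketch {
    private static final int SHIFT = 16;
    private static final long OFFSET_MASK = 0xFFFFL;
    private static final long MAX_BLOCK_ADDRESS = (1L << 48) - 1;

    /** Packs a compressed-block address (48 bits) and an in-block offset (16 bits). */
    static long make(long blockAddress, int blockOffset) {
        if (blockAddress < 0 || blockAddress > MAX_BLOCK_ADDRESS) {
            throw new IllegalArgumentException("block address out of range: " + blockAddress);
        }
        if (blockOffset < 0 || blockOffset > OFFSET_MASK) {
            throw new IllegalArgumentException("block offset out of range: " + blockOffset);
        }
        return (blockAddress << SHIFT) | blockOffset;
    }

    /** Unsigned shift, so the result is right even when the pointer's top bit is set. */
    static long blockAddress(long virtualFilePointer) {
        return virtualFilePointer >>> SHIFT;
    }

    static int blockOffset(long virtualFilePointer) {
        return (int) (virtualFilePointer & OFFSET_MASK);
    }

    /** Orders pointers by file position, treating them as unsigned 64-bit values. */
    static int compare(long a, long b) {
        return Long.compareUnsigned(a, b);
    }
}
```

A block address of 2^47 or more sets the sign bit after the shift, so a signed `>>` or a plain `<` comparison would give wrong answers for such pointers.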
From: Alec W. <al...@br...> - 2010-08-18 15:35:57
|
Hi Folks, We just discovered a bug in BAM queries. Please do not use this version. Fix will arrive in a few minutes. -Alec On 8/18/10 10:16 AM, Alec Wysoker wrote: > Picard Release 1.28 > 18 August 2010 |
From: Alec W. <al...@br...> - 2010-08-18 15:41:36
|
Hi Folks, OK, this has been fixed. Sorry for any inconvenience. Picard release 1.29 18 August 2010 - Fixed incorrect use of BlockCompressedFilePointerUtil.areInSameOrAdjacentBlocks that would result in incorrect results for some calls to BAMFileReader.query methods. -Alec |
From: Alec W. <al...@br...> - 2010-09-20 16:58:28
|
Picard release 1.31 20 Sep 2010 - BAM index generation: This release contains code to generate BAM indices, both for existing BAM files, and also to generate a BAM index automatically in conjunction with the writing of a BAM file. This is beta quality code. We encourage people to try it out and report any problems, but we do not consider it to be production quality at this point. - FixMateInformation.java: Refactored to permit easier extension by sub-classes - CollectInsertSizeMetrics.java: Fixed to no longer throw an exception if there is insufficient data to plot. Also added an optimization to stop when it reaches the unmapped reads at the end of the file. - When writing BAM and configured to automatically generate bai or md5, do not try to write .bai or .bam.md5 if file is not a normal file. - SAMSequenceRecord.java: Treat sequence length of 0 as unknown length, and do not return false from isSameSequence because one of the sequences has this length and the other has something different. - HsMetricsCalculator.java: Minor edit to how bait set name is derived from the bait intervals file. - CollectAlignmentSummaryMetrics.java: Allow adding/overriding of ADAPTER_SEQUENCE option. - CommandLineParser.java: Command line written to stderr did not show the correct values for lists, even though the internal values were correct. -Alec |
From: Alec W. <al...@br...> - 2010-10-04 15:22:10
|
Picard release 1.32 4 October 2010 - Implementation of BAM index generation. This is available as a stand-alone program, BuildBamIndex, and Java class, BamIndexer, for indexing existing BAM files. A BAM index can also be generated on the fly as a BAM is being written, via the CREATE_INDEX command-line option, and through the Java API via new methods on SAMFileWriterFactory. Note that we have received one bug report about an invalid index being created. We have not been able to reproduce this problem and continue to investigate. Our testing has found the code to be working well. - Allow P CIGAR operator between a pair of Ds. - In ValidateSAM, allow either 1.0 or 1.3 to be considered valid SAM version in @HD record. - Make SeekableBufferedStream buffer size configurable. -Alec |
From: Alec W. <al...@br...> - 2010-10-19 11:46:39
|
Picard release 1.33 19 October 2010 - Validate BAM index, if appropriate, in ValidateSamFile. - In SamAlignmentMerger.java, added a check that the program record id handed in or present in the aligned BAM file is not already in use in the unmapped BAM. - In SamAlignmentMerger.java, fix issue with program group being duplicated. - In SamAlignmentMerger.java, added checks for duplicate read group and program group ids when adding them in SAMFileHeader, and modified SAMFileValidator to check for duplicates of these. - Throw FileNotFoundException in IndexedFastaSequenceFile ctor if index is not found. -Alec |
From: Alec W. <al...@br...> - 2011-03-14 14:19:46
|
Picard release 1.41 14 March 2011 - SamToFastq was NPEing if @RG.PU did not exist. I have modified it to fall back to using @RG.ID if @RG.PU is null and we are doing output per read group. - Add SeekableHTTPStream ctor that takes a Proxy object. - QualityScoreDistribution.java: Converted to use arrays internally to reduce runtime by 30-40%. - Adding functionality to SamToFastq to optionally trim read 1 or read 2 and to optionally write a maximum number of bases (post-trimming) to the fastq file. - Added a new utility called ReorderSam (contributed by Mark DePristo) which is designed to take a SAM/BAM file and switch out the sequence dictionary for one in a different order, and then re-order the INPUT file to produce a coordinate sorted OUTPUT file with respect to the new sequence dictionary. - AddOrReplaceReadGroups: A new utility to add or replace the (possibly empty) set of read groups in an existing SAM or BAM file, contributed by Mark DePristo. - Modified AbstractAlignmentMerger and SamAlignmentMerger to handle (a) multiple files of alignments and (b) separate alignments of both ends of a paired read. - Significant refactoring of MergeBamAlignment. AbstractAlignmentMerger and SamAlignmentMerger have been modified to use the expected orientations of read pairs rather than a boolean for whether the library is a jumping library. They have new constructors, and code that uses these classes directly will have to be modified to use them. MergeBamAlignment has retained JUMP_SIZE for backwards compatibility, but it has been deprecated and users should switch to providing EXPECTED_ORIENTATIONS. - MeanQualityByCycle.java: Reverse complement all reads with the negativeStrandFlag set, even if they are unmapped - Make sure BAM index is created when indexing on the fly, even when the output BAM does not contain any reads. - ValidateSamFile: Flush output stream so that message that error list has been truncated is emitted. 
- RevertSam.java: When a read has negative strand flag set, reverse-complement it regardless of whether it is mapped or not, and clear the flag. - CollectInsertSizeMetrics.java: Changed histogram holder from HashMap to EnumMap so that metrics file will output in a predictable order - NormalizeFasta.java: Added a warning message if a sequence doesn't have any bases while normalizing. -Alec |
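The RevertSam and MeanQualityByCycle entries in the 1.41 notes above both rely on restoring a negative-strand read to its original sequencing orientation: the bases are reverse-complemented and the quality string is reversed. A minimal sketch of those two operations is shown below. The class and method names are hypothetical, and only the unambiguous bases plus N are handled; any other character is passed through unchanged.

```java
/** Illustrative only: restoring a negative-strand read to sequencing orientation. */
final class ReverseComplementSketch {

    /** Complements one base; characters other than A, C, G, T (either case) are returned as-is. */
    static char complement(char base) {
        switch (base) {
            case 'A': return 'T';
            case 'C': return 'G';
            case 'G': return 'C';
            case 'T': return 'A';
            case 'a': return 't';
            case 'c': return 'g';
            case 'g': return 'c';
            case 't': return 'a';
            default:  return base; // N and other symbols stay as they are
        }
    }

    /** Reverses the read and complements every base. */
    static String reverseComplement(String bases) {
        StringBuilder out = new StringBuilder(bases.length());
        for (int i = bases.length() - 1; i >= 0; i--) {
            out.append(complement(bases.charAt(i)));
        }
        return out.toString();
    }

    /** Qualities are only reversed, so each score stays attached to its base. */
    static String reverseQualities(String qualities) {
        return new StringBuilder(qualities).reverse().toString();
    }
}
```

After both transformations the negative strand flag would be cleared, as the RevertSam note describes, so that downstream tools do not reverse the read a second time.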
From: Tadigotla, V. <Vas...@li...> - 2010-06-21 21:38:55
|
Hi, I'm trying to run CollectGcBiasMetrics and I'm getting the error below with picard release 1.23. Are there any other files that I need apart from the zip archive on sourceforge to get this to work? Thanks, Vasisht [Mon Jun 21 16:56:43 EDT 2010] net.sf.picard.analysis.CollectGcBiasMetrics done. Runtime.totalMemory()=1530593280 Exception in thread "main" java.lang.IllegalArgumentException: Script [edu/mit/broad/picard/sam/gcBias.R] not found in classpath at net.sf.picard.util.RExecutor.writeScriptFile(RExecutor.java:85) at net.sf.picard.util.RExecutor.executeFromClasspath(RExecutor.java:54) at net.sf.picard.analysis.CollectGcBiasMetrics.doWork(CollectGcBiasMetrics.java:212) at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:150) at net.sf.picard.analysis.CollectGcBiasMetrics.main(CollectGcBiasMetrics.java:91) |
From: Kathleen T. <kti...@br...> - 2010-06-22 13:49:14
|
Sorry! The R scripts were inadvertently left behind when moving these tools to the public Picard repository. I'm working on moving and testing them now, and will send a note to the list when that is done. Kathleen Tibbetts |
From: Kathleen T. <kti...@br...> - 2010-06-22 16:39:08
|
The R scripts have been updated in the public Picard repository. Please let me know if there are any additional issues. Kathleen |
From: Michael M. <mmu...@hu...> - 2010-07-21 19:37:23
|
Hello again How are the bins defined for the histogram produced by MarkDuplicates? Thanks Mike Michael Muratet, Ph.D. Senior Scientist HudsonAlpha Institute for Biotechnology mmu...@hu... (256) 327-0473 (p) (256) 327-0966 (f) Room 4005 601 Genome Way Huntsville, Alabama 35806 |
From: Michael M. <mmu...@hu...> - 2010-07-21 19:44:01
|
And how is the estimate of library size calculated? M |
From: Alec W. <al...@br...> - 2010-08-02 13:47:31
|
Hi Michael, Algorithm is described here: http://picard.sourceforge.net/javadoc/net/sf/picard/sam/DuplicationMetrics.html#estimateLibrarySize(long,%20long) Code is here: http://picard.svn.sourceforge.net/viewvc/picard/trunk/src/java/net/sf/picard/sam/DuplicationMetrics.java?revision=532&view=markup -Alec On 7/21/10 3:38 PM, Michael Muratet wrote: > And how is the estimate of library size calculated? |
From: Alec W. <al...@br...> - 2010-08-02 13:55:33
|
Hi Mike, Histogram is described here: http://picard.sourceforge.net/javadoc/net/sf/picard/sam/DuplicationMetrics.html#estimateRoi%28long,%20double,%20long,%20long%29 The first column is the input parameter x: the multiple of sequencing to be simulated (i.e. how many X sequencing). The second column is the return value: a number z <= x such that, if you sequenced pairs*x read pairs, you would expect to observe uniquePairs*z unique pairs. -Alec On 7/21/10 3:37 PM, Michael Muratet wrote: > Hello again > > How are the bins defined for the histogram produced by MarkDuplicates? |
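The two replies above point to the library size estimate and the return-on-investment histogram in DuplicationMetrics. As a rough illustration, the sketch below assumes the Lander-Waterman relation C/X = 1 - exp(-N/X), where X is the number of distinct molecules in the library, N is the number of read pairs and C is the number of unique read pairs, and solves it for X by bisection. It is written from that relation rather than copied from Picard, so the class name, method signatures and search bounds are assumptions.

```java
/** Illustrative only: library size and sequencing ROI from the Lander-Waterman relation. */
final class LibrarySizeSketch {

    /** f(x) = c/x - 1 + exp(-n/x); it is zero when x satisfies c/x = 1 - exp(-n/x). */
    static double f(double x, double c, double n) {
        return c / x + Math.expm1(-n / x);
    }

    /** Solves for the library size X given N read pairs and C unique read pairs. */
    static double estimateLibrarySize(double readPairs, double uniqueReadPairs) {
        if (uniqueReadPairs <= 0 || readPairs <= uniqueReadPairs) {
            throw new IllegalArgumentException("need 0 < uniqueReadPairs < readPairs");
        }
        double lo = uniqueReadPairs;     // f(lo) = exp(-n/c) > 0
        double hi = 2 * uniqueReadPairs;
        while (f(hi, uniqueReadPairs, readPairs) > 0) {
            hi *= 2;                     // widen until the root is bracketed
        }
        for (int i = 0; i < 100; i++) {  // plain bisection on [lo, hi]
            double mid = (lo + hi) / 2;
            if (f(mid, uniqueReadPairs, readPairs) > 0) {
                lo = mid;
            } else {
                hi = mid;
            }
        }
        return (lo + hi) / 2;
    }

    /**
     * Expected unique pairs after sequencing multiple * readPairs pairs, expressed as a
     * multiple of the unique pairs seen so far (the z value for a given x).
     */
    static double estimateRoi(double librarySize, double multiple,
                              double readPairs, double uniqueReadPairs) {
        double expectedUnique = librarySize * -Math.expm1(-(multiple * readPairs) / librarySize);
        return expectedUnique / uniqueReadPairs;
    }
}
```

With this model the ROI at x = 1 is 1 by construction, and it grows more slowly than x as the library saturates, which matches the z <= x description of the histogram's second column.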
From: Alec W. <al...@br...> - 2010-11-01 14:00:06
|
Picard release 1.34 1 November 2010 - Interval.java: Change validation to check for end < start-1 (instead of end < start) since end = start-1 is used to represent 0-length intervals. - Enable MarkDuplicates to work in a reasonable amount of RAM even for a reference with many sequences, by using an LRU cache for open file handles in order to avoid exceeding maximum number of open file handles. MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP option is obsolete. MAX_FILE_HANDLES_FOR_READ_ENDS_MAP controls the size of the LRU cache of open files. - AbstractAlignmentMerger.java: Modified so we don't try to calculate UQ tag if QUAL is set to *. - ValidateSamFile: Store PairEndInfo in file-based data structure so it runs in fixed amount of RAM. Added check for read marked as paired for which a mate is not found. - AbstractFastaSequenceFile.java: When trying to find a sequence dictionary corresponding to a reference fasta, try filename.dict if basename.dict isn't found. Patch from Nils Homer. - SamToFastq.java: Flush output files before checking for unpaired mates. - SamToFastq may now output fastq files per read group if OUTPUT_PER_RG (OPRG) is specified. If OUTPUT_DIR is specified with OUTPUT_PER_RG then a file (two if paired end) will be output in OUTPUT_DIR per read group named after the platform unit for the read group. - FastqToSam no longer fails on read if trailing /1 or /2 is missing IF the read names are identical AND don't end in /1 or /2. - CommandLinePrograms no longer display common options unless -H or --stdhelp is used or there was an error with a standard option. - CollectAlignmentSummaryMetrics no longer fails if there is no reference sequence provided. - Fix auto-MD5 generation, which wasn't working because of check that output file already exists rather than that it is a normal path. - Emit warning if MD5 or BAM index creation is requested, but can't be done because output BAM is not a regular file. -Alec |
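The MarkDuplicates entry in the 1.34 notes above describes bounding the number of open file handles with an LRU cache. One common way to build such a cache in Java is a `LinkedHashMap` in access order that closes the least recently used handle on eviction. The sketch below shows that pattern; it is an assumption about the general technique, not the Picard code, and `OpenHandleCacheSketch` is a hypothetical name.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: keeps at most maxOpen handles open, closing the least recently used. */
final class OpenHandleCacheSketch<K, V extends Closeable> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxOpen;

    OpenHandleCacheSketch(int maxOpen) {
        super(16, 0.75f, true); // true = iterate in access order, oldest access first
        if (maxOpen <= 0) {
            throw new IllegalArgumentException("maxOpen must be positive");
        }
        this.maxOpen = maxOpen;
    }

    /** Called by LinkedHashMap after each insertion; evicts and closes when over the limit. */
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        if (size() <= maxOpen) {
            return false;
        }
        try {
            eldest.getValue().close(); // release the OS file handle before dropping the entry
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return true;
    }
}
```

A caller that misses in the cache would reopen the file (for example in append mode) and put the new handle back in, so the cache size plays the role that MAX_FILE_HANDLES_FOR_READ_ENDS_MAP is described as playing above.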
From: Sendu B. <sb...@sa...> - 2010-11-02 15:32:48
|
On 01/11/2010 13:59, Alec Wysoker wrote: > Picard release 1.34 - SamToFastq.java: Flush output files before > checking for unpaired mates. > > - SamToFastq may now output fastq files per read group if > OUTPUT_PER_RG (OPRG) is specified. If OUTPUT_DIR is specified with > OUTPUT_PER_RG then a file (two if paired end) will be output in > OUTPUT_DIR per read group named after the platform unit for the read > group. I've not used SamToFastq yet, but wondered about its behaviour: Does it work given a bam file instead of a sam file? What happens when the sam file contains a mixture of paired and unpaired reads for the same read group? Ideally the output would be 3 fastq files (forward, reverse, unpaired)... Does it auto-handle '=' in place of reference bases, converting them back to reference bases in the fastq output? If it sees an OQ tag, will it output the original quality to the fastq, or the current quality string? Cheers, Sendu. -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. |