From: Alec W. <al...@br...> - 2011-05-31 20:23:40
|
Hi Robin,

I don't remember exactly. Have you looked on the gencode website?

-Alec

On 5/27/11 2:48 PM, mei ge wrote:
> Hi Alec,
> I need the ribosomal interval file for mouse. Where did you get the
> gencode gtf file?
> Thanks
> Robin
>
> --- On *Thu, 5/26/11, Alec Wysoker /<al...@br...>/* wrote:
>
> From: Alec Wysoker <al...@br...>
> Subject: Re: [Samtools-help] Picard release 1.46
> To: "mei ge" <rge...@ya...>
> Cc: "Sam...@li..." <Sam...@li...>
> Date: Thursday, May 26, 2011, 10:58 AM
>
> Hi Robin,
>
> Human HG19 is attached. I don't have one for mouse. I just
> extracted these from gencode GTF file. Presumably you could do
> the same for mouse. Interval list format described here:
> http://picard.sourceforge.net/javadoc/net/sf/picard/util/IntervalList.html
>
> Note that this file is not required by CollectRnaSeqMetrics, but
> if you do not provide it the program will not be able to determine
> when reads map to rRNA.
>
> -Alec
>
> On 5/26/11 9:54 AM, mei ge wrote:
>> Hi Alec:
>> I am looking for human (hg19) and mouse(mm9).
>> Thanks
>> Robin
>>
>> --- On *Wed, 5/25/11, Alec Wysoker /<al...@br...>/* wrote:
>>
>> From: Alec Wysoker <al...@br...>
>> Subject: Re: [Samtools-help] Picard release 1.46
>> To: "mei ge" <rge...@ya...>
>> Cc: "samtools help" <sam...@li...>
>> Date: Wednesday, May 25, 2011, 4:52 PM
>>
>> Hi Mei,
>>
>> Are you looking for human? If so, what genome build?
>>
>> -Alec
>>
>> On 5/25/11 3:15 PM, mei ge wrote:
>>> Hi Alec,
>>> I use the CollectRnaSeqMetrics in the new release Picard. I
>>> don't know where to get the Ribosomal_intervals file. Can
>>> you point a place for the file?
>>> Thanks
>>> Mei
>>> |
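The extraction described in the thread (pulling rRNA intervals out of a GENCODE GTF) could be sketched roughly as below. This is not the script that produced the attached HG19 file. It assumes that rRNA genes are marked with a `gene_type "rRNA"` attribute on `gene` rows, and the function names `rrna_intervals` and `format_rows` are made up for illustration. It emits only the tab-separated body rows (sequence, start, end, strand, name); an interval list also needs a SAM-style header with the sequence dictionary in front of those rows.

```python
import re


def rrna_intervals(gtf_lines, feature="gene", biotype="rRNA"):
    """Yield (sequence, start, end, strand, name) for matching GTF rows.

    Assumption: the annotation marks rRNA genes with gene_type "rRNA".
    """
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and GTF comment/header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != feature:
            continue
        # Parse quoted attributes such as: gene_id "X"; gene_type "rRNA";
        attrs = dict(re.findall(r'(\S+) "([^"]*)"', fields[8]))
        if attrs.get("gene_type") != biotype:
            continue
        # GTF coordinates are 1-based and inclusive, as in an interval list,
        # so start and end are copied without shifting.
        yield (fields[0], int(fields[3]), int(fields[4]), fields[6],
               attrs.get("gene_id", "."))


def format_rows(intervals):
    """Render intervals as tab-separated interval-list body rows."""
    return ["\t".join(str(x) for x in row) for row in intervals]
```

The header lines can be copied from the BAM being analyzed, so that the sequence dictionary of the interval list matches the one CollectRnaSeqMetrics sees in the input.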
From: Alec W. <al...@br...> - 2011-06-07 14:36:09
|
Picard release 1.47
7 June 2011

- DownsampleSam.java: simple utility to randomly downsample a SAM or BAM file.

- IntervalListTools.java: tools to sort, merge, unique, pad and report on what's in an interval list.

- CollectRnaSeqMetrics.java: Improve error message when sequence dictionaries differ.

- We are experimenting with a library called Snappy, which we plan to use to compress temporary files created when merge-sorting. We have discovered a couple of problems, however, so this code is checked in but disabled currently.

- EstimateLibraryComplexity.java: Fixed an index out of bounds error in passesQualityCheck.

- MarkDuplicates.java: Add option SORTING_COLLECTION_SIZE_RATIO to deal with out-of-memory issues.

- CollectAlignmentSummaryMetrics.java: Added a PF_INDEL_RATE to collect alignment summary metrics that calculates total number of short insertions/deletions seen in reads / total aligned pf bases.

- Histogram.java: Added methods for calculating:
    1) The geometric mean
    2) The median absolute deviation
    3) Estimating the SD of a quasi-normal distribution via the median absolute deviation

- CollectInsertSizeMetrics.java: Changed CollectInsertSizeMetrics to use a more sensible approach to trimming the distribution of insert sizes before calculating the mean and sd, and for sizing the plot. The new method is simply to trim the distribution to [0..(median + 10 * median_absolute_deviation)]. This works well when the distribution is mostly normal (approximating mean + 6.7 * sd), and is much more robust to bimodality and other strange distributions.

- Histogram.java: More robust implementation of getMedian that does the right thing if there are an even number of things in the histogram.

- SequenceUtil.java: Added a utility method for calculating GC% for a byte[] of bases.

- CalculateHsMetrics.java: Added code for two new features:
    1) the addition of HS AT and GC dropout metrics to measure GC bias in a way similar to whole genome data, but just for the target regions.
    2) the addition of a PER_TARGET_COVERAGE option which allows the output of detailed per-target metrics for ad-hoc analysis.

-Alec |
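The insert-size trimming rule in the 1.47 notes can be illustrated with a small sketch. This is not Picard's Histogram code; the function names are hypothetical. It relies on the standard result that for a normal distribution MAD is about 0.6745 * sd, so sd is about 1.4826 * MAD and median + 10 * MAD sits near mean + 6.7 * sd, which matches the figure quoted in the notes.

```python
import statistics

# For a normal distribution, sd is approximately 1.4826 * MAD.
MAD_TO_SD = 1.4826


def median_absolute_deviation(values):
    """Median of the absolute deviations from the median."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


def estimate_sd_from_mad(values):
    """Robust SD estimate for a roughly normal distribution."""
    return MAD_TO_SD * median_absolute_deviation(values)


def trim_upper_bound(values, n_mads=10):
    """Upper cutoff median + n_mads * MAD, as described in the release note."""
    return statistics.median(values) + n_mads * median_absolute_deviation(values)


def trimmed_mean_sd(values, n_mads=10):
    """Mean and sample SD of the values kept inside [0, upper cutoff]."""
    upper = trim_upper_bound(values, n_mads)
    kept = [v for v in values if 0 <= v <= upper]
    return statistics.mean(kept), statistics.stdev(kept)
```

With insert sizes such as 190, 195, 200, 205, 210 plus one outlier of 5000, the median and MAD barely move, so the outlier falls outside the cutoff and does not affect the mean or sd.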
From: Alec W. <al...@br...> - 2011-06-20 14:53:35
|
Picard release 1.48
20 June 2011

- Added support for Snappy compression of temporary files created with SortingCollection. This can reduce the size of these temporary files by ~50%, without a large cost in CPU time. See https://sourceforge.net/apps/mediawiki/picard/index.php?title=Using_Snappy_in_Picard for details.

- AbstractAlignmentMerger.java: Added some progress logging during the initial phase of MergeBamAlignment.

- CleanSam.java: Don't throw an exception if a read is unmapped.

- SequenceUtil.java: Add new overload of assertSequenceDictionariesEqual that reports filenames.

-Alec |
From: Mark A. D. <dep...@br...> - 2011-06-20 14:55:33
|
Hi Alec,

This snappy compression -- is it enabled by default? We have several tools using the SortingCollections and could benefit from this.

Best,

On Jun 20, 2011, at 10:53 AM, Alec Wysoker wrote:

> Picard release 1.48
> 20 June 2011
>
> - Added support for Snappy compression of temporary files created with
> SortingCollection. This can reduce the size of these temporary files by
> ~50%, without a large cost in CPU time. See
> https://sourceforge.net/apps/mediawiki/picard/index.php?title=Using_Snappy_in_Picard
> for details.
>
> - AbstractAlignmentMerger.java: Added some progress logging during the
> initial phase of MergeBamAlignment.
>
> - CleanSam.java: Don't throw an exception if a read is unmapped.
>
> - SequenceUtil.java: Add new overload of assertSequenceDictionariesEqual
> that reports filenames.
>
> -Alec

Mark A. DePristo, Ph.D.
Manager, Medical and Population Genetics Analysis
Broad Institute of MIT and Harvard
dep...@br...
ma...@de... |
From: Alec W. <al...@br...> - 2011-07-18 18:52:47
|
Picard release 1.49
18 July 2011

- Include snappy properties files so DLL written to temp directory gets appropriate version number in its filename. Prior to this change, a small number of users saw strange, non-reproducible JVM crashes because the Snappy-java DLL, which is extracted by Snappy java code and written to a temp directory, was overwritten by another process.

- Update to official Snappy-java 1.0.3-rc3.

- ReorderSam.jar: Improve error message when sequence dictionary is not present.

- SAMTextHeaderCodec.java: Do not skip RG line if missing SM tag when validation stringency is not strict.

- MergeSamFiles.java: Set non-zero exit status if something went wrong in one of the threads when USE_THREADING=true.

- Modified CollectAlignmentSummaryMetrics, CollectInsertSizeMetrics and CalculateHsMetrics to support collection of metrics at multiple levels: ALL_READS is the default and will calculate metrics based on all reads in the file; SAMPLE will calculate a separate metric for each sample in the file; LIBRARY will calculate metrics for each library and READ_GROUP will do the same for each read group. Metrics can be calculated for multiple levels (e.g. SAMPLE and LIBRARY level can be calculated in one pass). The output files contain three new columns: SAMPLE, LIBRARY, and READ_GROUP. If all three are blank, the metric applies to all reads in the file. If only SAMPLE is non-blank, then the metric applies to the sample listed. If only SAMPLE and LIBRARY are non-blank, then the metric applies to the LIBRARY listed. If all three are filled in, then the metric applies to the READ_GROUP listed.

- CommandLineProgram.java: Added code to print out a one-liner at the start of every CommandLineProgram that documents user/host os version and architecture and JVM information.

- IndexedFastaSequenceFile.java: Don't throw an exception if a zero-length subsequence is asked for.

- SeekableHTTPStream.java: Improve message on IndexOutOfBoundsException. Implement read() method. Patch courtesy of Thomas Abeel.

- SamFileValidator.java: Explicitly validate that read1.getFirstOfPair != read2.getFirstOfPair.

- SAMUtils.java: Throw SAMFormatException rather than RuntimeException in SAMUtils.processValidationErrors. Patch courtesy of Matti Niemenmaa.

- ExtractIlluminaBarcodes.java: Allow a read to match a barcode even if all bases disagree with the barcode, if running the program in such a way as to force all reads to match a single barcode.

- Histogram.java: Speed up Histogram.trimByWidth for large histograms.

-Alec |
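The rule for reading the three new SAMPLE, LIBRARY and READ_GROUP columns in the 1.49 notes can be written down as a small lookup. This is only a sketch for someone parsing the metrics files themselves; `metric_level` is a hypothetical helper, not part of Picard, and it treats any column combination the notes do not describe as an error.

```python
def metric_level(sample, library, read_group):
    """Return the aggregation level implied by which columns are non-blank."""
    has_sample = bool((sample or "").strip())
    has_library = bool((library or "").strip())
    has_read_group = bool((read_group or "").strip())

    if not (has_sample or has_library or has_read_group):
        return "ALL_READS"   # all three blank: metric covers every read
    if has_sample and not has_library and not has_read_group:
        return "SAMPLE"
    if has_sample and has_library and not has_read_group:
        return "LIBRARY"
    if has_sample and has_library and has_read_group:
        return "READ_GROUP"
    # Combinations such as LIBRARY without SAMPLE are not described in the notes.
    raise ValueError("unexpected combination of SAMPLE/LIBRARY/READ_GROUP columns")
```

For example, a row with only SAMPLE filled in is a per-sample metric, while a row with all three filled in belongs to the read group named in the last column.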
From: Alec W. <al...@br...> - 2011-08-02 14:57:41
|
Picard release 1.50
2 August 2011

- Make some methods public so that SAMTextWriter can be used to format SAMRecords as text without having to write a SAM file. Patch courtesy of Matti Niemenmaa.

- BlockCompressedInputStream.java: Implement readLine() method for Tabix. Patch courtesy of Matthias Haimel and Fred Long.

- SamToFastq.java: Skip non-primary alignments by default. If default is overridden, exceptions may result for paired reads with non-primary alignments.

- FastqToSam.java: Add options MIN_Q and MAX_Q to FastqToSam.

- CommandLineParser.java: Eliminate requirement that field annotated with @PositionalArguments must be public.

- Changes to add three new metrics to CollectRnaSeqMetrics: median cv of coverage, 5' bias and 3' bias, all using the top 1000 expressed transcripts.

- MergeSamFiles.java: Add @CO lines to output SAM header regardless of whether output needs to be sorted or not.

- CollectRnaSeqMetrics.java: Added a handful more metrics to RNA metrics: PF_BASES, PCT_USABLE_BASES and MEDIAN_5PRIME_TO_3PRIME_BIAS

- MergeBamAlignment.java: Added the ability to specify queryname as the sort order for final output of MergeBamAlignment. Note that this does not save a sort, as records are internally coordinate-sorted in order to set tags (NM, etc) that require a comparison with the reference. Default behavior is unchanged.

- CollectRnaSeqMetrics.java: Added a plot of normalized position along transcript vs. normalized coverage to the RNA seq metrics.

-Alec |
From: Alec W. <al...@br...> - 2011-08-15 15:02:24
|
Picard release 1.51
15 August 2011

- Add support for array tag values (type B) for signed and unsigned byte, short, int; and for float. Note that previously byte arrays were stored as hex ascii string (type H). Type H will still be read, but no longer written. The only exception is that if a BAM record containing a tag of H type is read from a file, and none of the variable length part of the BAM record is changed, then if the BAM record is written to another file, the tag will still be H type.

- MergeBamAlignment.java: Fix bug triggered when EXPECTED_ORIENTATIONS is not set on the command line.

- MergeSamFiles.java: Fix bug created 7/28 in which if input were unsorted, SORT_ORDER=coordinate was ignored.

- MergeBamAlignment.java: Add option to disable soft clipping if 3' end of read extends past 5' end of mate.

- insertSizeHistogram.R: change script so that spaces are tolerated in sample/library names, columns separated on tabs only.

- SAMReadGroupRecord.java: Support FO (flow order) and KS (key sequence) attributes of read group.

- meanQualityByCycle.R, qualityScoreDistribution.R: Added tab separator for reading metrics files instead of default whitespace separator, to better handle data that includes sample names which may have spaces in them.

- BlockCompressedInputStream.java: Lazily initialize ByteArrayOutputStream so that it is not allocated if readLine is never called.

- CollectRnaSeqMetrics.java: Added check that there's actually data in the histogram to plot before calling R.

-Alec |
From: Alec W. <al...@br...> - 2011-08-29 17:28:35
|
Picard release 1.52
29 August 2011

IMPORTANT: As of this release, sam.jar (comprising net.sf.samtools packages) requires Java 1.6 or newer. picard.jar (net.sf.picard) has required Java 1.6 for quite a while (years?).

- CloserUtil.java: Make null-safe.

- Add IGNORE_SEQUENCE option to CollectRnaSeqMetrics to enable specification of one or more sequences that should not be counted in most of the RNA seq metrics.

- AbstractInputParser.java: added flag for skipping blank lines in input files (now the default behavior).

- TabbedTextFileWithHeaderParser.java: Added a method to get the column names back.

- Metrics that are aggregated at multiple levels now share a common ancestor MultilevelMetrics.

- Md5CalculatingInputStream and Md5CalculatingOutputStream now have methods for accessing the md5 hash after they are closed (previously they would only write it to a file).

- Modifications to support an ordered list of temp directories in Picard programs. Programs will generally use the list in order until there is insufficient free space in a directory to be sure of completely writing a file there. If all directories run short of space the last directory in the list continues to be used.

- IoUtil.java: Added methods for buffered reading and writing of UTF-8 encoded files.

- Add way to merge IntervalLists without making hugely long interval names if merging lots of intervals.

- Small change to attempt to make any temp directories world readable and writable.

- CollectRnaSeqMetrics.java: Change algorithm for determining if a read is rRNA. Now works by looking at overlap of entire fragment with rRNA intervals, rather than base-by-base overlap with rRNA intervals.

- Removed AsciiLineReader and AsciiLineReaderImpl as there does not appear to be a performance benefit to their use, now that Java 1.5 is no longer supported.

- AbstractInputParser.java: Fix isComment so it does not NPE when encountering a blank line.

- SAMBinaryTagAndUnsignedArrayValue.java, SAMBinaryTagAndValue.java: Make these classes public for GATK.

-Alec |
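The ordered temp-directory behavior described in the 1.52 notes might look roughly like the following. This is a sketch of the stated policy only, not Picard's implementation: `pick_temp_dir` and `bytes_needed` are hypothetical, and the notes do not say how Picard estimates the space a file will need.

```python
import shutil


def pick_temp_dir(temp_dirs, bytes_needed,
                  free_bytes=lambda d: shutil.disk_usage(d).free):
    """Return the first directory with enough free space, else the last one.

    temp_dirs is tried in order. If every earlier directory is short of
    space, the last directory is used regardless of its free space.
    """
    if not temp_dirs:
        raise ValueError("at least one temp directory is required")
    for directory in temp_dirs[:-1]:
        if free_bytes(directory) >= bytes_needed:
            return directory
    return temp_dirs[-1]
```

The `free_bytes` parameter exists only so the policy can be exercised without touching real disks; by default it asks the operating system via `shutil.disk_usage`.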
From: Alec W. <al...@br...> - 2011-09-26 14:26:30
|
Picard Release 1.53
26 September 2011

- ReorderSam.java: Get mate reference index before changing header on SAMRecord, in case input is SAM format and reference index is found by looking up reference name in header.

- SAMSequenceDictionary.java: Added getReferenceLength method to sum the lengths of the sequences in the dictionary.

- SAMRecord.java: Add notes about lack of validation when values are set into SAMRecord.

- Add mechanism for converting a SAMRecord to SAM-text format, without having to write a SAM file. Patch courtesy of Fred Long.

- CollectRnaSeqMetrics.java: If rRNA.interval_list has not been provided, write empty values for RIBOSOMAL_BASES and PCT_RIBOSOMAL_BASES rather than 0.

- IlluminaBasecallsToSamConverter.java: Small change to output the "BC" tag with the barcode read sequence, but into the unmatched read file only.

- Add Bzip2 support to Picard.

- CollectAlignmentSummaryMetrics.java: modified chimerism % calculation so that the denominator is only those reads considered as possible chimeras (was all hq pf reads).

- EstimateLibraryComplexity.java: Fix bug in which, if read names were not standard Illumina syntax, all dupes were considered to be optical dupes, and thus library complexity could not be estimated.

- IlluminaBasecallsToSam refactor: Removed support for Bustard 1.1.

- EstimateLibraryComplexity.java: Initialize PairedReadSequence.readGroup to -1 so that no read group is detectable. Fixes ArrayIndexException.

-Alec |
From: Alec W. <al...@br...> - 2011-10-11 14:58:36
|
Picard release 1.54
11 October 2011

- Modify CollectRnaSeqMetrics.java to support multi-level metrics collection.

- CollectAlignmentSummaryMetrics.java: Guard against PCT_CHIMERAS denominator being 0.

- Refactoring of IlluminaBaseCallsToSam.

- Remove upper limit on BAM record size.

- CollectAlignmentSummaryMetrics.java: Added a bunch of IF statements to protect against divide by zeros.

- Add bzip2 classes into picard jar.

- Support IUPAC ambiguity codes in SAM and BAM.

- Remove validation that requires that there be no CIGAR for unmapped read, because use of SAM for assembly has reads marked as unmapped that have CIGAR.

-Alec |
From: Alec W. <al...@br...> - 2011-10-24 14:05:44
|
Picard release 1.55
24 October 2011

- IoUtil.java: Fixed openFileForBufferedWriting() methods to vector through openFileForWriting methods which do the detection of .gz/.bz extensions and open up the appropriate compression output streams.

- Implemented an option to allow for gzipping the output files from ExtractIlluminaBarcodes, and added a little bit of progress logging too.

- Changes to introduce a SAMRecordFactory for use in SAMFileReader and related classes. Allows users of the picard/sam API to substitute a different factory that will create and return custom sub-classes of SAMRecord and BAMRecord.

-Alec |
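The extension-based stream selection mentioned for IoUtil in the 1.55 notes follows a common pattern, sketched here in Python rather than Java. The helper names are hypothetical and the mapping of suffixes to codecs (".gz" to gzip, ".bz" or ".bz2" to bzip2) is an assumption made for the illustration, not a statement of what IoUtil checks.

```python
import bz2
import gzip


def _opener(path):
    """Pick an open function from the file extension (assumed mapping)."""
    if path.endswith(".gz"):
        return gzip.open
    if path.endswith((".bz2", ".bz")):
        return bz2.open
    return open


def open_for_writing(path):
    """Open a text writer, compressing when the extension asks for it."""
    return _opener(path)(path, "wt", encoding="utf-8")


def open_for_reading(path):
    """Open a text reader, decompressing when the extension asks for it."""
    return _opener(path)(path, "rt", encoding="utf-8")
```

Routing every writer through one function like this is what keeps buffered and unbuffered callers consistent, which appears to be the point of the IoUtil fix.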
From: Alec W. <al...@br...> - 2011-11-08 14:04:18
|
Picard release 1.56
8 November 2011

- SnappyLoader.java: Handle exception thrown when trying and failing to load Snappy. Makes things work on non-Linux platforms.

- IlluminaUtil.java: Added nextera v2 adapter sequence (don't mention in release notes)

- ClippingUtility.java: Fix so that clipping will walk back to the read start until a match is found instead of adapter_length back from the read end.

-Alec |
From: Alec W. <al...@br...> - 2011-12-05 15:58:47
|
Picard release 1.57
5 December 2011

- The code underlying IlluminaBasecallsToSam and ExtractIlluminaBarcodes is undergoing a major refactoring. Currently these programs can be used in two ways. The old process involved providing BARCODE information and letting the programs detect the structure of your data via the available QSeqs. The new technique requires that you provide a READ_STRUCTURE, which describes the structure of clusters in a run. IlluminaBasecallsToSam/ExtractIlluminaBarcodes will then treat input cluster data as if they fit the input structure. Please refer to the documentation for the READ_STRUCTURE in IlluminaBasecallsToSam and ExtractIlluminaBarcodes for more information.
  Important Note: The old process is still available but will be removed in future releases in favor of the new process.

- Allow empty SEQ field in SAMRecord if FZ tag is present.

- Enable BAMIndexer to write to OutputStream in addition to File.

- BuildBamIndex.java: Close input BAM at end in order to reduce file handle leak in cases where this program is being invoked from inside a larger program.

- FormatUtil.java: Support File fields when loading Metric from file. Support Iso8601Date in order to have date & time metrics.

- Add --version option to command-line programs, and put Implementation-Version into jar manifests. Add program version to preamble printed when program starts.

- BAMRecordCodec.java: Clarify requirements for encode method in javadoc.

- MergeSamFiles.java: Fix bug in MergeSamFiles with USE_THREADING=true. Sometimes not all the records would get to the output. Sometimes program would hang. Sometimes it worked fine.

- MetricsFile.java: Add static method readBean().

- CollectInsertSizeMetrics.java: Eliminate obsolete mention of TAIL_LIMIT from usage message.

- Make SAMRecord.mReferenceIndex and mMateReferenceIndex protected so that subclasses can access these attributes.

- Make SAMTagUtil and SAMRecord attribute methods that take binary tag names public.

- FilterSamReads.java: new class that filters out mapped or unmapped reads from an INPUT sam/bam file and writes out the remaining reads to the specified OUTPUT sam/bam file.

- Added PRINT_READ_CATEGORY option to CollectAlignmentSummaryMetrics.

- Added a line to the insertSizeHistogram.R to allow for '#' signs in sample/library names.

-Alec |
From: Alec W. <al...@br...> - 2011-12-20 15:29:01
|
Picard release 1.58
20 December 2011

- HsMetricsCalculator.java: Fix computation of coverage in PER_TARGET_COVERAGE output file. Fix courtesy of Matt Ducar.

- Add SVN revision to version string.

- Add COMMENT argument to FastqToSam.

-Alec |
From: Alec W. <al...@br...> - 2012-01-04 15:38:18
|
Picard release 1.59
4 January 2012

- A new feature has been added to SAMFileWriterFactory and FastqWriterFactory that creates a separate thread for writing a SAM, BAM or FASTQ file. We have used this feature successfully for a couple of weeks, but because of its newness it should be used with caution until we get some more time to determine if there are problems. This allows computationally-intensive programs to maximize CPU usage rather than blocking while doing I/O. This feature is enabled by passing the option -Dsamjdk.use_async_io=true to the java command. It may be controlled programmatically via net.sf.samtools.SAMFileWriterFactory.setUseAsyncIo and net.sf.picard.fastq.FastqWriterFactory.setUseAsyncIo. There is a memory penalty for this in that a queue is needed to hold records to be written. Queue size is 2000 elements by default, but may be controlled via net.sf.samtools.SAMFileWriterFactory.setAsyncOutputBufferSize. Queue size cannot be changed for FastqWriterFactory.

- Several system properties have been added to control the behavior of various Picard programs. These can be set by passing -D<property-name>=<value> to the java command.
    - samjdk.create_index : Sets the default value for the CREATE_INDEX option. E.g. -Dsamjdk.create_index=true
    - samjdk.create_md5 : Sets the default value for the CREATE_MD5_FILE option. E.g. -Dsamjdk.create_md5=true
    - samjdk.compression_level : Sets the default value for the COMPRESSION_LEVEL option. E.g. -Dsamjdk.compression_level=0
    - samjdk.use_async_io : Enables asynchronous writing of SAM, BAM or FASTQ as described above.
  The regular command-line options, i.e. CREATE_INDEX, CREATE_MD5_FILE or COMPRESSION_LEVEL, override the value passed via system property. See http://picard.sourceforge.net/command-line-overview.shtml#Overview for details on these options.

- SAMFileWriterFactory.java: Allow change to temp directory used by sorting SAMFileWriters without having to change java.io.tmpdir

- MergeBamAlignment.java: Added the ability to carry-through the MAX_RECORDS_IN_RAM common parameter into the AbstractAlignmentMerger from MergeBamAlignment.

- SnappyLoader.java: Avoid NoClassDefFoundError if snappy-java classes are not available.

- ReadEndsCodec.java: Removed an extraneous flush() call that was hurting performance.

- MarkDuplicates.java: Change so that MarkDuplicates provides the full list of tmp directories to SortingCollection.

- AbstractDuplicateFindingAlgorithm.java: Added specialized implementation of addLocationInformation() that doesn't rely on using a regular expression so long as the default regular expression is specified.

- BinaryTagCodec.java: Small change to make decoding attributes from a BAM file previously written through the SAM-JDK more efficient.

- Change to MarkDuplicates to allow it to accept and merge multiple input files and write out the merged and marked results to a single file. Merging isn't as fully-featured as MergeSamFiles.

- Added ability to add CO headers while Marking Duplicates.

-Alec |
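The asynchronous writing feature in the 1.59 notes is a producer/consumer arrangement: the computing thread puts records on a bounded queue and a separate thread does the I/O. The sketch below shows that general pattern in Python; it is not the SAM-JDK code. `AsyncWriter` and `sink` are hypothetical names, and only the default queue size of 2000 is taken from the notes.

```python
import queue
import threading


class AsyncWriter:
    """Hand records to a background thread through a bounded queue."""

    _DONE = object()  # sentinel telling the writer thread to stop

    def __init__(self, sink, buffer_size=2000):
        self._sink = sink                      # callable that writes one record
        self._queue = queue.Queue(maxsize=buffer_size)
        self._error = None
        self._thread = threading.Thread(target=self._drain, daemon=True)
        self._thread.start()

    def _drain(self):
        while True:
            item = self._queue.get()
            if item is self._DONE:
                return
            if self._error is None:
                try:
                    self._sink(item)
                except Exception as exc:
                    # Remember the failure but keep draining, so the
                    # producer never blocks forever on a full queue.
                    self._error = exc

    def write(self, record):
        """Queue a record; blocks only when the buffer is full."""
        if self._error is not None:
            raise self._error
        self._queue.put(record)

    def close(self):
        """Flush remaining records and surface any write error."""
        self._queue.put(self._DONE)
        self._thread.join()
        if self._error is not None:
            raise self._error
```

The bounded queue is the memory penalty the notes mention: a larger buffer lets the producer run further ahead of slow I/O, at the cost of holding more records in memory.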