From: Ivan G. <iva...@gm...> - 2011-03-18 13:56:55
Hello Chris and Heng,

I would like to request that this new version not be incorporated into samtools yet.

The reason is that the new version has lost its ability to accept a pipe. For example, let's say that you have three export.txt.gz files that you need to convert into SAM format. With the release version you just do

zcat s_[123]_export.txt.gz | export2sam.pl --read1=-

This does not work with the proposed update.

I wonder if Chris would be willing to perfect the update.

Thank you,

Ivan

Ivan Gregoretti, PhD
National Institute of Diabetes and Digestive and Kidney Diseases
National Institutes of Health
5 Memorial Dr, Building 5, Room 205.
Bethesda, MD 20892. USA.
Phone: 1-301-496-1016 and 1-301-496-1592
Fax: 1-301-496-9878

On Thu, Mar 17, 2011 at 8:31 PM, Heng Li <lh...@sa...> wrote:
> Thanks a lot, Chris. I will incorporate your changes to the samtools source code tree.
>
> Best,
>
> Heng
>
>
> On Mar 17, 2011, at 6:03 PM, Saunders, Chris wrote:
>
>> Hello,
>>
>> I'm writing to submit two code updates from Illumina's secondary analysis pipeline as potential contributions back to the samtools distribution. These are (1) a bam concatenation tool and (2) a revision of the 'export2sam.pl' script. The details of these updates are as follows:
>>
>> (1) bam concatenation tool:
>>
>> This tool concatenates BGZF blocks from multiple BAM files to create a concatenated BAM output file. It can be used as a fast alternative to 'samtools merge' in situations where the input (sorted) BAM files can be arranged in an order such that the last read in BAM file i is not greater than the first read in BAM file i+1. An example where this is commonly applied is the concatenation of chromosome BAMs into a single file. The usage for the tool (as submitted) is:
>>
>> "bam_cat <in.header.sam> <out.bam> <in1.bam> <in2.bam> [...]"
>>
>> This code is contained in the attached file "bam_cat.c". It is a derivative of the file "bam_reheader.c" from samtools 0.1.8.
>>
>>
>> (2) Update of export2sam.pl script:
>>
>> Several minor feature updates have been applied to the 'export2sam.pl' script since our last submission of this script to samtools (this being script version 2.0.0). These updates are described in the script's changelog as follows:
>>
>> """
>> # Version: 2.3.0 (24JAN2011)
>> #
>> # - Add support for export reserved chromosome name "CONTROL",
>> #   which is translated to optional field "XC:Z:CONTROL".
>> # - Check for ".gz" file extension on export files and open
>> #   these as gzip pipes when the extension is found.
>> #
>> # Version: 2.2.0 (16NOV2010)
>> #
>> # - Remove any leading zeros in export fields: RUNNO,LANE,TILE,X,Y
>> # - For export records with reserved chromosome name identifiers
>> #   "QC" and "RM", add the optional field "XC:Z:QC" or "XC:Z:RM"
>> #   to the SAM record, so that these cases can be distinguished
>> #   from other unmatched reads.
>> #
>> # Version: 2.1.0 (21SEP2010)
>> #
>> # - Additional export record error checking.
>> # - Convert export records with chromosome value of "RM" to unmapped
>> #   SAM records.
>> """
>>
>> The updated script is attached to this email as 'export2sam.pl'. The updated script usage is:
>>
>> """
>>
>> export2sam.pl converts GERALD export files to SAM format.
>>
>> Usage: export2sam.pl --read1=FILENAME [ options ] | --version | --help
>>
>>   --read1=FILENAME  read1 export file (mandatory)
>>                     (file may be gzipped with ".gz" extension)
>>   --read2=FILENAME  read2 export file
>>                     (file may be gzipped with ".gz" extension)
>>   --nofilter        include reads that failed the basecaller
>>                     purity filter
>>   --qlogodds        assume export file(s) use logodds quality values
>>                     as reported by OLB (Pipeline) prior to v1.3
>>                     (default: phred quality values)
>>
>> """
>>
>>
>> Thank you for considering these as part of any future samtools release, and please contact me with issues or questions regarding either update.
>> In particular, I'm keen to hear about any potential portability issues with the bam concatenation which we may have overlooked, so any feedback on this topic would be greatly appreciated.
>>
>> Regards,
>>
>> -Chris
>>
>>
>>
>> Christopher Saunders, Ph.D.
>> Senior Bioinformatics Scientist
>> Illumina, Inc.
>> 9885 Towne Centre Drive
>> San Diego, CA 92121-1975
>> Email: csa...@il...
>>
>>
>>
>>
>> <bam_cat.c><export2sam.pl>
>> _______________________________________________
>> Samtools-devel mailing list
>> Sam...@li...
>> https://lists.sourceforge.net/lists/listinfo/samtools-devel
>
>
>
> --
> The Wellcome Trust Sanger Institute is operated by Genome Research
> Limited, a charity registered in England with number 1021457 and a
> company registered in England with number 2742969, whose registered
> office is 215 Euston Road, London, NW1 2BE.
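
For reference, the pipe behaviour Ivan asks for and the ".gz" detection from the 2.3.0 changelog need not conflict. The following is only a minimal sketch of one way to combine them, written in Python rather than the Perl of export2sam.pl. The function name `open_export` and its `stdin` parameter are hypothetical and do not come from the script, and treating "-" as standard input is an assumption based on the `--read1=-` command shown in this thread.

```python
import gzip
import io
import sys


def open_export(path, stdin=None):
    """Return a text-mode handle for an export file argument.

    Hypothetical helper, not taken from export2sam.pl.
    """
    # Assumption: "-" means "read from standard input", so that
    # `zcat s_[123]_export.txt.gz | script --read1=-` keeps working.
    if path == "-":
        return stdin if stdin is not None else sys.stdin
    # Files with a ".gz" extension are decompressed on the fly.
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    # Anything else is treated as a plain-text export file.
    return open(path, "r")


if __name__ == "__main__":
    # Demonstrate the stdin branch with an in-memory stream.
    fake_stdin = io.StringIO("record1\nrecord2\n")
    for line in open_export("-", stdin=fake_stdin):
        print(line.rstrip("\n"))
```

The check for "-" comes first so that the extension test never sees the stdin placeholder; the `stdin` parameter exists only to make the helper easy to exercise without a real pipe.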