From: Heng Li <lh...@sa...> - 2013-11-20 16:44:13
Writing the read count in each bin is useful not only for retrieving the i-th alignment, but also for alignment viewers that display the approximate read depth across a chromosome, and for anyone who wants a sense of the approximate coverage over long ranges. Right now, there is no convenient way to achieve this.

Do we want to keep counts in CSI in the future? If we want the feature, we had better change the CSI format before the first release of samtools. Changing the CSI format later would cause more compatibility trouble, as more users would have generated CSI indices by then.

To implement the feature now, we may simply write "0" to the index file as a placeholder and change the magic from "CSI\1" to "CSI\2". When htslib sees "CSI\2", it reads the count from the index; otherwise it does not.

What do you think?

Heng

On Nov 14, 2013, at 3:30 AM, Petr Danecek <pd...@sa...> wrote:

> Hi Magnus,
>
> currently there is no better way than reading the file sequentially.
> Heng recently proposed that the CSI index could be changed to include
> the counts.
>
>> I was thinking to add counts to the CSI index. The entire CSI
>> index is essentially an R-tree with nodes named with bins and
>> kept in a hash table. In principle, we can store the number
>> of alignments starting/overlapping the bin in the
>> corresponding node. This way, we may quickly get the number
>> of alignments in any region and perform a query like
>> "retrieve the 10000-th alignment".
>
> As far as I know, no one has done that yet. Would you be interested in
> looking into this?
>
> Petr
>
>
> On Wed, 2013-11-13 at 09:46 +0000, Magnus Manske wrote:
>> Given a BAM file, is there a way to access a specific read by its
>> number? That is, I want the 12345th read from the "samtools view"
>> output, without having to actually print/read through the previous
>> 12344 ones. Seeking by read number, instead of by mapped read position.
>>
>> samtools or htscmd/htslib would be fine. Willing to code C :-)
>>
>> Thanks,
>> Magnus
>> _______________________________________________
>> Samtools-help mailing list
>> Sam...@li...
>> https://lists.sourceforge.net/lists/listinfo/samtools-help
>
> --
> The Wellcome Trust Sanger Institute is operated by Genome Research
> Limited, a charity registered in England with number 1021457 and a
> company registered in England with number 2742969, whose registered
> office is 215 Euston Road, London, NW1 2BE.
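
To make the proposal in this thread concrete, here is a minimal C sketch of its two pieces: telling "CSI\1" from "CSI\2" by the magic, and using per-bin counts to locate the i-th alignment without reading the file sequentially. It is only an illustration under stated assumptions, not htslib code. The names csi_magic_has_counts, toy_bin_t and find_bin_for_ith are hypothetical, and the sketch assumes a flat, coordinate-ordered list of bins that each store the number of alignments starting in them, which is simpler than the real multi-level binning scheme.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: inspect the 4-byte magic of an index.
 * Returns 1 for "CSI\2" (counts present, as proposed in this thread),
 * 0 for "CSI\1" (no counts), and -1 for anything else. */
static int csi_magic_has_counts(const unsigned char magic[4])
{
    if (memcmp(magic, "CSI\2", 4) == 0) return 1;
    if (memcmp(magic, "CSI\1", 4) == 0) return 0;
    return -1;
}

/* Hypothetical per-bin record. ASSUMPTION: bins are listed in coordinate
 * order and n_start is the number of alignments whose start falls in the
 * bin, so every alignment is counted exactly once. */
typedef struct {
    uint64_t n_start;
} toy_bin_t;

/* Locate the bin that holds the i-th alignment (0-based) by walking the
 * cumulative counts. On success, returns the bin index and stores the
 * rank of the alignment within that bin in *offset. Returns -1 if i is
 * beyond the total number of alignments. */
static long find_bin_for_ith(const toy_bin_t *bins, size_t n_bins,
                             uint64_t i, uint64_t *offset)
{
    uint64_t seen = 0; /* alignments in all bins before bins[k] */
    for (size_t k = 0; k < n_bins; ++k) {
        if (i < seen + bins[k].n_start) {
            *offset = i - seen;
            return (long)k;
        }
        seen += bins[k].n_start;
    }
    return -1;
}
```

With counts of 5, 0 and 7 in three consecutive bins, asking for alignment 5 (0-based) would land in the third bin at offset 0, so a reader would only need to seek to that bin and skip *offset records. Summing n_start over a range of bins would give the approximate coverage mentioned above in the same way.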