From: Oler, A. (NIH/N. [C] <and...@ni...> - 2012-04-23 15:15:25
|
Hi David, This is what I expected as the reason for what I was seeing. Thanks in advance for making the changes. Can you please let me know when the new version is available? Thanks, Andrew Andrew Oler, Ph.D. High-Throughput Sequencing Bioinformatics Specialist Contractor – Medical Science & Computing, Inc. Computational Biology Section Bioinformatics and Computational Biosciences Branch (BCBB) OCICB/OSMO/OD/NIAID/NIH 31 Center Drive, Room 3B62 Bethesda, MD 20892 Mobile: 240-507-3791 Office: 301-402-5685 http://bioinformatics.niaid.nih.gov<http://bioinformatics.niaid.nih.gov/> (Within NIH) http://exon.niaid.nih.gov<http://exon.niaid.nih.gov/> (Public) Disclaimer: The information in this e-mail and any of its attachments is confidential and may contain sensitive information. It should not be used by anyone who is not the original intended recipient. If you have received this e-mail in error please inform the sender and delete it from your mailbox or any other storage devices. National Institute of Allergy and Infectious Diseases shall not accept liability for any statements made that are sender's own and not expressly made on behalf of the NIAID by one of its representatives. From: David Nix <Dav...@hc...<mailto:Dav...@hc...>> To: "Oler, Andrew (NIH/NIAID) [C]" <and...@ni...<mailto:and...@ni...>> Cc: "use...@li...<mailto:use...@li...>" <use...@li...<mailto:use...@li...>> Subject: Re: ORSS not counting reads properly? Hello Andrew, OK, I found out the issue. I'm calling Picard's SAMRecordIterator i = reader.queryOverlapping(chromosome, start, stop); method to retrieve all alignments that overlap each exon. I then hash the names of the hits to reduce this to the number of fragments found to hit the entire gene. For our standard Extended Junction Analysis with know splices and annotation derived theoretical splices this works well (http://useq.sourceforge.net/usageRNASeq.html) since alignments will share sequence with at least with one of the genes exons. (There are very few to no multi gene spanners with this approach, see below.) The problem is that you're using tophat to call splice matches and that package is generating a lot of splices that have huge spans that overlap all the gene's exons in genomic space but share no sequence. For example with Cav3, there are 572 such spanners, where the read is split in half with 300k+ bp between them. These reads span multiple genes and are counted as hits to each. Here's a couple of the spanners: HWI-ST183_0264:7:2208:7929:127398#0 0 chr6 112343683 255 28M309966N16M * 0 0 GGACGTTCGGATCTTCTTCTTTTTGTGGCTGGGGGGGGCGGGTT DFADFA;FHGAFGIICGG@@CFHGGCGG FFFHBBA6@####### NH:i:1 NM:i:3 XS:A:- HWI-ST183_0264:7:1105:15979:27960#0 0 chr6 112343684 255 27M309966N17M * 0 0 GGCGTTCGGATCTTCTTCTTTTTGTGGCTGGGGGGGGGGTGTTC FFHHHHHJJFHGHIHIJIJJJJJFBGHI II6D<A6@######## NH:i:1 NM:i:3 XS:A:- HWI-ST183_0264:7:2205:3254:80300#0 0 chr6 112343686 255 25M309966N19M * 0 0 CGTTCGGATCTTCTTCTTTTTGTGGCTGGGGGGGGGGGTTTGTC FFHHHGGGIIJJJJIIJJJJJJHGIIJJI III@B@######### NH:i:1 NM:i:3 XS:A:- HWI-ST183_0264:7:1101:15952:159627#0 0 chr6 112343687 255 24M309966N20M * 0 0 GTTCGGATCTTCTTCTTTTTGTGGCTGGGGGGGGGGGTTTCTCC FFHHHHFHHHIHIIIJJJJJJIJIIIJ CB?7B########### NH:i:1 NM:i:3 XS:A:- One fix here is to put in a requirement that each alignment actually share sequence with the exons in a gene. Should be a pretty simple change. Thanks for bringing this to my attention! Note, for those looking for hit counts for every gene, set the -e 1. Currently this defaults to 20 and only genes with this number of total hits will be scored. I'll change this too in the next release. -cheers, D On 4/19/12 10:12 AM, "Oler, Andrew (NIH/NIAID) [C]" <and...@ni...<mailto:and...@ni...>> wrote: Hi David, I have two questions about ORSS. I'm using ORSS in USeq 8.2.0. I was checking on a few genes (randomly chosen) that had no read counts and noticed that there are some reads for those exons of those genes. Is there a threshold number of reads that must be passed before the counts are reported? (e.g., 5 in at least one dataset?) Is there a way to get ORSS to report all of the read counts regardless of threshold (and only skip the differential testing if below a threshold)? (e.g., Dnajb7, Apol6, Dnahc5). What I'm more concerned about is a few genes that I noticed with large differences between replicates. ORSS reports high counts, but when I look at the raw reads, there is actually no expression for these genes at all. See the screenshots for Cav3 and Oxtr. In the attached spreadsheet, you can see that the read counts for the genes are 572 and 574, respectively for T1 (aka KO1 bam file in IGB). But there are no reads for these exons in the KO1 bam file. I repeated with just a single replicate and I got the same result (KO1 as treatment and WT1 as control). If you look at the wide shot of Cav3, you see that there are some spliced reads that flank these genes. I think these are being counted, even though their aligned bases are not overlapping any exons; there are about 500-600 reads that are hidden that probably have the same profile, which is probably why the counts are ~572/4. I also just tried this with USeq 8.2.2 and it gives the same result. Thanks, Andrew Andrew Oler, Ph.D. High-Throughput Sequencing Bioinformatics Specialist Contractor Medical Science & Computing, Inc. Computational Biology Section Bioinformatics and Computational Biosciences Branch (BCBB) OCICB/OSMO/OD/NIAID/NIH 31 Center Drive, Room 3B62 Bethesda, MD 20892 Mobile: 240-507-3791 Office: 301-402-5685 http://bioinformatics.niaid.nih.gov<http://bioinformatics.niaid.nih.gov/> (Within NIH) http://exon.niaid.nih.gov<http://exon.niaid.nih.gov/> (Public) Disclaimer: The information in this e-mail and any of its attachments is confidential and may contain sensitive information. It should not be used by anyone who is not the original intended recipient. If you have received this e-mail in error please inform the sender and delete it from your mailbox or any other storage devices. National Institute of Allergy and Infectious Diseases shall not accept liability for any statements made that are sender's own and not expressly made on behalf of the NIAID by one of its representatives. |