From: Guorong Xu <guo...@gm...> - 2012-11-12 19:57:41
Hi David,

I found another issue with the conversion program. When I used Novoalign to align our FASTQ data against the reference genome, the MD tag was present in the SAM records. After I converted the artificial chromosome names to regular chromosome names, the MD tag was missing. You can see the same thing in the four SAM records quoted below: none of them has an MD tag. Could you double-check this issue?

Thanks,
Guorong

On Fri, Oct 26, 2012 at 10:11 AM, David Nix <Dav...@hc...> wrote:
> Hello Guorong,
>
> Yes, looks like an issue. Would you mind grepping your unprocessed
> alignment file for those read names so I can see what they started out as?
>
> -cheers, D
>
> P.S. We don't use HTSeq so it's not an issue. The USeq apps hash on the
> read name after collecting intersecting reads for each exon, so paired data
> is collapsed if present. I wonder how HTSeq is doing it? As a short-term
> workaround, use the USeq DefinedRegionDifferentialSeq with the -t option
> and it will generate a table of hit counts for each gene and each sample.
> http://useq.sourceforge.net/cmdLnMenus.html#DefinedRegionDifferentialSeq
>
> From: Guorong Xu <guo...@gm...>
> Date: Wed, 24 Oct 2012 13:18:51 -0500
> To: David Nix <dav...@hc...>
> Subject: USeq issue
>
> Hi David,
>
> I found an issue with the USeq SamTranscriptomeParser on a paired-end
> dataset. Please see the two fragments (4 reads) below.
>
> The mate positions of the first fragment are not correct, but the mate
> positions of the second fragment are correct.
> When I looked into the original Novoalign alignment file, I found that the
> first fragment was mapped to a junction region (junction library).
> The mate positions of these two reads are not changed when USeq converts
> the artificial chrom name to the genomic chrom name.
> That’s why the mate positions of the first fragment are very small (457
> and 345).
>
> Now we cannot use HTSeq to calculate the count information because of this
> incorrect format (the mate position issue).
> Do you have any suggestions on that?
>
> Thanks,
> Guorong
>
> HWI-ST1189:59:C1305ACXX:5:1101:10001:63267 99 chr21 48064279 0 51M = 457 0 GGGGTGAGCGTGCGGGCTGCTGTGGGTACATTCCGGCAAACCATGTGGGGA BCCFDFFFHHHHHJJJJJJJJJGIIJFHIJJJJJJJJJHHHFFFFFFEDDD PG:Z:novoalignMPI NH:i:12 HI:i:1 AM:i:70 NM:i:0 SM:i:70 GN:Z:951 TN:Z:ENST00000440086 ZN:i:12 PQ:i:2 UQ:i:2 AS:i:2 ZS:Z:R
> HWI-ST1189:59:C1305ACXX:5:1101:10001:63267 147 chr21 48064391 0 10M3969N41M = 345 0 TGGAACTCTGAAACTCCACTTGGAGATGTTGGCAGACCAGCCACGAACAAC EHGHDJIJGGHH9GFHD:BIGIHGGGGFHGIHGEFBJJHHHHHFFFFFCCC PG:Z:novoalignMPI NH:i:12 HI:i:1 AM:i:70 NM:i:0 SM:i:70 GN:Z:951 TN:Z:ENST00000440086 ZN:i:12 PQ:i:2 UQ:i:0 AS:i:0 XS:A:- ZS:Z:R
>
> HWI-ST1189:59:C1305ACXX:5:1101:10001:74618 99 chr3 185135345 0 51M = 185135507 0 CTGGTTTCTTGGTAGCTGCTGCCTTTTTTCCCACCAGAGGCTTCTTCTGCT @@@FBDDEFFHHFHHIIJJIIIJIIJJIIGDHIJIGGGHIFJJGGIBG@EF PG:Z:novoalignMPI NH:i:6 HI:i:1 AM:i:70 NM:i:0 SM:i:70 ZN:i:6 PQ:i:3 UQ:i:3 AS:i:3 ZS:Z:R
> HWI-ST1189:59:C1305ACXX:5:1101:10001:74618 147 chr3 185135507 0 51M = 185135345 0 CCTTATCCACCCGGAGCTTGTGATTCCTGGCCTGGCGAAGAATGGTGTTCC @EGGEGIGDIHAGJIDIIIGHABHIGGDDCGGEGGFHFFFHHHDDDFDC@@ PG:Z:novoalignMPI NH:i:6 HI:i:1 AM:i:70 NM:i:0 SM:i:70 ZN:i:6 PQ:i:3 UQ:i:0 AS:i:0 ZS:Z:R
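
Both problems reported in this thread (mate positions left in transcript coordinates, and MD tags dropped during conversion) can be screened for across a whole converted file. Below is a minimal Python sketch of such a check. It is not part of USeq or HTSeq, and the function names are made up for illustration. It assumes each SAM record sits on a single line, that no field contains embedded whitespace, and that each read name appears in at most one pair (read names with any other number of records are skipped, which will hide multi-mapped reads reported several times).

```python
from collections import defaultdict


def parse_sam_record(line):
    """Split one SAM alignment line into the fields this check needs."""
    f = line.split()  # assumption: no field contains embedded whitespace
    return {
        "qname": f[0],
        "flag": int(f[1]),
        "rname": f[2],
        "pos": int(f[3]),
        "rnext": f[6],
        "pnext": int(f[7]),
        "tags": f[11:],
    }


def read_records(lines):
    """Parse alignment lines, skipping blank lines and '@' header lines."""
    return [parse_sam_record(l) for l in lines
            if l.strip() and not l.startswith("@")]


def points_at(rec, mate):
    """True if rec's RNEXT/PNEXT agree with the mate's RNAME/POS."""
    same_chrom = rec["rnext"] == mate["rname"] or (
        rec["rnext"] == "=" and rec["rname"] == mate["rname"])
    return same_chrom and rec["pnext"] == mate["pos"]


def find_mate_mismatches(lines):
    """Return read names whose two records do not point at each other."""
    by_name = defaultdict(list)
    for rec in read_records(lines):
        by_name[rec["qname"]].append(rec)
    bad = []
    for name, recs in by_name.items():
        if len(recs) != 2:  # assumption: exactly one pair per read name
            continue
        first, second = recs
        if not (points_at(first, second) and points_at(second, first)):
            bad.append(name)
    return bad


def find_missing_md(lines):
    """Return (read name, flag) for mapped records that lack an MD:Z: tag."""
    return [(r["qname"], r["flag"]) for r in read_records(lines)
            if not (r["flag"] & 4)
            and not any(t.startswith("MD:Z:") for t in r["tags"])]
```

Run on the four records quoted in this thread, each restored to a single line, `find_mate_mismatches` would flag only the first fragment (mate positions 457 and 345 against alignment positions 48064279 and 48064391), while `find_missing_md` would list all four records.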