From: David N. <Dav...@hc...> - 2012-10-26 15:11:29
Hello Guorong,

Yes, this looks like an issue. Would you mind grepping your unprocessed alignment file for those read names so I can see what the records started out as?

-cheers, D

P.S. We don't use HTSeq, so it's not an issue for us. The USeq apps hash on the read name after collecting intersecting reads for each exon, so paired data is collapsed if present. I wonder how HTSeq is doing it? As a short-term workaround, use the USeq DefinedRegionDifferentialSeq with the -t option and it will generate a table of hit counts for each gene and each sample.

http://useq.sourceforge.net/cmdLnMenus.html#DefinedRegionDifferentialSeq

From: Guorong Xu <guo...@gm...>
Date: Wed, 24 Oct 2012 13:18:51 -0500
To: David Nix <dav...@hc...>
Subject: USeq issue

Hi David,

I found an issue with the USeq SamTranscriptomeParser for paired-end datasets. Please see the two fragments (4 reads) below. The mate positions of the first fragment are not correct, but the mate positions of the second fragment are correct.

When I looked into the original alignment file from novoalign, I found that the first fragment was mapped to a junction region (junction library). The mate positions of these two reads are not updated when USeq converts the artificial chrom name to the genomic chrom name. That's why the mate positions of the first fragment are so small (457 and 345).

As a result, we cannot use HTSeq to calculate the count information because of these incorrect mate positions. Do you have any suggestions?
Thanks,
Guorong

HWI-ST1189:59:C1305ACXX:5:1101:10001:63267 99 chr21 48064279 0 51M = 457 0 GGGGTGAGCGTGCGGGCTGCTGTGGGTACATTCCGGCAAACCATGTGGGGA BCCFDFFFHHHHHJJJJJJJJJGIIJFHIJJJJJJJJJHHHFFFFFFEDDD PG:Z:novoalignMPI NH:i:12 HI:i:1 AM:i:70 NM:i:0 SM:i:70 GN:Z:951 TN:Z:ENST00000440086 ZN:i:12 PQ:i:2 UQ:i:2 AS:i:2 ZS:Z:R

HWI-ST1189:59:C1305ACXX:5:1101:10001:63267 147 chr21 48064391 0 10M3969N41M = 345 0 TGGAACTCTGAAACTCCACTTGGAGATGTTGGCAGACCAGCCACGAACAAC EHGHDJIJGGHH9GFHD:BIGIHGGGGFHGIHGEFBJJHHHHHFFFFFCCC PG:Z:novoalignMPI NH:i:12 HI:i:1 AM:i:70 NM:i:0 SM:i:70 GN:Z:951 TN:Z:ENST00000440086 ZN:i:12 PQ:i:2 UQ:i:0 AS:i:0 XS:A:- ZS:Z:R

HWI-ST1189:59:C1305ACXX:5:1101:10001:74618 99 chr3 185135345 0 51M = 185135507 0 CTGGTTTCTTGGTAGCTGCTGCCTTTTTTCCCACCAGAGGCTTCTTCTGCT @@@FBDDEFFHHFHHIIJJIIIJIIJJIIGDHIJIGGGHIFJJGGIBG@EF PG:Z:novoalignMPI NH:i:6 HI:i:1 AM:i:70 NM:i:0 SM:i:70 ZN:i:6 PQ:i:3 UQ:i:3 AS:i:3 ZS:Z:R

HWI-ST1189:59:C1305ACXX:5:1101:10001:74618 147 chr3 185135507 0 51M = 185135345 0 CCTTATCCACCCGGAGCTTGTGATTCCTGGCCTGGCGAAGAATGGTGTTCC @EGGEGIGDIHAGJIDIIIGHABHIGGDDCGGEGGFHFFFHHHDDDFDC@@ PG:Z:novoalignMPI NH:i:6 HI:i:1 AM:i:70 NM:i:0 SM:i:70 ZN:i:6 PQ:i:3 UQ:i:0 AS:i:0 ZS:Z:R
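
For anyone wanting to check how many records in a converted file are affected, the following is a minimal Python sketch, not part of USeq or HTSeq. It flags paired records whose mate position (SAM column 8, PNEXT) lies implausibly far from the read's own position (column 4, POS) while the mate is reported on the same reference (column 7 is "="). The function name `suspicious_mates` and the 1,000,000 bp threshold are assumptions made for illustration, and the demo records reuse only the flags and positions from the reads above, with SEQ and QUAL replaced by `*`.

```python
# Hypothetical helper for illustration only; not a USeq or HTSeq function.
MAX_SPAN = 1_000_000  # assumed cutoff in bp; tune for your library and genome


def suspicious_mates(sam_lines, max_span=MAX_SPAN):
    """Return (name, chrom, pos, mate_pos) for paired records whose mate
    position is farther than max_span from the read's own position."""
    flagged = []
    for line in sam_lines:
        # Skip blank lines and SAM header lines.
        if not line.strip() or line.startswith("@"):
            continue
        # Real SAM is tab-delimited; split() also copes with the
        # space-separated records pasted into the message above.
        fields = line.split()
        if len(fields) < 11:
            continue  # not a complete alignment record
        name, flag, chrom, pos = fields[0], int(fields[1]), fields[2], int(fields[3])
        rnext, pnext = fields[6], int(fields[7])
        if not flag & 0x1:
            continue  # read is not paired
        # "=" means the mate is on the same reference as the read.
        if rnext == "=" and pnext > 0 and abs(pnext - pos) > max_span:
            flagged.append((name, chrom, pos, pnext))
    return flagged


if __name__ == "__main__":
    # Flags and positions taken from the four reads in this thread.
    demo = [
        "63267 99 chr21 48064279 0 51M = 457 0 * *",
        "63267 147 chr21 48064391 0 10M3969N41M = 345 0 * *",
        "74618 99 chr3 185135345 0 51M = 185135507 0 * *",
        "74618 147 chr3 185135507 0 51M = 185135345 0 * *",
    ]
    for record in suspicious_mates(demo):
        print(record)
```

A distance cutoff is only a heuristic. A stricter check would compare each record's PNEXT against the POS of its mate with the same read name, at the cost of holding one read of each pair in memory.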