From: David N. <dav...@gm...> - 2012-04-12 11:46:03
Did you by chance add the transcripts to your genome index from the MakeTranscriptome App? These take the form of ENSDARG00000012493:ENSDART00000126849:chr20:705345-705376_708... That also could be the problem.

-cheers, D

Ahh, looks like you've joined your gene name using a ":". Use an "_" instead. The STP (SamTranscriptomeParser) uses the ":" to split the splice junction chromosome name into its component parts. A good junction should look like

ENSDARG00000087418:chr20:6691-6707_9356-9386_9436-9463_9494-9513

so

Rps3:ENSRNOT00000023935:1:156811472-156811541....

should be

Rps3_ENSRNOT00000023935:1:156811472-156811541......

As such, STP isn't able to recognize the alignment as needing conversion to genomic coordinates.

Also, it would be a good idea to rename your chromosomes to the standard UCSC nomenclature: chr1, chr2, chr3.... I've no idea why NCBI and others switched a couple years back.

Yes, all splice junction header lines are stripped from the SAM header; they aren't needed after genomic coordinate conversion.

-cheers, D

From: Jon Manning <Jon...@ed...>
Date: Thu, 12 Apr 2012 10:18:32 +0100
To: <use...@li...>
Subject: [Useq-users] Error with USeq SamTranscriptomeParser while processing Novoalign RNA-seq outputs

Hello,

I've been working through the Novoalign RNA-seq instructions <http://www.novocraft.com/wiki/tiki-index.php?page=RNASeq+analysis%3A+mRNA+and+the+Spliceosome&structure=Novocraft+Technologies&page_ref_id=35> and am stuck at the last stage, where reads are converted back to genomic coordinates with USeq SamTranscriptomeParser. I'm hoping you may be able to help.

When it gets to the 'Adding SAM header, sorting, and writing bam output with Picard's SortSam...' stage I'm getting errors like:

Exception in thread "main" net.sf.samtools.SAMFormatException: Error parsing text SAM file.
RNAME 'Rps3:ENSRNOT00000023935:1:156811472-156811541_156811891-156812088_156812500-156812688_156814773-156814868_156815362-156815456_156815668-156815799_156816728-156816770' not found in any SQ record; Line 27
Line: EBRI093151:81:FC:1:1:3202:1108 133 Rps3:ENSRNOT00000023935:1:156811472-156811541_156811891-156812088_156812500-156812688_156814773-156814868_156815362-156815456_156815668-156815799_156816728-156816770 375 0 * = 375 0 AANAAGTGGCCACAANNNNNNNNNGNGCCATNGCCCAGNNNNNNNCTCNACGCNACAAACNCTNAGGAGGGCTTGCAG B=#==A>ABCCBBAB############################################################### PG:Z:novoalign ZS:Z:QC

I've checked, and these lines ARE present in the input SAM file (made by Novoalign), but not in the temporary SAM files I can see created by SamTranscriptomeParser, so I suspect they may be lost somehow. I'm not sure how to go about debugging this myself, so all pointers appreciated.

Thanks,
Jon Manning

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
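For anyone hitting the same error, here is a minimal Python sketch of the renaming suggested in the reply: it keeps the last two ":"-separated fields (chromosome and exon coordinates) and joins whatever precedes them with "_". The function names fix_junction_name and fix_fasta_header are hypothetical helpers, not part of USeq or Novoalign, and the sketch assumes the junction names come from the FASTA headers used to build the index and end in start-end blocks joined by "_". It does not cover the MakeTranscriptome case mentioned at the top of the reply.

```python
import re

# Exon blocks as they appear in the names quoted in this thread: start-end pairs joined by "_".
COORDS = re.compile(r"\d+-\d+(_\d+-\d+)*")


def fix_junction_name(name: str) -> str:
    """Hypothetical helper: return name with any ':' before the chromosome field replaced by '_'."""
    parts = name.rsplit(":", 2)  # [everything before, chromosome, coordinates]
    if len(parts) < 3 or not COORDS.fullmatch(parts[2]):
        return name  # not a junction-style name (e.g. a plain chromosome), leave untouched
    prefix, chrom, coords = parts
    return ":".join([prefix.replace(":", "_"), chrom, coords])


def fix_fasta_header(line: str) -> str:
    """Hypothetical helper: apply fix_junction_name to a FASTA header line, pass other lines through."""
    if not line.startswith(">"):
        return line
    body = line.rstrip("\r\n")
    ending = line[len(body):]  # preserve the original line ending, if any
    return ">" + fix_junction_name(body[1:]) + ending


if __name__ == "__main__":
    print(fix_junction_name("Rps3:ENSRNOT00000023935:1:156811472-156811541"))
```

Names that already have the gene:chromosome:coordinates shape pass through unchanged, so running it twice does no harm. The index would need to be rebuilt from the renamed FASTA before realigning.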