Thread: [Useq-users] Error with USeq SamTranscriptomeParser while processing Novoalign RNA-seq outputs


The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

Ahh, looks like you've joined your gene name to the transcript ID using a
':'.  Use a '_' instead.  The STP (SamTranscriptomeParser) uses the ':' to
split the splice junction chromosome name into its component parts.  A good
junction should look like

ENSDARG00000087418:chr20:6691-6707_9356-9386_9436-9463_9494-9513

Rps3:ENSRNOT00000023935:1:156811472-156811541.... should be
Rps3_ENSRNOT00000023935:1:156811472-156811541......

Because of the extra ':', the STP isn't able to recognize the alignment as
needing conversion to genomic coordinates.
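
If it helps, here is a rough Python sketch of that renaming. It is not part of
USeq; the function names are made up for illustration, and it assumes that a
junction name should have exactly three ':'-separated parts (label,
chromosome, coordinates), as in the good example above. Any extra leading
fields are joined with '_'. You would most likely run it over the FASTA
headers of your transcriptome before indexing and aligning, so that the names
in the SAM output come out right.

```python
def fix_junction_name(name: str) -> str:
    """Join extra leading ':'-separated fields with '_' so the name has
    three parts: label, chromosome, coordinates (assumed layout)."""
    parts = name.split(":")
    if len(parts) <= 3:
        return name  # already three parts or fewer; leave untouched
    label = "_".join(parts[:-2])
    return ":".join([label, parts[-2], parts[-1]])


def fix_fasta_header(line: str) -> str:
    """Apply fix_junction_name to the sequence name of a FASTA header line.
    Non-header lines are returned unchanged."""
    if not line.startswith(">"):
        return line
    name, sep, rest = line[1:].partition(" ")
    return ">" + fix_junction_name(name) + sep + rest


if __name__ == "__main__":
    print(fix_junction_name("Rps3:ENSRNOT00000023935:1:156811472-156811541"))
```

Ordinary chromosome names and names that are already in the three-part form
pass through unchanged, so it should be safe to run over a mixed genome plus
junction FASTA.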

Also, it would be a good idea to rename your chromosomes to the standard
UCSC nomenclature: chr1, chr2, chr3....  I've no idea why NCBI and others
switched a couple years back.
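
A minimal sketch of the chromosome renaming, again in Python and again only an
illustration (the helper names are hypothetical, not USeq functions). It
assumes plain Ensembl/NCBI style names such as 1, 2, X and MT, and that the
chromosome is the second ':'-separated field of a three-part junction name.
Unplaced scaffolds and contigs usually need a proper lookup table rather than
a simple prefix.

```python
def to_ucsc(chrom: str) -> str:
    """Map an Ensembl/NCBI style chromosome name to UCSC style.
    Only covers the simple cases: 1 -> chr1, X -> chrX, MT -> chrM."""
    if chrom.startswith("chr"):
        return chrom  # already UCSC style
    if chrom == "MT":
        return "chrM"  # UCSC calls the mitochondrial genome chrM
    return "chr" + chrom


def rename_junction_chrom(name: str) -> str:
    """Rename the chromosome field of a label:chromosome:coordinates name.
    Names without exactly three parts are treated as bare chromosomes."""
    parts = name.split(":")
    if len(parts) != 3:
        return to_ucsc(name)
    parts[1] = to_ucsc(parts[1])
    return ":".join(parts)


if __name__ == "__main__":
    print(rename_junction_chrom("Rps3_ENSRNOT00000023935:1:156811472-156811541"))
```

The same mapping has to be applied to the genome FASTA, the junction names and
any annotation files, otherwise the names won't match up downstream.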

Yes, all splice junction header lines are stripped from the SAM header; they
aren't needed after genomic coordinate conversion.

-cheers, D

From:  Jon Manning <Jon...@ed...>
Date:  Thu, 12 Apr 2012 10:18:32 +0100
To:  <use...@li...>
Subject:  [Useq-users] Error with USeq SamTranscriptomeParser while
processing Novoalign RNA-seq outputs

 Hello,

 I've been working through the Novoalign RNA-seq instructions
<http://www.novocraft.com/wiki/tiki-index.php?page=RNASeq+analysis%3A+mRNA+and+the+Spliceosome&structure=Novocraft+Technologies&page_ref_id=35>,
and am stuck at the last stage, where reads are converted back to genomic
coordinates with USeq SamTranscriptomeParser, and I'm hoping you may be able
to help.

 When it gets to the 'Adding SAM header, sorting, and writing bam output
with Picard's SortSam...' stage I'm getting errors like:

 Exception in thread "main" net.sf.samtools.SAMFormatException: Error parsing text SAM file. RNAME 'Rps3:ENSRNOT00000023935:1:156811472-156811541_156811891-156812088_156812500-156812688_156814773-156814868_156815362-156815456_156815668-156815799_156816728-156816770'  not found in any SQ record; Line 27
 Line: EBRI093151:81:FC:1:1:3202:1108 133 Rps3:ENSRNOT00000023935:1:156811472-156811541_156811891-156812088_156812500-156812688_156814773-156814868_156815362-156815456_156815668-156815799_156816728-156816770  375 0 * = 375 0 AANAAGTGGCCACAANNNNNNNNNGNGCCATNGCCCAGNNNNNNNCTCNACGCNACAAACNCTNAGGAGGGCTTGCAG  B=#==A>ABCCBBAB###############################################################  PG:Z:novoalign ZS:Z:QC

 I've checked, and these lines ARE present in the input SAM file (made by
Novoalign), but not in the temporary SAM files I can see created by
SamTranscriptomeParser, so I suspect they may be lost somehow.

 I'm not sure how to go about debugging this myself, so all pointers
appreciated.

 Thanks,

 Jon Manning
