Thread: Re: [Useq-users] Error with USeq SamTranscriptomeParser while processing Novoalign RNA-seq outputs

Re: [Useq-users] Error with USeq SamTranscriptomeParser while processing Novoalign RNA-seq outputs

From: David N. <dav...@gm...> - 2012-04-12 11:46:03

Did you by chance add the transcripts from the MakeTranscriptome app to
your genome index?  Their names take the form
ENSDARG00000012493:ENSDART00000126849:chr20:705345-705376_708...

That could also be the problem.  -cheers, D

Ahh, looks like you've joined your gene name using a : .  Use an _ .  The
STP uses the : to split the splice junction chromosome name into its
component parts.  A good junction should look like

ENSDARG00000087418:chr20:6691-6707_9356-9386_9436-9463_9494-9513

so Rps3:ENSRNOT00000023935:1:156811472-156811541.... should be
Rps3_ENSRNOT00000023935:1:156811472-156811541......

With the extra : in the name, STP isn't able to recognize the alignment as
needing conversion to genomic coordinates.
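
The naming convention above can be sketched in Python. This is only an illustration, not STP's actual parsing code: it assumes a splice junction name has exactly three ':'-separated fields (gene, chromosome, and '_'-joined start-end blocks), and the names `Junction` and `parse_junction_name` are hypothetical.

```python
import re
from typing import NamedTuple


class Junction(NamedTuple):
    gene: str
    chrom: str
    blocks: list  # list of (start, end) integer pairs


_BLOCK = re.compile(r"^(\d+)-(\d+)$")


def parse_junction_name(name: str) -> Junction:
    """Split 'gene:chrom:start-end_start-end...' into its parts.

    Assumption: exactly three ':'-separated fields. A name with an extra
    ':' (for example gene:transcript:chrom:blocks) is rejected.
    """
    fields = name.split(":")
    if len(fields) != 3:
        raise ValueError(
            f"expected 3 ':'-separated fields, got {len(fields)}: {name!r}"
        )
    gene, chrom, block_text = fields
    blocks = []
    for piece in block_text.split("_"):
        match = _BLOCK.match(piece)
        if match is None:
            raise ValueError(f"bad block {piece!r} in {name!r}")
        blocks.append((int(match.group(1)), int(match.group(2))))
    return Junction(gene, chrom, blocks)
```

Under these assumptions the ENSDARG00000087418 example parses into four blocks on chr20, while a name beginning Rps3:ENSRNOT00000023935:1: has four fields and is rejected.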

Also, it would be a good idea to rename your chromosomes to the standard
UCSC nomenclature: chr1, chr2, chr3....  I've no idea why NCBI and others
switched a couple years back.
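
If you do decide to rename, a rough Python sketch of the mapping might look like the following. It assumes Ensembl-style names such as 1, X and MT, and maps MT to the UCSC name chrM; unplaced scaffolds and other non-standard names are not handled, and both function names are hypothetical.

```python
def to_ucsc_name(chrom: str) -> str:
    """Map an Ensembl-style name ('1', 'X', 'MT') to UCSC style."""
    if chrom.startswith("chr"):
        return chrom  # already UCSC style
    if chrom == "MT":
        return "chrM"  # UCSC names the mitochondrial genome chrM
    return "chr" + chrom


def rename_fasta_headers(lines):
    """Yield FASTA lines with the sequence name in each '>' header renamed."""
    for line in lines:
        if line.startswith(">"):
            # Keep any description text after the first space unchanged.
            name, sep, rest = line[1:].rstrip("\n").partition(" ")
            yield ">" + to_ucsc_name(name) + sep + rest + "\n"
        else:
            yield line
```

The same renaming would have to be applied consistently to the annotation used to build the splice junctions, otherwise the names in the index and in the junctions will disagree.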

Yes, all splice junction header lines are stripped from the SAM header; they
aren't needed after genomic coordinate conversion.

-cheers, D

From:  Jon Manning <Jon...@ed...>
Date:  Thu, 12 Apr 2012 10:18:32 +0100
To:  <use...@li...>
Subject:  [Useq-users] Error with USeq SamTranscriptomeParser while
processing Novoalign RNA-seq outputs

  
 Hello,
 
 I've been working through the Novoalign RNA-seq instructions
<http://www.novocraft.com/wiki/tiki-index.php?page=RNASeq+analysis%3A+mRNA+and+the+Spliceosome&structure=Novocraft+Technologies&page_ref_id=35>,
and am stuck at the last stage, where reads are converted back to genomic
coordinates with USeq SamTranscriptomeParser, and I'm hoping you may be able
to help.
 
 When it gets to the 'Adding SAM header, sorting, and writing bam output
with Picard's SortSam...' stage I'm getting errors like:
 
 Exception in thread "main" net.sf.samtools.SAMFormatException: Error parsing text SAM file. RNAME 'Rps3:ENSRNOT00000023935:1:156811472-156811541_156811891-156812088_156812500-156812688_156814773-156814868_156815362-156815456_156815668-156815799_156816728-156816770'  not found in any SQ record; Line 27
 Line: EBRI093151:81:FC:1:1:3202:1108 133 Rps3:ENSRNOT00000023935:1:156811472-156811541_156811891-156812088_156812500-156812688_156814773-156814868_156815362-156815456_156815668-156815799_156816728-156816770  375 0 * = 375 0 AANAAGTGGCCACAANNNNNNNNNGNGCCATNGCCCAGNNNNNNNCTCNACGCNACAAACNCTNAGGAGGGCTTGCAG  B=#==A>ABCCBBAB###############################################################  PG:Z:novoalign ZS:Z:QC
 
 I've checked, and these lines ARE present in the input SAM file (made by
Novoalign), but not in the temporary SAM files I can see created by
SamTranscriptomeParser, so I suspect they may be lost somehow.
 
 I'm not sure how to go about debugging this myself, so all pointers
appreciated.
 
 Thanks,
 
 Jon Manning
 
 
The University of Edinburgh is a charitable body, registered in Scotland,
with registration number SC005336.

Re: [Useq-users] Error with USeq SamTranscriptomeParser while processing Novoalign RNA-seq outputs

From: Jon M. <Jon...@ed...> - 2012-04-12 13:05:05


Re: [Useq-users] Error with USeq SamTranscriptomeParser while processing Novoalign RNA-seq outputs

From: David N. <dav...@gm...> - 2012-04-12 13:13:14

Yes that's incorrect.  Don't add the xxxTranscripts.fasta.  All of the
splice junctions are in the xxxSplices.fasta file.  I'll cc Colin here to
correct this in the Novocraft docs.  See also
http://useq.sourceforge.net/usageRNASeq.html

Not sure about chr1 vs 1.  Off the top of my head I don't think there
should be a problem with USeq apps, but then again we haven't tested them
with names that lack the chr prefix.  Most of the genome browsers will
probably complain unless you register a synonyms table.  Sounds like the
Ensembl browser won't, though, so maybe it isn't an issue.

-cheers, D

From:  Jon Manning <Jon...@ed...>
Date:  Thu, 12 Apr 2012 14:04:45 +0100
To:  David Nix <dav...@gm...>
Cc:  <use...@li...>
Subject:  Re: [Useq-users] Error with USeq SamTranscriptomeParser while
processing Novoalign RNA-seq outputs

    
 Hi David,
 
 Thanks for the quick reply. Following the Novoalign folks' instructions the
transcripts were indeed added to the index. Excerpt from their docs:
  
    novoindex Transcriptome.nix geneMaskedGenome.fasta refFlatRad45Num60kMin10Splices.fasta refFlatRad45Num60kMin10Transcripts.fasta

 Is that not the right thing to do? Should it just be the genome and the
splices?
 
 I'm working primarily with Ensembl data so I'd like to keep my chromosomes
'sans chr' - unless of course the USeq apps require it?
 
 Thanks,
 
 Jon
 
 
 
 
 
-- 
Dr Jonathan Manning
Bioinformatics Team
Centre for Cardiovascular Science
University of Edinburgh
Queens Medical Research Institute
47 Little France Crescent
Edinburgh  EH16 4TJ
United Kingdom
T: +44 131 242 6700
F: +44 131 242 6782
E: jma...@st...
 

Re: [Useq-users] Error with USeq SamTranscriptomeParser while processing Novoalign RNA-seq outputs

From: Jon M. <Jon...@ed...> - 2012-04-12 13:43:25


Re: [Useq-users] Error with USeq SamTranscriptomeParser while processing Novoalign RNA-seq outputs

From: David N. <dav...@gm...> - 2012-04-12 13:49:52

Hmm.  That error you are seeing is from Picard.  STP calls SortSam
internally.  Looks like it is trying to write a value that is too big for an
unsigned short, possibly due to the huge chromosome name?  Or too many
chromosome names, since these have not been converted to genomic space.
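
For reference, an unsigned 16-bit field can hold values from 0 to 65535, so 70699 cannot be stored in one. BAM is a binary format with several such fields per record; which of them held 70699 here is not established in this thread. A small Python sketch of the limit (the helper names are hypothetical, and this is not Picard's code):

```python
import struct

USHORT_MAX = 2**16 - 1  # 65535, the largest unsigned 16-bit value


def fits_ushort(value: int) -> bool:
    """Return True if value can be stored in an unsigned 16-bit field."""
    return 0 <= value <= USHORT_MAX


def pack_ushort(value: int) -> bytes:
    """Pack value as a little-endian unsigned 16-bit integer.

    struct.pack raises struct.error when the value is out of range,
    which is analogous to the IllegalArgumentException reported above.
    """
    return struct.pack("<H", value)
```

Here `fits_ushort(70699)` is False and `pack_ushort(70699)` raises `struct.error`, while 65535 packs into two bytes without complaint.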

Use of the -u option won't change much of anything except redirect the
failed alignment to a file.

The big problem is you're going to have transcript alignments intermingled
with your genomic alignments and won't be able to map the former to the
latter. 

I don't think you can use your partially converted SAM file.  You'll need to
rebuild the novoindex without the transcripts and realign.

-cheers, D

From:  Jon Manning <Jon...@ed...>
Date:  Thu, 12 Apr 2012 14:43:11 +0100
To:  David Nix <dav...@gm...>
Cc:  <use...@li...>
Subject:  Re: [Useq-users] Error with USeq SamTranscriptomeParser while
processing Novoalign RNA-seq outputs

    
 Okay, that's good to know- thanks.
 
 In the meantime I tried a fix suggested by Zayed at Novocraft, namely to
not use '-u' and thereby to exclude unmapped reads. Both this and using USeq
8.2.2 (I was on 8.2.1) changed the error to:
 
 Exception in thread "main" java.lang.IllegalArgumentException: Value (70699) to large to be written as ushort.
     at net.sf.samtools.util.BinaryCodec.writeUShort(BinaryCodec.java:324)
     at net.sf.samtools.BAMRecordCodec.encode(BAMRecordCodec.java:114)
     at net.sf.samtools.BAMRecordCodec.encode(BAMRecordCodec.java:37)
     at net.sf.samtools.util.SortingCollection.spillToDisk(SortingCollection.java:210)
     at net.sf.samtools.util.SortingCollection.add(SortingCollection.java:150)
     at net.sf.samtools.SAMFileWriterImpl.addAlignment(SAMFileWriterImpl.java:157)
     at net.sf.picard.sam.SortSam.doWork(SortSam.java:67)
     at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:175)
     at edu.utah.seq.data.sam.PicardSortSam.<init>(PicardSortSam.java:81)
     at edu.utah.seq.parsers.SamTranscriptomeParser.addHeaderAndSort(SamTranscriptomeParser.java:482)
     at edu.utah.seq.parsers.SamTranscriptomeParser.doWork(SamTranscriptomeParser.java:101)
     at edu.utah.seq.parsers.SamTranscriptomeParser.<init>(SamTranscriptomeParser.java:55)
     at edu.utah.seq.parsers.SamTranscriptomeParser.main(SamTranscriptomeParser.java:495)
 
 I realise I'm working with a bad SAM file from your point of view, but do
you think this error is part of the same thing, or something new?
 
 Jon
 
 

Re: [Useq-users] Error with USeq SamTranscriptomeParser while processing Novoalign RNA-seq outputs

From: Jon M. <Jon...@ed...> - 2012-04-12 14:15:30


Re: [Useq-users] Error with USeq SamTranscriptomeParser while processing Novoalign RNA-seq outputs

From: David N. <dav...@gm...> - 2012-04-12 14:16:57

Yes, if you save it as a sam it bypasses Picard's SortSam and just writes
out the alignments.  -cheers, D

From:  Jon Manning <Jon...@ed...>
Date:  Thu, 12 Apr 2012 15:15:10 +0100
To:  David Nix <dav...@gm...>
Cc:  <use...@li...>
Subject:  Re: [Useq-users] Error with USeq SamTranscriptomeParser while
processing Novoalign RNA-seq outputs

    
 Thanks for the pointers. Don't worry, I will be re-running the alignment.
 
 However, specifying '-s output.sam' did at least make things run without
error. Zayed indicated that the BAM conversion was the problem, due to the
'absence of a valid sequence dictionary'.
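
One way to check a SAM file for this before attempting BAM conversion is to compare the reference names used by the alignments with those declared in the @SQ header lines. The Python sketch below assumes a tab-delimited text SAM file; the function name is hypothetical and this is not part of USeq or Picard.

```python
def missing_references(sam_lines):
    """Return sorted RNAMEs used by alignments but absent from @SQ lines."""
    declared = set()
    used = set()
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            # Header line: collect the SN: value of each @SQ record.
            if line.startswith("@SQ"):
                for field in line.split("\t")[1:]:
                    if field.startswith("SN:"):
                        declared.add(field[3:])
            continue
        cols = line.split("\t")
        # Column 3 is RNAME; '*' means the read has no reference.
        if len(cols) >= 3 and cols[2] != "*":
            used.add(cols[2])
    return sorted(used - declared)
```

With a file on disk this could be called as `missing_references(open("output.sam"))`; an empty list means every alignment refers to a declared sequence, while any names returned are ones a sorter would report as not found in any SQ record.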
 
 But things are much clearer now than they were this morning- thank you.
 
 Jon
 
 
 