From: David N. <dav...@gm...> - 2012-04-12 11:46:03
|
Did you by chance add the transcripts to your genome index from the MakeTranscriptome App? These take the form of

ENSDARG00000012493:ENSDART00000126849:chr20:705345-705376_708...

That also could be the problem.

-cheers, D

Ahh, looks like you've joined your gene name using a : . Use an _ . The STP uses the : to split the splice junction chromosome name into its component parts. A good junction should look like

ENSDARG00000087418:chr20:6691-6707_9356-9386_9436-9463_9494-9513

Rps3:ENSRNOT00000023935:1:156811472-156811541....
should be
Rps3_ENSRNOT00000023935:1:156811472-156811541......

As such STP isn't able to recognize the alignment as needing conversion to genomic coordinates.

Also, it would be a good idea to rename your chromosomes to the standard UCSC nomenclature: chr1, chr2, chr3.... I've no idea why NCBI and others switched a couple years back.

Yes, all splice junction header lines are stripped from the SAM header, they aren't needed after genomic coordinate conversion.

-cheers, D

From: Jon Manning <Jon...@ed...>
Date: Thu, 12 Apr 2012 10:18:32 +0100
To: <use...@li...>
Subject: [Useq-users] Error with USeq SamTranscriptomeParser while processing Novoalign RNA-seq outputs

Hello,

I've been working through the Novoalign RNA-seq instructions <http://www.novocraft.com/wiki/tiki-index.php?page=RNASeq+analysis%3A+mRNA+and+the+Spliceosome&structure=Novocraft+Technologies&page_ref_id=35>, and am stuck at the last stage, where reads are converted back to genomic coordinates with USeq SamTranscriptomeParser, and I'm hoping you may be able to help.

When it gets to the 'Adding SAM header, sorting, and writing bam output with Picard's SortSam...' stage I'm getting errors like:

Exception in thread "main" net.sf.samtools.SAMFormatException: Error parsing text SAM file. RNAME 'Rps3:ENSRNOT00000023935:1:156811472-156811541_156811891-156812088_156812500-156812688_156814773-156814868_156815362-156815456_156815668-156815799_156816728-156816770' not found in any SQ record; Line 27
Line: EBRI093151:81:FC:1:1:3202:1108 133 Rps3:ENSRNOT00000023935:1:156811472-156811541_156811891-156812088_156812500-156812688_156814773-156814868_156815362-156815456_156815668-156815799_156816728-156816770 375 0 * = 375 0 AANAAGTGGCCACAANNNNNNNNNGNGCCATNGCCCAGNNNNNNNCTCNACGCNACAAACNCTNAGGAGGGCTTGCAG B=#==A>ABCCBBAB############################################################### PG:Z:novoalign ZS:Z:QC

I've checked, and these lines ARE present in the input SAM file (made by Novoalign), but not in the temporary SAM files I can see created by SamTranscriptomeParser, so I suspect they may be lost somehow.

I'm not sure how to go about debugging this myself, so all pointers appreciated.

Thanks,

Jon Manning

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. |
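Both name forms quoted in the reply above differ only in how many ':'-separated fields they carry, so a quick check of reference names before indexing may help. The sketch below is not STP's actual parsing code, which the thread does not show. It assumes only what the examples suggest: three fields (gene:chromosome:blocks) for a splice junction and four fields (gene:transcript:chromosome:blocks) for a transcript entry. The function names are hypothetical.

```python
def classify_reference_name(name):
    """Classify a reference name by its number of ':'-separated fields.

    Assumption taken from the examples in the thread:
      3 fields -> splice junction (gene:chromosome:blocks)
      4 fields -> transcript entry (gene:transcript:chromosome:blocks)
    Anything else is treated as a plain chromosome or other sequence.
    """
    fields = name.split(":")
    if len(fields) == 3:
        return "junction"
    if len(fields) == 4:
        return "transcript"
    return "other"


def parse_blocks(blocks):
    """Turn 'start-end_start-end...' into a list of (start, end) integer pairs."""
    pairs = []
    for block in blocks.split("_"):
        start, end = block.split("-")
        pairs.append((int(start), int(end)))
    return pairs


junction = "ENSDARG00000087418:chr20:6691-6707_9356-9386_9436-9463_9494-9513"
print(classify_reference_name(junction))
print(parse_blocks(junction.split(":")[2]))
```

Running a check like this over the FASTA headers that go into the index would flag any four-field names before alignment rather than after.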
From: Jon M. <Jon...@ed...> - 2012-04-12 13:05:05
|
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. |
From: David N. <dav...@gm...> - 2012-04-12 13:13:14
|
Yes that's incorrect. Don't add the xxxTranscripts.fasta. All of the splice junctions are in the xxxSplices.fasta file. I'll cc Colin here to correct this in the Novocraft docs. See also http://useq.sourceforge.net/usageRNASeq.html

Not sure about the chr1 vs 1. Off the top of my head I don't think there should be a problem with USeq apps. But then again we haven't tested them. Most of the genome browsers will probably complain unless you register a synonyms table. Sounds like the Ensembl browser won't though, so maybe it isn't an issue.

-cheers, D

From: Jon Manning <Jon...@ed...>
Date: Thu, 12 Apr 2012 14:04:45 +0100
To: David Nix <dav...@gm...>
Cc: <use...@li...>
Subject: Re: [Useq-users] Error with USeq SamTranscriptomeParser while processing Novoalign RNA-seq outputs

Hi David,

Thanks for the quick reply. Following the Novoalign folks' instructions the transcripts were indeed added to the index. Excerpt from their docs:

novoindex Transcriptome.nix geneMaskedGenome.fasta refFlatRad45Num60kMin10Splices.fasta refFlatRad45Num60kMin10Transcripts.fasta

Is that not the right thing to do? Should it just be the genome and the splices?

I'm working primarily with Ensembl data so I'd like to keep my chromosomes 'sans chr' - unless of course the USeq apps require it?

Thanks,

Jon

--
Dr Jonathan Manning
Bioinformatics Team
Centre for Cardiovascular Science
University of Edinburgh
Queens Medical Research Institute
47 Little France Crescent
Edinburgh EH16 4TJ
United Kingdom
T: +44 131 242 6700
F: +44 131 242 6782
E: jma...@st... |
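On the chr1 versus 1 question, a synonyms table for the primary chromosomes can be approximated with a pair of small helpers. This is only a sketch under a stated assumption: the common convention that UCSC names are the Ensembl names prefixed with "chr", with the mitochondrion as "chrM" versus "MT". Unplaced scaffolds and patches are named differently in each scheme and would need a real lookup table. The function names are hypothetical and are not part of USeq.

```python
def ensembl_to_ucsc(name):
    """Map an Ensembl-style primary chromosome name to UCSC style.

    Assumed convention: '1' -> 'chr1', 'X' -> 'chrX', 'MT' -> 'chrM'.
    Names that already start with 'chr' are returned unchanged.
    """
    if name.startswith("chr"):
        return name
    if name == "MT":
        return "chrM"
    return "chr" + name


def ucsc_to_ensembl(name):
    """Inverse mapping for primary chromosomes: 'chr1' -> '1', 'chrM' -> 'MT'."""
    if name == "chrM":
        return "MT"
    if name.startswith("chr"):
        return name[len("chr"):]
    return name


for ensembl_name in ["1", "20", "X", "MT"]:
    print(ensembl_name, ensembl_to_ucsc(ensembl_name))
```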
From: Jon M. <Jon...@ed...> - 2012-04-12 13:43:25
|
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. |
From: David N. <dav...@gm...> - 2012-04-12 13:49:52
|
Hmm. That error you are seeing is from Picard. STP calls SortSam internally. Looks like it is trying to write a short that is too big, possibly due to the huge chromosome name? Or too many chromosome names since these have not been converted to genomic space.

Use of the -u option won't change much of anything except redirect the failed alignment to a file.

The big problem is you're going to have transcript alignments intermingled with your genomic alignments and won't be able to map the former to the latter.

I don't think you can use your partially converted sam file. Need to rebuild the novoindex and realign.

-cheers, D

From: Jon Manning <Jon...@ed...>
Date: Thu, 12 Apr 2012 14:43:11 +0100
To: David Nix <dav...@gm...>
Cc: <use...@li...>
Subject: Re: [Useq-users] Error with USeq SamTranscriptomeParser while processing Novoalign RNA-seq outputs

Okay, that's good to know- thanks.

In the meantime I tried a fix suggested by Zayed at Novocraft, namely to not use '-u' and thereby to exclude unmapped reads. Both this and using USeq 8.2.2 (I was on 8.2.1) changed the error to:

Exception in thread "main" java.lang.IllegalArgumentException: Value (70699) to large to be written as ushort.
    at net.sf.samtools.util.BinaryCodec.writeUShort(BinaryCodec.java:324)
    at net.sf.samtools.BAMRecordCodec.encode(BAMRecordCodec.java:114)
    at net.sf.samtools.BAMRecordCodec.encode(BAMRecordCodec.java:37)
    at net.sf.samtools.util.SortingCollection.spillToDisk(SortingCollection.java:210)
    at net.sf.samtools.util.SortingCollection.add(SortingCollection.java:150)
    at net.sf.samtools.SAMFileWriterImpl.addAlignment(SAMFileWriterImpl.java:157)
    at net.sf.picard.sam.SortSam.doWork(SortSam.java:67)
    at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:175)
    at edu.utah.seq.data.sam.PicardSortSam.<init>(PicardSortSam.java:81)
    at edu.utah.seq.parsers.SamTranscriptomeParser.addHeaderAndSort(SamTranscriptomeParser.java:482)
    at edu.utah.seq.parsers.SamTranscriptomeParser.doWork(SamTranscriptomeParser.java:101)
    at edu.utah.seq.parsers.SamTranscriptomeParser.<init>(SamTranscriptomeParser.java:55)
    at edu.utah.seq.parsers.SamTranscriptomeParser.main(SamTranscriptomeParser.java:495)

I realise I'm working with a bad SAM file from your point of view, but do you think this error is part of the same thing, or something new?

Jon |
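The "ushort" in the exception above is an unsigned 16-bit field, which can hold values from 0 to 65535, so 70699 cannot be encoded. The thread does not establish which field of the BAM record overflowed, and the sketch below does not try to. It only reproduces the same kind of range failure with Python's standard struct module, as an analogy to the Java writeUShort call in the stack trace.

```python
import struct

USHORT_MAX = 2**16 - 1  # largest value an unsigned 16-bit field can hold (65535)


def fits_ushort(value):
    """Return True when value can be stored in an unsigned 16-bit field."""
    return 0 <= value <= USHORT_MAX


def pack_ushort(value):
    """Encode value as a little-endian unsigned 16-bit integer.

    struct.pack raises struct.error when the value is out of range,
    which mirrors the IllegalArgumentException quoted in the thread.
    """
    return struct.pack("<H", value)


print(fits_ushort(65535))
print(fits_ushort(70699))
try:
    pack_ushort(70699)
except struct.error as error:
    print("cannot encode 70699:", error)
```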
From: Jon M. <Jon...@ed...> - 2012-04-12 14:15:30
|
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. |
From: David N. <dav...@gm...> - 2012-04-12 14:16:57
|
Yes, if you save it as a sam it bypasses Picard's SortSam and just writes out the alignments.

-cheers, D

From: Jon Manning <Jon...@ed...>
Date: Thu, 12 Apr 2012 15:15:10 +0100
To: David Nix <dav...@gm...>
Cc: <use...@li...>
Subject: Re: [Useq-users] Error with USeq SamTranscriptomeParser while processing Novoalign RNA-seq outputs

Thanks for the pointers- don't worry, I will be re-running the alignment. However, specifying '-s output.sam' did at least make things run without error- Zayed indicated that the BAM conversion was the problem, due to the 'absence of a valid sequence dictionary'.

But things are much clearer now than they were this morning- thank you.

Jon |
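The "valid sequence dictionary" mentioned above refers to the @SQ lines of the SAM header. BAM output needs every reference name used by an alignment to be declared there, whereas plain SAM text can be written without that check. A minimal sketch of a pre-flight check follows. It assumes a tab-separated SAM file read as lines of text, the function name is hypothetical, and the records in the example are shortened made-up lines rather than data from the thread.

```python
def missing_reference_names(sam_lines):
    """Return RNAME values used by alignments but not declared in any @SQ line.

    Header lines start with '@' and precede the alignments, so one pass is enough.
    An RNAME of '*' means the read has no reference and is ignored.
    """
    declared = set()
    missing = []
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            if line.startswith("@SQ"):
                for field in line.split("\t")[1:]:
                    if field.startswith("SN:"):
                        declared.add(field[len("SN:"):])
            continue
        columns = line.split("\t")
        if len(columns) < 3:
            continue  # not a well-formed alignment line
        rname = columns[2]
        if rname != "*" and rname not in declared and rname not in missing:
            missing.append(rname)
    return missing


example = [
    "@HD\tVN:1.0",
    "@SQ\tSN:1\tLN:1000",
    "r1\t0\t1\t10\t255\t5M\t*\t0\t0\tACGTA\t#####",
    "r2\t133\tRps3:ENSRNOT00000023935:1:100-200\t375\t0\t*\t=\t375\t0\tACGTA\t#####",
]
print(missing_reference_names(example))
```

An empty result suggests the header covers every reference name in the file. Any names reported are the ones a BAM writer would be unable to resolve.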