#317 invalid reads in unmapped fastq (missing @ symbol)

Group: v0.9.0
Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2015-02-24
Created: 2014-07-22
Creator: Brad Langhorst
Private: No

I'm also seeing some reads with invalid header lines in the FASTQ output of unaligned reads.
Mine is a 2x300 MiSeq run that has been trimmed with cutadapt.

I ran bowtie2 like this:
bowtie2 -p 16 -x /mnt/galaxy/data/genome/eco_hg19/bowtie2_index/eco_hg19 -1 test_r1.fastq -2 test_r2.fastq -I 0 -X 1000 --un-conc unmapped_reads.fastq --local --sensitive --gbar 4 > mapped_reads.sam

and I see this invalid read (note the missing @ symbol):
grep -A 3 M00532:58:000000000-AA5N2:1:1105:11203:24523 unmapped_reads.1.fastq

M00532:58:000000000-AA5N2:1:1105:11203:24523 1:N:0:1
ACGGCAGCACCGTCACGCAGCACGAAATAAGCATCTGATTTTTCGCACGGCAGCTCAGGTAATGGCACCGGATCTTCTTTCGGTGGTGCCACTTCGCCGTTACGTAAAATCTTACGTGTGATTTTACACTCTTCGTTGGTGCAGGCCATGTATTTACCGAAACGCACCATTTTCAGGTGCATTTCAGAGCCACATTTTTCACACTCAACGATCGGGCCGTCATAACCTTTAATGCGGAATTCGCCCTCTTCGCTCTAGTAACCGTC
+
8BCC<FFFFCFFFFEEFFFGGGGGGGE?FGGAFGGGGGCCDF,@F:@@FD@FC,EFGAFGGGGDFFGGGGGDCCFA<<C<,BBC:>BEFEF9,?EFC:C>CC:B,CFGGFFGEGF?F,4B,<B<CFFFGGGGGGGF:BFGGGGGGFGGGG<D,3<FFFFBFGC@F+@FFFF;B9DFG9,@FGG9;BC@@D>FFGGF@FFC9<FFGFFG7:4*,<::*1<1*=FC<<F9F9C;9C9C8=?8++/;AE>8EFF/21*2<:CC?A+:C:

However, it is valid in the input R1 file:

@M00532:58:000000000-AA5N2:1:1105:11203:24523 1:N:0:1
ACGGCAGCACCGTCACGCAGCACGAAATAAGCATCTGATTTTTCGCACGGCAGCTCAGGTAATGGCACCGGATCTTCTTTCGGTGGTGCCACTTCGCCGTTACGTAAAATCTTACGTGTGATTTTACACTCTTCGTTGGTGCAGGCCATGTATTTACCGAAACGCACCATTTTCAGGTGCATTTCAGAGCCACATTTTTCACACTCAACGATCGGGCCGTCATAACCTTTAATGCGGAATTCGCCCTCTTCGCTCTAGTAACCGTC
+
8BCC<FFFFCFFFFEEFFFGGGGGGGE?FGGAFGGGGGCCDF,@F:@@FD@FC,EFGAFGGGGDFFGGGGGDCCFA<<C<,BBC:>BEFEF9,?EFC:C>CC:B,CFGGFFGEGF?F,4B,<B<CFFFGGGGGGGF:BFGGGGGGFGGGG<D,3<FFFFBFGC@F+@FFFF;B9DFG9,@FGG9;BC@@D>FFGGF@FFC9<FFGFFG7:4*,<::*1<1*=FC<<F9F9C;9C9C8=?8++/;AE>8EFF/21*2<:CC?A+:C:
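The symptom above can be searched for across a whole output file by checking the first line of every record. This is only a minimal sketch, not part of bowtie2: `find_headers_missing_at` is a hypothetical helper name, and it assumes strict four-line FASTQ records (no wrapped sequence lines), as in the examples here.

```python
import io
from typing import Iterator, TextIO, Tuple


def find_headers_missing_at(handle: TextIO) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, header) for records whose header lacks '@'.

    Assumes every record occupies exactly four lines.
    """
    for index, line in enumerate(handle):
        if index % 4 == 0:  # first line of each four-line record
            header = line.rstrip("\r\n")
            if not header.startswith("@"):
                yield index + 1, header  # report 1-based line numbers


# Small synthetic input: the second record has lost its '@'.
sample = io.StringIO(
    "@read_a 1:N:0:1\nACGT\n+\nFFFF\n"
    "read_b 1:N:0:1\nGGCC\n+\nFFFF\n"
)
bad_headers = list(find_headers_missing_at(sample))
```

To scan a real file, the same function could be given an open file handle, for example one opened on `unmapped_reads.1.fastq`.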

This read immediately follows a read that was completely trimmed away by cutadapt (it was probably an adapter dimer). Cutadapt is set to keep all sequences in the input FASTQ file (instead of throwing away short reads) because it's much easier to keep R1 and R2 the same length for downstream processing.

@M00532:58:000000000-AA5N2:1:1105:19868:24518 1:N:0:1

+

@M00532:58:000000000-AA5N2:1:1105:11203:24523 1:N:0:1
ACGGCAGCACCGTCACGCAGCACGAAATAAGCATCTGATTTTTCGCACGGCAGCTCAGGTAATGGCACCGGATCTTCTTTCGGTGGTGCCACTTCGCCGTTACGTAAAATCTTACGTGTGATTTTACACTCTTCGTTGGTGCAGGCCATGTATTTACCGAAACGCACCATTTTCAGGTGCATTTCAGAGCCACATTTTTCACACTCAACGATCGGGCCGTCATAACCTTTAATGCGGAATTCGCCCTCTTCGCTCTAGTAACCGTC
+
8BCC<FFFFCFFFFEEFFFGGGGGGGE?FGGAFGGGGGCCDF,@F:@@FD@FC,EFGAFGGGGDFFGGGGGDCCFA<<C<,BBC:>BEFEF9,?EFC:C>CC:B,CFGGFFGEGF?F,4B,<B<CFFFGGGGGGGF:BFGGGGGGFGGGG<D,3<FFFFBFGC@F+@FFFF;B9DFG9,@FGG9;BC@@D>FFGGF@FFC9<FFGFFG7:4*,<::*1<1*=FC<<F9F9C;9C9C8=?8++/;AE>8EFF/21*2<:CC?A+:C:
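To check whether the affected reads always sit directly after a zero-length read, the input file can be scanned for empty records. The sketch below is an illustration under assumptions: `read_fastq` and `reads_after_empty` are hypothetical helpers, and the parser assumes four-line records in which the sequence and quality lines may be empty.

```python
import io
from typing import Iterator, List, TextIO, Tuple

# header, sequence, separator, quality
Record = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield four-line FASTQ records; empty sequence lines are tolerated."""
    while True:
        lines = [handle.readline() for _ in range(4)]
        if not lines[0]:
            return  # end of file reached
        header, seq, sep, qual = (line.rstrip("\r\n") for line in lines)
        yield header, seq, sep, qual


def reads_after_empty(handle: TextIO) -> List[str]:
    """Return headers of records that directly follow a zero-length read."""
    followers = []
    previous_empty = False
    for header, seq, _sep, _qual in read_fastq(handle):
        if previous_empty:
            followers.append(header)
        previous_empty = seq == ""
    return followers


# Synthetic input: r2 was trimmed to zero length, so r3 is reported.
sample = io.StringIO("@r1\nACGT\n+\nFFFF\n@r2\n\n+\n\n@r3\nGG\n+\nFF\n")
followers = reads_after_empty(sample)
```

The returned headers could then be compared against the malformed headers in the unmapped output.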

If you map these to an E. coli K-12 reference you can reproduce the problem:
read1:

@M00532:58:000000000-AA5N2:1:1105:14588:24517 1:N:0:1
GGGATTTGGTGTACCGAGACGGGACGTAAAATCTGCAGGCATTATAGTGATCCACGCCACATTTTGTCAACGTTTATTGCTAATCATGTGAATGAATATCCAGTTCACTTTCATTTGTTGAATACTTTTGCCTTCTCCTGCTCTCCCTTAAGCGCATTATTTTACAAAAAACACACTAAACTCTTCCTGTCTCCGATAAAAGATGATTAAATGAAAACTCATTTATTTTGCATAAAAATTCAGTGAGAGCGGAAATCCAGGCTCATCATCAGTTAATTAAGCAGGGTGTTATTTTATGAC
+
CCCCCGGGGGGGGGGGGGGGGGDGGGGGGGGGGGGGGGFGGGGGGGGFFGGGFGGGGGGGGFGGGGEGGGGGGGGGGGGGGGGGGGGGGFFGGFGGGG9FGGGEFGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGFFGGGGGGFGGGGGFGDGEGGGGGGGGGGGFGGGGGGGGGGGFFGGAFFDGFGGGGGGGGGDFGGAFFEGEGGGGGGGGGGGGGGGGGGGGGFGGGGGGCFGGGCFGFGGGCFGGGGGCEGG8F9CCFFGGGCEFGGGGGGGC?FFFFG=DGGF3CG>CCFGGF0
@M00532:58:000000000-AA5N2:1:1105:23508:24517 1:N:0:1
AAATGGAATCACTGGCCTCGCTCTATAAAAATCATATAGCTACCTTACAAGAACGGACTCGCGATGCGCTGGCGCGCTTCAAGCTGGATGCGTTACTTATTCACTCCGGCGAGCTGTTCAATGTTTTTCTCGACGATCATCCCTATCCGTTTAAAGTGAACCCGCAATTCAAAGCGTGGGTGCCGGTAACTCA
+
CCCCCFGGGGFGGGGGGGGGGGGGGGFGGGGGGGGGFGGGGGGGGFGGGGGGA<FGGGGGGGGGEEGDGGGGGGGGGCGGGGGGGGGGG?EGEGGGGGCFGGGGGGFGGG@@FGGGGGGGFGG?FGGFGGGFFGGB8BFFGGGGDEFG@BBF<FDCCDGGGGGEG@>FGGGGGCFBCCC7FFEGEGFGGGCF9
@M00532:58:000000000-AA5N2:1:1105:19868:24518 1:N:0:1

+

@M00532:58:000000000-AA5N2:1:1105:11203:24523 1:N:0:1
ACGGCAGCACCGTCACGCAGCACGAAATAAGCATCTGATTTTTCGCACGGCAGCTCAGGTAATGGCACCGGATCTTCTTTCGGTGGTGCCACTTCGCCGTTACGTAAAATCTTACGTGTGATTTTACACTCTTCGTTGGTGCAGGCCATGTATTTACCGAAACGCACCATTTTCAGGTGCATTTCAGAGCCACATTTTTCACACTCAACGATCGGGCCGTCATAACCTTTAATGCGGAATTCGCCCTCTTCGCTCTAGTAACCGTC
+
8BCC<FFFFCFFFFEEFFFGGGGGGGE?FGGAFGGGGGCCDF,@F:@@FD@FC,EFGAFGGGGDFFGGGGGDCCFA<<C<,BBC:>BEFEF9,?EFC:C>CC:B,CFGGFFGEGF?F,4B,<B<CFFFGGGGGGGF:BFGGGGGGFGGGG<D,3<FFFFBFGC@F+@FFFF;B9DFG9,@FGG9;BC@@D>FFGGF@FFC9<FFGFFG7:4*,<::*1<1*=FC<<F9F9C;9C9C8=?8++/;AE>8EFF/21*2<:CC?A+:C:
@M00532:58:000000000-AA5N2:1:1105:15427:24524 1:N:0:1
AATGCGGTCAGGCAATCGGAGGTTCAATTCCTGCCTTTATTTTGGGGTTAAGCGGATATATCGCCAATCAGGTGCAAACGCCGGAAGTTATTATGGGCATCCGCACATCAATTGCCTTAGTACCTTGCGGATTTATGCTACTGGCATTCGTTATTATCTGGTTTTATCCGCTCACGGATAAAAAATTCAAAGAAATCGTGGGTGAAATTGATAATAGTAAAAAAGTGCAGCAGCAATTAATAAGCGATATCACTAATTAATATTCAATAAAAATAATCAGAACATCAAAGGTGAAACTAT
+
CCCCCECFBB<EFGGFGGGGGGGFFGGGGCGGFGFDGGFFGA@EEFDGGG@F9CCFGECFGE::BC+CEFGFG@8FAFFEGFFGG+@5FF,C<E?D?=4F8B4+@4C?FE9FD5B,EFFFFGG,EFCGGGGCF<FFGGGFFGFF@FEGA9B<D<DF<=,F<CDFC,DEFGG+@3FFGCCGCFFCCGFGF,3>:>F@EG,BFEGG*?FFGF,@CA;,@7@C9@FB<B<:C@CF+2+?B:C,?<ECF5CC5?+03<7@F<9CE90<<FC:F7<9FFGGGGFFF:4*+>F98:C6C*2:729*
@M00532:58:000000000-AA5N2:1:1105:14858:24525 1:N:0:1
CCACTAACTCTATGTGAAATAAATCAAAATTTCACGCCGAAATACTCCTTAGGATGTATAGCGAAAAGAGAAAAAGATATACCTCGATCACCCCCTTTCTCCCAAGTGAAAATAAAAGGTTATCAGTTTGCAACATTGAACAACATTCGTTGCAAATCGATAACAACATGCACCTTCAGGATACTATTTATTATGTTCGGCAATGATATTTTCACCCGCGTAGAACGTTCAGAAAATACAAAAATGGCGGAAATCGCCCAATTCCTGCATGAAAATGATTTGAGCGTTAACACCACAGTC
+
CCCCCFFFEGCFGF-C<<E9<FGFCEFGDFGG9FF,FFFGGGGCFGGGGGFFG,CFAFGCFGGGGF+C,@CEGG8,C<CFCF,@FF7CFECECEFDG,,EFGF,CAE<5,,CFFGGG97,CF?ED9EE9FCGGGGG,EGGCFGGGGGF,4B<7D<F,FFG,BFFGGFEGCFGFFFFAA,ADBFGDCGCFF9=AFFFGGCCC<+<>>DFFCEGFEF@D:FEE@**7,>>:DFF9;DEGG7B2CCF:F?C8:E5C?:*=*/*:*>E+<C+AA<C+<F7CFGGGGFC+9C>)>:7C*9<*1/*
@M00532:58:000000000-AA5N2:1:1105:15530:24529 1:N:0:1

read2:

@M00532:58:000000000-AA5N2:1:1105:14588:24517 2:N:0:1
ACGAGGGATCGCATCATAATCCTCTTCGTCTGGCTGGCCCAGGTTTGCAGTATATGCATAAGGAACCGCTCCCTTTTGTCGCATCCACAGCAGTGCGGCACTGGTGTCCAGACCGCCAGAAAAAGCGATACCAATACGTTGACCTACCGGGAGATGCTTGAGAATCGTCGTCATAAAATAACACCCTGCTTAATTAACTGATGATGAGCCTGGATTTCAGCTCTCACTGACTTTTTATGCAAACTAAATGAGTTTTCATTTAATCCTCTTTTATCGGAGACAGGGAGAGTTTAGTGTGTT
+
CCCCCGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGFGGDFGGGGGGGGGGGGGGFGGGGGEGGGGGGGGGGGGGGGGDGGGFGGGGFGGGGG7F:FFEE@FCFFFCFGFGGGGFDGGGGGGGCFFFDGGGGGGDFF,,@?7@GGGGGGGGGFDEGGGCAEGGCFE*>FE89EFCCFFGGFBEFGGG787C)/88E8<87DE@<CG@:C?F(:@;=A67CFEG)*)+/9?E+)+9;5.5?5EDAFA)644-6)--:(431)(.(2(40(())).5)))1,,
@M00532:58:000000000-AA5N2:1:1105:23508:24517 2:N:0:1
TGAGTTACCGGCACCCACGCTTTGAATTGCGGGTTCACTTTAAACGGATAGGGATGATCGTCGAGAAAAACATTGAACAGCTCGCCGGAGTGAATAAGTAACGCATCCAGCTTGAAGCGCGCCAGCGCATCGCGAGTCCGTTCTTGTAAGGTAGCTATATGATTTTTATAGAGCGAGGCCAGTGATTCCATTT
+
C@CCCFEGGGDGGGGGGGGGGGFFGGGGDF;@:FCFGGGGGGGFFGE:FCF8F<@BEFG?FGD@FF@GGGGGGGGGGGFGG<FEFGEF@F+FAFAF9FF,FGGGECGGAFFFF8F5?ECFEFGGGGEG:FGGGC<@FFGGGDFFGG=C9EFFGGGGFF;A@D9;D?9DGGG9FGB:>CGGG56<,B,=C@;E9
@M00532:58:000000000-AA5N2:1:1105:19868:24518 2:N:0:1

+

@M00532:58:000000000-AA5N2:1:1105:11203:24523 2:N:0:1
GACGGTTACGAGATCGAAGAGGGCGAATTCCGCATTAAAGGTTATGACGGCCCGATCGTTGAGTGTGAAAAATGTGGATCTGAAATGCACCTGAAAATGGGGCGATTCGGTAAATACATGGCCTGCACCAACGAAGAGTGTAAAAACACACGTAAGAATTAACGTCACGGCGAAGTGGCACCACCGAAAGAAGATCCGGTGACATTACCTGAGCGGACGTGAGAAAAATCAGGAGTGTATTGTGTGCTGCGTGAGGGTGCCGCCGGAGCACGGAATGGCAGCAGGATGGGAAAGGGGGTG
+
<BACC@FFGGGGCECD@FGGGGGDGGGFGGDFGGGGGFGACFGGGFGFEG@6C@F@6@C,CDECE<FCFG,CEFFFG,:E,CE<DF<EDAEGGDFGG,CFFF:C+4+C=E7ACF,BFFCFGD<<A8EEDFGCF>E===,EDFEGD8+@BF+@,@FGF,>B,7EB:+3*:>BE*=C*9<5,21=**?EGGCCCFC>+<57*0*)<77C2:<B)08*:*-C14//.*(2>(/57/(>)(62/:*-+.*.*.)/-)((.(-(((-(-,4.4((.32(,().()(-((((),-:(,((((-.((
@M00532:58:000000000-AA5N2:1:1105:15427:24524 2:N:0:1
GGCCACTATTTTTCTCATAGTTGCACCTTTGATGTTCTGATTATTTTTATTGAATATTAATTAGTGATATCGCTGATTAATTGCTGCTGCACTTTTTTACGATTATCAATTTCAACCACGATTTCTTTGAATTTTTTATCCGTGAGCGGATACAACCAGATAATAACGAATGCCAGTAGCATAAATCCGCAAGGTACTAAGGCAATTGATGTTCGGATGCCCATAATAACTTCCGGCGTTTGCACCTGCTTGGCCATATATCCGCTTACCCCCCAAATAACGGCAGGGAGTGGACCCCCC
+
C@BCCGFFFGGGGGAFFGFGGGGGGGGGGGGFGCGFGGGFGGGGGGGGGGGDGGG9FGG9FF9,CEFFA<EGDGDFFF9EEECGGGGFGDFCGGGGFGFGGFFGCFF9,AFFE,EE<CBFBF8DECFC,EFGGFEACFFFDED:FGGG+>@E,ECEEGGGGFGCEFFGCGFFA,EF@8,,@CAFGGFE6@E638@EGGGDFFFFFFC6@2+1+=C5@1@F)9*)33+5037,=@>*:@9?9<CCBC2?(043:()/(/9)/1)(63.1;13=0((-1)-/)((((,(((,-((.4(3)47
@M00532:58:000000000-AA5N2:1:1105:14858:24525 2:N:0:1
CTAATATTTCCGGCAATTCCACCGCACGCGATAAGCTTTTCATCGCGGGTTACGGTAATCAATACTTCGACTGTGGTGTCAACGCTCAAATCATTTTCATGCAGGACTTGGGCGATTTCCGCCATTTTTTTATTTTCTGAACGTTTTACGCGGGTGAAAATATCATTGCCGAACATAATAACTAGTATCCTGCAGGTGCATGTTGTTAACGATTTGCAACGACTGTTGTTAACTGTTGAAAACGTATAACCTTTTATTTTCACTTGGGGAAAAGGGGGGTGACTAGGGCAATACTATTTT
+
-AB8CGGGGGGD7CFGCGGGGGGGGGGDCEFGGGGGGFF<CE9FD>FGG+FDFGGGGGG,EC<@FFG<FCC,>FGDCC,E<FFEGGGFGGDDDG<EFGGGGDFDFF7EFGD8FFDCD7EEC7?CDC<FGEC8FCFDAF9CFG?FD8BDA+:>:8BCC,@D9<DCF9,,2?54*4:9FCEBC4;;,=E@=9<:,76*1=0B6;ACCDGC++.:5;C6=5@<(221)-:0.7-:7<4*4)**.*(*(*-/)67)/-5)6944596)))/4-(((.((-(((),(.))(((((-.)).).*))
@M00532:58:000000000-AA5N2:1:1105:15530:24529 2:N:0:1

PS
I could have sworn that I already reported this on a different dataset, but I can't find any evidence that I actually did ... sorry if this is a duplicate.

Discussion

  • Brad Langhorst
    2014-07-22

    I should have mentioned versions...
    I see this in bowtie2 2.2.3 and in 2.1.0.

     
  • Val
    2014-07-25

    Hi Brad,

    One thing I can infer from the case you described is that the FASTQ file is not valid and bowtie2 fails to stop and print an error message at that point. A FASTQ record with no sequence is invalid. Considering your use case this might seem arguable, but if we agree on the FASTQ file format specification, then the natural outcome would be to trim the other mate as well when one gets fully trimmed.
    However, this bowtie2 issue has to be fixed regardless. I will let you know next week how we decide to proceed. Until then, let me know if I misunderstood anything about this case or if there is something I failed to take into consideration.

    thanks,
    Val
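The idea of removing the other mate when one mate is fully trimmed can be sketched as a small pair-aware filter. This is an illustration only, not what cutadapt or bowtie2 does: `read_fastq` and `drop_pairs_with_empty_mate` are hypothetical helpers, and the code assumes four-line records and that both files list mates in the same order.

```python
import io
from typing import Iterator, TextIO, Tuple

# header, sequence, separator, quality
Record = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield four-line FASTQ records; empty sequence lines are tolerated."""
    while True:
        lines = [handle.readline() for _ in range(4)]
        if not lines[0]:
            return  # end of file reached
        header, seq, sep, qual = (line.rstrip("\r\n") for line in lines)
        yield header, seq, sep, qual


def drop_pairs_with_empty_mate(
    r1_in: TextIO, r2_in: TextIO, r1_out: TextIO, r2_out: TextIO
) -> int:
    """Copy read pairs, skipping pairs in which either mate has no sequence.

    Returns the number of pairs that were dropped.
    """
    dropped = 0
    for rec1, rec2 in zip(read_fastq(r1_in), read_fastq(r2_in)):
        if rec1[1] == "" or rec2[1] == "":
            dropped += 1
            continue
        r1_out.write("\n".join(rec1) + "\n")
        r2_out.write("\n".join(rec2) + "\n")
    return dropped


# Synthetic pair files: mate 1 of read b was trimmed to zero length.
r1 = io.StringIO("@a 1\nAC\n+\nFF\n@b 1\n\n+\n\n@c 1\nGG\n+\nFF\n")
r2 = io.StringIO("@a 2\nTT\n+\nFF\n@b 2\nCC\n+\nFF\n@c 2\nAA\n+\nFF\n")
out1, out2 = io.StringIO(), io.StringIO()
dropped = drop_pairs_with_empty_mate(r1, r2, out1, out2)
```

Dropping both mates keeps the two files the same length, which is the property the original report wanted to preserve.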

     
    • Brad Langhorst
      2014-07-26

      I think you have it clear...
      I've since switched to a PE-aware adapter remover instead of cutadapt.

      I don't know the FASTQ spec... but I don't consider a 0-length read to be
      totally crazy ;)

      Brad

  • Val
    2014-07-31

    Hi Brad,

    Although bowtie2 should stop with an error when given an invalid FASTQ file, we do not want to break any pipelines that currently use bowtie2 and did not account for this behavior. We therefore decided that bowtie2 should print the invalid (empty) records. A patch for this behavior is currently on GitHub and we will include it in the next release. You can download the source code from here: https://github.com/BenLangmead/bowtie2/archive/master.zip

    Let me know if this solves your issue.

    thanks,
    Val
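For reference, writing out a zero-length record so that the record after it stays well formed could look like the sketch below. This is not the bowtie2 patch, only an illustration of the expected output shape. `write_fastq_record` is a hypothetical helper, and the read names are made up.

```python
import io
from typing import TextIO


def write_fastq_record(out: TextIO, name: str, seq: str, qual: str) -> None:
    """Write one four-line FASTQ record; `name` excludes the leading '@'."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    # The '@' is always emitted, even when the sequence is empty.
    out.write(f"@{name}\n{seq}\n+\n{qual}\n")


buffer = io.StringIO()
write_fastq_record(buffer, "empty_read 1:N:0:1", "", "")
write_fastq_record(buffer, "next_read 1:N:0:1", "ACGT", "FFFF")
written = buffer.getvalue()
```

Here the empty record still takes four lines, so the header of the following read keeps its '@'.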