Fails with empty Metassembly directory

2016-12-30
2017-01-23
  • marcus naymik

    marcus naymik - 2016-12-30

    I am trying to metassemble 3 de novo assemblies (k64, chicago, and k80), which all seem to pass the initial steps, but once the Metassembly directory is created nothing is in there. The output of my job ends with:

    contig stats:
    /home/mnaymik/TOOLS/Metassembler/bin/RepStats /scratch/mnaymik/genomes/shamster/k80.chicago.k64_InitialCtgStats /scratch/mnaymik/genomes/shamster/k64/k64.ctgs.fasta.lengths.stats /scratch/mnaymik/genomes/shamster/chicago/chicago.ctgs.fasta.lengths.stats /scratch/mnaymik/genomes/shamster/k80/k80.ctgs.fasta.lengths.stats ...

    1: n=535282 [200, 88906] 3960.6 +/- 5943.2 sum=2120031114 n50=10309 n50cnt=61385 f=12878.95

    1: n=1462128 [200, 34960] 1370.1 +/- 1574.1 sum=2003225912 n50=2392 n50cnt=238638 f=3178.62

    1: n=459140 [200, 146384] 4683.2 +/- 7874.4 sum=2150235817 n50=14261 n50cnt=44822 f=17923.24


    No initial primary CE-stat

    And inside the output directory, the 3 separate assembly folders all have this in the .err file:

    $ cat /outdir/first_assembly/CEstat/first_assembly.err
    
    ERROR reading read pairs from SAM file
    

    I see a multi-GB ctgs.fasta file in each of these dirs, so I'm guessing something is going wrong near the final step. Any help would be greatly appreciated.

    Marcus

     

    Last edit: marcus naymik 2016-12-30
    • Michael Schatz

      Michael Schatz - 2017-01-02

      Are you able to run on the included test data? That will help isolate if it
      is a problem with the installation or something unusual about your data.

      Thanks for your interest,

      Mike

       
  • marcus naymik

    marcus naymik - 2017-01-03

    Yes, the test data completes fine. I have 8 paired end fastqs we sequenced for the assembly which I list in the config file like so:

    bowtie2_read1=FQ1_r1.fastq,FQ2_r1.fastq,...
    bowtie2_read2=FQ1_r2.fastq,FQ2_r2.fastq,...

    Would this cause an error? I could concatenate these into 1 fastq for each r1 and r2 instead of 8 separate.

     
    • Michael Schatz

      Michael Schatz - 2017-01-03

      I don't think providing comma-separated read files like this is supported.
      Can you try again after merging the fastq files together? All the read1
      files should go together, and then all the read2 files in the same order.

      Good luck

      Mike
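
The merge suggested here can be sketched roughly as follows. The input names follow the pattern in Marcus's config; the merged names `all_r1.fastq` and `all_r2.fastq` and the tiny demo reads are hypothetical and only there so the sketch runs on its own. It assumes uncompressed FASTQ with four lines per record.

```shell
#!/bin/sh
# Sketch: merge several paired-end FASTQ files into one file per mate,
# keeping the same file order for read 1 and read 2.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Demo input only (hypothetical reads): two libraries, one pair each.
printf '@a/1\nACGT\n+\nIIII\n' > FQ1_r1.fastq
printf '@a/2\nTTTT\n+\nIIII\n' > FQ1_r2.fastq
printf '@b/1\nGGGG\n+\nIIII\n' > FQ2_r1.fastq
printf '@b/2\nCCCC\n+\nIIII\n' > FQ2_r2.fastq

# Concatenate the read 1 files, then the read 2 files in the SAME order.
cat FQ1_r1.fastq FQ2_r1.fastq > all_r1.fastq
cat FQ1_r2.fastq FQ2_r2.fastq > all_r2.fastq

# Sanity check: both merged files should hold the same number of records
# (4 lines per FASTQ record).
n1=$(( $(wc -l < all_r1.fastq) / 4 ))
n2=$(( $(wc -l < all_r2.fastq) / 4 ))
echo "read1 records: $n1, read2 records: $n2"
```

The merged files would then be listed once each in the config, for example `bowtie2_read1=all_r1.fastq` and `bowtie2_read2=all_r2.fastq`, using the keys from Marcus's post.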

       
  • Tim Hewitt

    Tim Hewitt - 2017-01-16

    Hi Michael, I'm having the exact same problem as Marcus, even after concatenating my read files for the 3 assemblies I'm trying to merge. I also get the "No initial primary CE stat" error near the end. It all works fine when I run on the sample set. Looking through the log file of my data run I see there are several instances of "Can't locate Statistics/Descriptive.pm in @INC" which I assume is a perl library missing from $PATH. I thought this might be causing the failure but then I see this same error is also present when I ran the sample set but was still able to complete with a merged output assembly. Any idea what might be going on?
    Tim

     
    • Michael Schatz

      Michael Schatz - 2017-01-16

      Yes, I bet it is related to the Statistics/Descriptive module errors. Can
      you try to resolve that?

      Thank you

      Mike
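
One way to check for the module before rerunning is sketched below. Note that `@INC` is Perl's own module search path (extendable through the `PERL5LIB` environment variable), not the shell's `$PATH`. The helper name `check_perl_module` is made up for this example, and the `cpanm` hint assumes that tool is installed.

```shell
#!/bin/sh
# Sketch: test whether the current perl can load a module from @INC.
# Returns 0 if the module loads, non-zero otherwise.
check_perl_module() {
    perl -M"$1" -e 1 2>/dev/null
}

if check_perl_module Statistics::Descriptive; then
    echo "Statistics::Descriptive found"
else
    echo "Statistics::Descriptive missing from @INC"
    # One common way to install it (assumes cpanm is available):
    #   cpanm Statistics::Descriptive
fi
```

The check has to be run with the same `perl` and the same environment that the metassemble job uses, since a cluster job may see a different `@INC` than an interactive shell.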

       
  • Tim Hewitt

    Tim Hewitt - 2017-01-17

    Thanks, trying to fix it now and will retry, but I'm wondering why it still worked with the sample data despite the missing module.
    cheers,
    Tim

     
  • Daniel Barrell

    Daniel Barrell - 2017-01-23

    Hi Mike,

    I also get the "ERROR reading read pairs from SAM file" error. I'm not sure it should be reading the SAM file at all: as far as I am aware, the SAM file gets deleted after the sam2bam conversion, correct? Here's where it finishes off:

    `
    <snip>
    ---------- Run bash command ----------

    contig stats:
    /usr/local/bin/Metassembler/bin/RepStats ./MergeMetassemble/first_assembly.second_assembly_InitialCtgStats ./MergeMetassemble/second_assembly/second_assembly.ctgs.fasta.lengths.stats ./MergeMetassemble/first_assembly/first_assembly.ctgs.fasta.lengths.stats ...

    1: n=885004 [200, 2339200] 2975.4 +/- 19000.6 sum=2633240311 n50=82559 n50cnt=8173 f=124310.90

    1: n=6764215 [200, 64823] 580.5 +/- 1376.3 sum=3926412717 n50=1498 n50cnt=415094 f=3843.60


    No initial primary CE-stat
    `

    I can run the test set fine, I have Statistics::Descriptive in my @INC and my reads are concatenated. Earlier I found this error in ./MergeMetassemble/first_assembly-contigs/CEstat/BWTaln/first_assembly.mtp.err, which I ignored since the run was continuing:

    <snip>
    Error, fewer reads in file specified with -2 than in file specified with -1
    Error, fewer reads in file specified with -2 than in file specified with -1
    Error: Encountered exception: 'Unidentified exception'
    Command: /usr/local/bin/bowtie2-2.3.0/bowtie2-align-s --wrapper basic-0 -x ./MergeMetassemble/first_assembly/CEstat/BWTaln/first_assembly.bld --maxins 3000 --minins 1000 --rf --threads 32 -1 /metassemble/reads_1.fq -2 /metassemble/reads_2.fq
    (ERR): bowtie2-align exited with value 1

    Both assemblies report this alignment problem, so it might also impact something downstream. Do you have any ideas?

    Thanks

    Dan

     

    Last edit: Daniel Barrell 2017-01-23
    • Michael Schatz

      Michael Schatz - 2017-01-23

      It is hard to tell from these error messages, but it looks like your input
      data are somehow corrupted. This would explain why bowtie is crashing. The
      first thing to check is why the number of reads for read 1 is different
      from read 2. Are you trimming the reads? You shouldn't need to do that and
      should run with all the data.

      Good luck

      Mike
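
A quick way to compare the read counts of the two mate files is sketched below. It assumes uncompressed FASTQ with exactly four lines per record; the helper name `count_reads` and the demo reads are hypothetical, and the file names simply mirror those in the bowtie2 command above.

```shell
#!/bin/sh
# Sketch: count FASTQ records (4 lines each) and compare the two mates.
set -eu

count_reads() {
    # Prints the number of records in an uncompressed FASTQ file.
    awk 'END { print int(NR / 4) }' "$1"
}

workdir=$(mktemp -d)
# Demo input only: read 2 is deliberately one record short.
printf '@r1/1\nACGT\n+\nIIII\n@r2/1\nACGT\n+\nIIII\n' > "$workdir/reads_1.fq"
printf '@r1/2\nACGT\n+\nIIII\n' > "$workdir/reads_2.fq"

n1=$(count_reads "$workdir/reads_1.fq")
n2=$(count_reads "$workdir/reads_2.fq")

if [ "$n1" -eq "$n2" ]; then
    status="paired"
else
    status="mismatch"
fi
echo "read1=$n1 read2=$n2 -> $status"
```

Equal counts are necessary but not sufficient: a filter that drops different reads from each file could still leave the same totals with mates out of order, so spot-checking that read names line up is also worthwhile.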

       
      • Daniel Barrell

        Daniel Barrell - 2017-01-24

        Hi Mike,
        Yes, the reads had been filtered with kontaminant and there were mismatches in the number of reads. I'm now re-running with the complete fastq files.
        Thanks,
        Dan

         
  • Tim Hewitt

    Tim Hewitt - 2017-01-24

    Hi Mike,
    I fixed Statistics::Descriptive but am still ending up with an empty metassembly directory. I'm not seeing any error message in stdout or in .err except for the last line, 'No initial primary CE-stat'. There don't appear to be any issues with bowtie.
    Tim

     
    • Michael Schatz

      Michael Schatz - 2017-01-24

      Hmm, in that case I can only guess there is something unusual about your
      data compared to what is expected. I'd start by stripping any extra fields
      (everything after the first whitespace) from the read headers in the fastq
      files using something like:

      awk '{print $1}' reads.1.fq > reads.clean.1.fq
      awk '{print $1}' reads.2.fq > reads.clean.2.fq

      Then try aligning just the first 1000 reads or so:

      head -4000 reads.clean.1.fq > reads.clean.head.1.fq
      head -4000 reads.clean.2.fq > reads.clean.head.2.fq

      Then check the alignments for those first 1000 reads to make sure they look
      reasonable. Note that it expects the reads to be "outies" where the mate
      pairs face away from each other.

      Hope this helps

      Mike
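
The `awk '{print $1}'` commands above apply to every line, which is harmless as long as sequence and quality lines contain no whitespace. A slightly more conservative variant that only touches header lines might look like the sketch below. It assumes four-line FASTQ records; the helper name `clean_headers` and the demo read are hypothetical.

```shell
#!/bin/sh
# Sketch: truncate FASTQ headers at the first whitespace, touching only
# header lines (every 4th line, starting at line 1).
set -eu

clean_headers() {
    awk 'NR % 4 == 1 { print $1; next } { print }' "$1"
}

workdir=$(mktemp -d)
# Demo record with an Illumina-style header containing a space.
printf '@read1 1:N:0:ATCACG\nACGT\n+\nIIII\n' > "$workdir/reads.1.fq"

clean_headers "$workdir/reads.1.fq" > "$workdir/reads.clean.1.fq"

# Count header lines that still contain whitespace (should be 0).
remaining=$(awk 'NR % 4 == 1 && /[ \t]/ { n++ } END { print n + 0 }' \
    "$workdir/reads.clean.1.fq")
echo "headers with whitespace after cleaning: $remaining"
```

The same function would be applied to the read 2 file before taking the first 4000 lines of each for the test alignment.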

       
      • Tim Hewitt

        Tim Hewitt - 2017-01-25

        Thanks Mike. My reads are actually "innies", so I changed the bowtie2 option on line 294 of the /bin/metassemble script from "--rf" to "--fr" to accommodate this. Is this a good idea or will it confuse downstream analyses?
        Tim

         
        • Michael Schatz

          Michael Schatz - 2017-01-25

          It is not well tested, so I would try a very small example first to
          confirm that it works before launching a bigger job. Please let me know
          how it goes.

          Good luck

          Mike

           
          • Tim Hewitt

            Tim Hewitt - 2017-01-27

            Thanks Mike. I tried what you suggested and got an output without error. I'll go ahead and try with the full dataset now. Can I ask what's the difference between the outputs 'assem1.assem2.ctgs.fasta' and 'assem1.assem2.fasta'? They are identical in terms of contig statistics - is there any situation where they would be different?
            cheers,
            Tim

             
            • Michael Schatz

              Michael Schatz - 2017-01-27

              Hi Tim,

              I'm not at my desk, but I think one is supposed to have contigs and one will have scaffolds. Do you have scaffolds for your inputs (with N regions)? If not, these files will be the same.

              Good luck

              Mike
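
Whether an input assembly contains scaffold gaps can be checked with something like the sketch below, which counts FASTA records that contain at least one N. The helper name `count_seqs_with_n` and the demo sequences are hypothetical. A single ambiguous base would also be counted, so a non-zero result is only a hint that N regions are present.

```shell
#!/bin/sh
# Sketch: count sequences in a FASTA file that contain at least one N,
# i.e. candidates for scaffold gaps.
set -eu

count_seqs_with_n() {
    awk '
        /^>/   { if (has_n) n++; has_n = 0; next }  # new record: tally previous
        /[Nn]/ { has_n = 1 }                        # sequence line with an N
        END    { if (has_n) n++; print n + 0 }
    ' "$1"
}

workdir=$(mktemp -d)
# Demo input only: one gapped scaffold and one plain contig.
printf '>scaf1\nACGTNNNNACGT\n>ctg1\nACGTACGT\n' > "$workdir/demo.fasta"

gapped=$(count_seqs_with_n "$workdir/demo.fasta")
echo "sequences containing N: $gapped"
```

If the count is zero for every input assembly, the contig and scaffold outputs would be expected to match, as described above.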

               
              • Tim Hewitt

                Tim Hewitt - 2017-01-30

                Hi Mike,
                the full metassembly appeared to be successful - it seems removing whitespace from the headers in the fastqs is what did the trick. The N50 of the contigs is greatly improved but the combined length is significantly reduced compared to the original assemblies - any thoughts why that could be?
                cheers,
                Tim

                 
                • Michael Schatz

                  Michael Schatz - 2017-01-30

                  Hi Tim,

                  Glad we are making progress. I'm guessing what happened is that one of
                  your assemblies has contigs that don't align to contigs in the other
                  assemblies. By default those contigs are left out of the final result as
                  being too unreliable, but you can try recovering them by setting
                  meta2fasta_keepUnaligned=1 in the [global] section of the config file. It
                  is strange to me that a large number of contigs wouldn't align to each
                  other. You could try using more sensitive parameters to nucmer (-l 30 -c
                  100 or so), but it could also just be that there are a few low-quality /
                  very short contigs present.

                  And since your assembly is shrinking, that might also explain why the
                  N50 goes way up. The N50 size is the size such that half of your assembly
                  is in contigs this size or larger, so if the assembly shrinks because
                  short contigs are dropped, it is likely this value will go up.

                  Hope this helps

                  Mike
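
The effect on N50 can be illustrated with a small worked example. This is only a sketch: the helper name `n50` and the contig lengths are made up, and the input is assumed to be a plain list of lengths, one per line.

```shell
#!/bin/sh
# Sketch: compute N50 from a list of contig lengths (one per line).
set -eu

n50() {
    # Sort lengths in descending order, then report the length at which
    # the running sum first reaches half of the total.
    sort -nr "$1" | awk '
        { len[NR] = $1; total += $1 }
        END {
            for (i = 1; i <= NR; i++) {
                run += len[i]
                if (run * 2 >= total) { print len[i]; exit }
            }
        }'
}

workdir=$(mktemp -d)
# Demo lengths only (hypothetical): two long contigs plus six short ones.
printf '%s\n' 1000 800 200 200 200 200 200 200 > "$workdir/all.lengths"
# The same assembly after the short contigs are dropped.
printf '%s\n' 1000 800 > "$workdir/kept.lengths"

n50_all=$(n50 "$workdir/all.lengths")
n50_kept=$(n50 "$workdir/kept.lengths")
echo "N50 with short contigs: $n50_all, without: $n50_kept"
```

With the short contigs included, the total is 3000 and half of it is reached at the 800 bp contig. Without them the total is 1800 and half is already reached at the 1000 bp contig, so N50 rises even though no contig got longer.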

                   
                  • Tim Hewitt

                    Tim Hewitt - 2017-02-01

                    Hi Mike,
                    This is good to know. I do believe I had quite a few very short contigs (<300bp). Will keep trying with different parameters. Thanks for your explanations.
                    cheers,
                    Tim

                     
