I am trying to metassemble 3 de novo assemblies (k64, chicago, and k80) which all seem to pass the initial steps, but once the Metassembly directory is created, nothing is in there. The output of my job ends with:
And inside the output directory, the 3 separate assembly folders all have this in the .err file
$cat /outdir/first_assembly/CEstat/first_assembly.err
ERROR reading read pairs from SAM file
I see a multi-GB ctgs.fasta file in each of these dirs, so I'm guessing something is going wrong near the final step. Any help would be greatly appreciated.
Marcus
Last edit: marcus naymik 2016-12-30
Are you able to run on the included test data? That will help isolate whether it is a problem with the installation or something unusual about your data.
Thanks for your interest,
Mike
On Fri, Dec 30, 2016 at 1:12 PM, marcus naymik naymikm@users.sf.net wrote:
Yes, the test data completes fine. I have 8 paired-end fastqs we sequenced for the assembly, which I list in the config file like so:
bowtie2_read1=FQ1_r1.fastq,FQ2_r1.fastq,...
bowtie2_read2=FQ1_r2.fastq,FQ2_r2.fastq,...
Would this cause an error? I could concatenate these into 1 fastq for each r1 and r2 instead of 8 separate.
I don't think providing comma-separated reads like this is supported. Can you try again after merging the fastq files together? All the read1 files should go together, and then all the read2 files in the same order.
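With hypothetical filenames standing in for the real per-library fastqs, that merge could be sketched like this (the toy reads here are placeholders, not data from this thread):

```shell
# Toy two-library example: FQ1/FQ2 stand in for the real fastq names.
printf '@a/1\nACGT\n+\nIIII\n' > FQ1_r1.fastq
printf '@b/1\nTTTT\n+\nIIII\n' > FQ2_r1.fastq
printf '@a/2\nCCCC\n+\nIIII\n' > FQ1_r2.fastq
printf '@b/2\nGGGG\n+\nIIII\n' > FQ2_r2.fastq

# All read1 files together, then all read2 files in the SAME order,
# so mate n in merged_r1 still pairs with mate n in merged_r2.
cat FQ1_r1.fastq FQ2_r1.fastq > merged_r1.fastq
cat FQ1_r2.fastq FQ2_r2.fastq > merged_r2.fastq
```

The order really does matter: the aligner pairs reads positionally, so swapping the concatenation order of one side silently mismatches every pair.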
Good luck
Mike
Hi Michael, I'm having the exact same problem as Marcus, even after concatenating my read files for the 3 assemblies I'm trying to merge. I also get the "No initial primary CE stat" error near the end. It all works fine when I run on the sample set.

Looking through the log file of my data run I see several instances of "Can't locate Statistics/Descriptive.pm in @INC", which I assume means a perl library is missing from the search path. I thought this might be causing the failure, but the same error is also present when I ran the sample set, which was still able to complete with a merged output assembly. Any idea what might be going on?
Tim
Yes, I bet it is related to the Statistics/Descriptive module errors. Can
you try to resolve that?
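A quick way to test whether perl actually sees the module (the cpan hint at the end is just one common fix, not necessarily how this system was set up):

```shell
# Does perl find Statistics/Descriptive.pm on its search path (@INC)?
# perl -M<Module> -e 1 exits 0 only if the module loads.
status=$(perl -MStatistics::Descriptive -e 1 2>/dev/null && echo found || echo missing)
echo "Statistics::Descriptive: $status"
# If missing: install via 'cpan Statistics::Descriptive', or point
# PERL5LIB at a directory that already contains the module.
```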
Thank you
Mike
On Sun, Jan 15, 2017 at 8:02 PM, Tim Hewitt thewrust@users.sf.net wrote:
Thanks, trying to fix it now and will retry, but I'm wondering why it still worked with the sample data despite the missing module.
cheers,
Tim
Hi Mike,
I also have the
ERROR reading read pairs from SAM file
error. I'm not sure why it is reading the SAM file at all; as far as I am aware, the SAM file gets deleted after the sam2bam conversion, correct? Here's where it finishes off:
<snip>
---------- Run bash command ----------
contig stats:
/usr/local/bin/Metassembler/bin/RepStats ./MergeMetassemble/first_assembly.second_assembly_InitialCtgStats ./MergeMetassemble/second_assembly/second_assembly.ctgs.fasta.lengths.stats ./MergeMetassemble/first_assembly/first_assembly.ctgs.fasta.lengths.stats ...
1: n=885004 [200, 2339200] 2975.4 +/- 19000.6 sum=2633240311 n50=82559 n50cnt=8173 f=124310.90
1: n=6764215 [200, 64823] 580.5 +/- 1376.3 sum=3926412717 n50=1498 n50cnt=415094 f=3843.60
No initial primary CE-stat
I can run the test set fine, I have Statistics::Descriptive in my @INC and my reads are concatenated. Earlier I found this error in ./MergeMetassemble/first_assembly-contigs/CEstat/BWTaln/first_assembly.mtp.err, which I ignored since the run was continuing:
<snip>
Error, fewer reads in file specified with -2 than in file specified with -1
Error, fewer reads in file specified with -2 than in file specified with -1
Error: Encountered exception: 'Unidentified exception'
Command: /usr/local/bin/bowtie2-2.3.0/bowtie2-align-s --wrapper basic-0 -x ./MergeMetassemble/first_assembly/CEstat/BWTaln/first_assembly.bld --maxins 3000 --minins 1000 --rf --threads 32 -1 /metassemble/reads_1.fq -2 /metassemble/reads_2.fq
(ERR): bowtie2-align exited with value 1
Both assemblies report this alignment problem, so it might also impact something downstream. Do you have any ideas?
Thanks
Dan
Last edit: Daniel Barrell 2017-01-23
It is hard to tell from these error messages, but it looks like your input data are somehow corrupted, which would explain why bowtie is crashing. The first thing to check is why the number of reads for read 1 is different from read 2 -- are you trimming the reads? You shouldn't need to do that, and should run with all the data.
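Before rerunning, the mismatch is easy to confirm from the line counts (toy files below reproduce a mismatch; substitute your real reads_1.fq/reads_2.fq):

```shell
# fastq stores 4 lines per read, so line count / 4 = read count.
printf '@a/1\nAC\n+\nII\n@b/1\nGT\n+\nII\n' > reads_1.fq
printf '@a/2\nAC\n+\nII\n' > reads_2.fq
n1=$(( $(wc -l < reads_1.fq) / 4 ))
n2=$(( $(wc -l < reads_2.fq) / 4 ))
echo "read1=$n1 read2=$n2"   # any mismatch makes bowtie2 -1/-2 abort
```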
Good luck
Mike
On Mon, Jan 23, 2017 at 11:40 AM, Daniel Barrell dgb@users.sf.net wrote:
Hi Mike,
Yes, the reads had been filtered with kontaminant and there were mismatches in the number of reads, I'm now re-running with the complete fastq files.
Thanks,
Dan
That explains it!
Mike
Hi Mike,
I fixed Statistics::Descriptive but am still ending up with an empty metassembly directory, and I'm not seeing any error message in stdout or in .err except for the last line, 'No initial primary CE-stat'. There doesn't appear to be any issue with bowtie.
Tim
Hmm, in that case I can only guess there is something unusual about your data compared to what is expected. I'd start by stripping off any of the headers from the reads in the fastq files using something like:
awk '{print $1}' reads.1.fq > reads.clean.1.fq
awk '{print $1}' reads.2.fq > reads.clean.2.fq
Then try aligning just the first 1000 reads or so:
head -4000 reads.clean.1.fq > reads.clean.head.1.fq
head -4000 reads.clean.2.fq > reads.clean.head.2.fq
Then check the alignments for those first 1000 reads to make sure they look
reasonable. Note that it expects the reads to be "outies" where the mate
pairs face away from each other.
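The cleaning and subsetting steps above can be sketched end to end (toy single-read file for illustration; a real run would use your reads, where 4000 lines = 1000 reads):

```shell
# 1) keep only the first whitespace-separated field of every line,
#    which strips comments from '@' header and '+' lines
#    (Phred+33 quality strings never contain spaces, so they are safe).
printf '@read1 lane=1 extra\nACGT\n+ comment\nIIII\n' > reads.1.fq
awk '{print $1}' reads.1.fq > reads.clean.1.fq
# 2) take the first 1000 reads = 4000 fastq lines for a quick test run.
head -4000 reads.clean.1.fq > reads.clean.head.1.fq
cat reads.clean.head.1.fq
```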
Hope this helps
Mike
On Mon, Jan 23, 2017 at 7:15 PM, Tim Hewitt thewrust@users.sf.net wrote:
Thanks Mike. My reads are actually "innies", so I changed the bowtie2 option on line 294 of the /bin/metassemble script from "--rf" to "--fr" to accommodate this. Is this a good idea or will it confuse downstream analyses?
Tim
It is not well tested, and I would try a very small example first to confirm that it works before launching a bigger job. Please let me know how it goes.
Good luck
Mike
On Tue, Jan 24, 2017 at 8:26 PM, Tim Hewitt thewrust@users.sf.net wrote:
Thanks Mike. I tried what you suggested and got an output without error. I'll go ahead and try with the full dataset now. Can I ask what's the difference between the outputs 'assem1.assem2.ctgs.fasta' and 'assem1.assem2.fasta'? They are identical in terms of contig statistics - is there any situation where they would be different?
cheers,
Tim
Hi Tim,
I'm not at my desk but I think one is supposed to have contigs and one will have scaffolds. Do you have scaffolds for your inputs (with N regions)? If not these files will be the same.
Good luck
Mike
Hi Mike,
The full metassembly appeared to be successful - it seems removing whitespace from the headers in the fastqs is what did the trick. The N50 of the contigs is greatly improved, but the combined length is significantly reduced compared to the original assemblies - any thoughts on why that could be?
cheers,
Tim
Hi Tim,
Glad we are making progress. I'm guessing what happened is one of your assemblies has contigs that don't align to contigs in the other assemblies. By default those contigs are left out of the final result as being too unreliable. But you can try recovering them by setting meta2fasta_keepUnaligned=1 in the [global] section of the config file. It is strange to me that a large number of contigs wouldn't align to each other. You could try using more sensitive parameters to nucmer (-l 30 -c 100 or so), but it could also just be that there are a few low-quality / very short contigs present.
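If the option follows the same key=value layout as the other config settings in this thread (verify the section name against your working config file), that would look something like:

```ini
[global]
meta2fasta_keepUnaligned=1
```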
And since your assembly is shrinking, that might also explain why the N50 goes way up. The N50 is the size such that half of your assembly is in contigs of that size or larger, so if the size of your assembly shrinks, this value is likely to go up.
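As a toy illustration of that definition (hypothetical contig lengths, not numbers from this thread):

```shell
# Contig lengths 40, 30, 20, 10: total = 100, half = 50.
# Walking the lengths largest-first, 40 + 30 = 70 >= 50, so N50 = 30:
# half the assembly lives in contigs of length 30 or larger.
n50=$(printf '40\n30\n20\n10\n' | sort -rn | awk '
  {len[NR]=$1; total+=$1}
  END{run=0; for(i=1;i<=NR;i++){run+=len[i]; if(run>=total/2){print len[i]; exit}}}')
echo "N50=$n50"
```

Note the shrinking effect: drop the 10 bp and 20 bp contigs and the total falls to 70, half is 35, and 40 alone covers it, so the N50 jumps to 40 even though nothing got longer.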
Hope this helps
Mike
On Sun, Jan 29, 2017 at 8:36 PM, Tim Hewitt thewrust@users.sf.net wrote:
Hi Mike,
This is good to know. I do believe I had quite a few very short contigs (<300 bp). I'll keep trying with different parameters. Thanks for your explanations.
cheers,
Tim