I introduced a bug in sebhtml/git by carrying out the following code change:
diff --git a/code/structures/Kmer.cpp b/code/structures/Kmer.cpp
index 83e145b..10e4871 100644
--- a/code/structures/Kmer.cpp
+++ b/code/structures/Kmer.cpp
@@ -155,8 +155,8 @@ uint8_t Kmer::getFirstSegmentFirstCode(int w){
 int Kmer::vertexRank(int _size,int w,bool color){
     Kmer b=complementVertex(w,color);
     if(isLower(&b))
-        b=*this;
-    return b.hash_function_1()%(_size);
+        return b.hash_function_1()%(_size);
+    return hash_function_1()%(_size);
 }
 /**
/**
In other words, I changed vertexRank to the following:
int Kmer::vertexRank(int _size,int w,bool color){
    Kmer b=complementVertex(w,color);
    if(isLower(&b))
        return b.hash_function_1()%(_size);
    return hash_function_1()%(_size);
}
This shouldn't be a problem, right? As far as I can tell, all I've done
is remove the copy from *this to b, which (in the absence of a
user-defined copy assignment operator) should do a member-wise copy of
the fields of *this [i.e. copy the contents of the
uint64_t[KMER_U64_ARRAY_SIZE] array across]. I can't see how the
function outcome can be changed by doing this.
(There are a few other places in the code where a similar copy is done.)
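One way to test that equivalence claim outside of Ray is a small standalone model. The sketch below is only that: ToyKmer, its identity hash, and its bit-flipping complementVertex are hypothetical stand-ins I made up for illustration, and I am assuming isLower(&b) is true when *this sorts before b. Only the branch structure of rankBefore and rankAfter is taken from the two versions of vertexRank shown above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for Kmer: one 64-bit word instead of
// uint64_t[KMER_U64_ARRAY_SIZE]. Nothing here is copied from Ray.
struct ToyKmer {
    uint64_t bits;

    // Assumption: isLower(&a) is true when *this sorts before a.
    bool isLower(const ToyKmer* a) const { return bits < a->bits; }

    // Placeholder partner k-mer; NOT a real reverse complement.
    ToyKmer complementVertex() const { return ToyKmer{~bits}; }

    // Placeholder hash (identity) so results can be followed by hand.
    uint64_t hash_function_1() const { return bits; }

    // Branch structure of vertexRank before the change.
    int rankBefore(int size) const {
        assert(size > 0);
        ToyKmer b = complementVertex();
        if (isLower(&b))
            b = *this;  // condition true: the value hashed is *this
        return static_cast<int>(b.hash_function_1() % static_cast<uint64_t>(size));
    }

    // Branch structure of vertexRank after the change.
    int rankAfter(int size) const {
        assert(size > 0);
        ToyKmer b = complementVertex();
        if (isLower(&b))  // condition true: the value hashed is b
            return static_cast<int>(b.hash_function_1() % static_cast<uint64_t>(size));
        // condition false: the value hashed is *this
        return static_cast<int>(hash_function_1() % static_cast<uint64_t>(size));
    }
};

// Counts inputs in [0, n) for which the two versions return different ranks.
int countMismatches(int size, uint64_t n) {
    int mismatches = 0;
    for (uint64_t v = 0; v < n; ++v) {
        ToyKmer k{v};
        if (k.rankBefore(size) != k.rankAfter(size))
            ++mismatches;
    }
    return mismatches;
}
```

Calling countMismatches(size, n) for a few values of size is a cheap check of whether the two forms agree in this toy setting. Note that with size == 1 both forms return 0 for every input, because anything modulo 1 is 0, so a disagreement between them could only become visible with two or more ranks.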
However, I get segfaults after making this change, and I don't know why.
In order to get my phiX data to work on sebhtml/git, I needed to adjust
the minimumY for CoverageDistribution:
diff --git a/code/graph/CoverageDistribution.cpp b/code/graph/CoverageDistribution.cpp
index 82cae53..db6cc60 100644
--- a/code/graph/CoverageDistribution.cpp
+++ b/code/graph/CoverageDistribution.cpp
@@ -48,7 +48,7 @@ CoverageDistribution::CoverageDistribution(map<int,uint64_t>*distributionOfCover
     int windowSize=10;
     int minimumX=1;
-    uint64_t minimumY=2*4096;
+    uint64_t minimumY=1;
     uint64_t minimumY2=55000;
     int maximumX=65535-1;
     int safeThreshold=256;
If you want the coverage distribution for this, you can find it here:
http://pastebin.com/KG01Tepp
If I run with one processor, everything is fine:
$ mpirun -np 1 ../../../sebgit/ray/code/Ray -write-seeds -k 10 \
    -p tests/phix/phix_5k_1.fasta tests/phix/phix_5k_2.fasta \
    | grep -A 8 'Number of contigs'
Number of contigs: 1
Total length of contigs: 5382
Number of contigs >= 500 nt: 1
Total length of contigs >= 500 nt: 5382
Number of scaffolds: 1
Total length of scaffolds: 5382
Number of scaffolds >= 500 nt: 1
Total length of scaffolds >= 500: 5382
Rank 0 wrote RayOutput.Contigs.fasta
Rank 0 wrote RayOutput.Scaffolds.fasta
But when I run with more than one processor, there is a segfault:
$ mpirun -np 2 ../../../sebgit/ray/code/Ray -write-seeds -k 10 \
    -p tests/phix/phix_5k_1.fasta tests/phix/phix_5k_2.fasta \
    | grep -A 8 'Number of contigs'
[thaliana:23082] *** Process received signal ***
[thaliana:23082] Signal: Segmentation fault (11)
[thaliana:23082] Signal code: Address not mapped (1)
[thaliana:23082] Failing at address: 0x8
[thaliana:23082] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0xf020) [0x7fc03c7aa020]
[thaliana:23082] [ 1] ../../../sebgit/ray/code/Ray(_ZN16MessageProcessor30call_RAY_MPI_TAG_VERTICES_DATAEP7Message+0x29f) [0x43306f]
[thaliana:23082] [ 2] ../../../sebgit/ray/code/Ray(_ZN7Machine10runVanillaEv+0x75) [0x445695]
[thaliana:23082] [ 3] ../../../sebgit/ray/code/Ray(_ZN7Machine5startEv+0xd57) [0x447767]
[thaliana:23082] [ 4] ../../../sebgit/ray/code/Ray(main+0x2b) [0x42608b]
[thaliana:23082] [ 5] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd) [0x7fc03c435ead]
[thaliana:23082] [ 6] ../../../sebgit/ray/code/Ray() [0x4262e1]
[thaliana:23082] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 23082 on node thaliana
exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
So, I can turn on debug and get the same bug:
diff --git a/Makefile b/Makefile
index d1aca41..fcd7b7e 100644
--- a/Makefile
+++ b/Makefile
@@ -83,7 +83,7 @@ OPTIMIZE = y
 
 # profiling
 GPROF = n
-DEBUG = n
+DEBUG = y
 
 ifeq ($(GPROF),y)
 	OPTIMIZE = n
$ mpirun -np 2 xterm -e gdb --args ../../../sebgit/ray/code/Ray \
    -write-seeds -k 10 \
    -p tests/phix/phix_5k_1.fasta tests/phix/phix_5k_2.fasta \
    | grep -A 8 'Number of contigs'
[from window for rank 0]
Program received signal SIGSEGV, Segmentation fault.
0x00000000004c5aa7 in MessageProcessor::call_RAY_MPI_TAG_VERTICES_DATA (
    this=0x7b04c8, message=0x9ca280)
    at code/communication/MessageProcessor.cpp:465
465         if(candidate->m_count<m_parameters->getMinimumCoverageToStore())
(gdb) bt
#0 0x00000000004c5aa7 in MessageProcessor::call_RAY_MPI_TAG_VERTICES_DATA (
this=0x7b04c8, message=0x9ca280)
at code/communication/MessageProcessor.cpp:465
#1 0x00000000004c3e61 in MessageProcessor::processMessage (this=0x7b04c8,
message=0x9ca280) at code/communication/MessageProcessor.cpp:65
#2 0x00000000004e27fe in Machine::processMessages (this=0x79e090)
at code/core/Machine.cpp:614
#3 0x00000000004e24ba in Machine::runVanilla (this=0x79e090)
at code/core/Machine.cpp:533
#4 0x00000000004e2492 in Machine::run (this=0x79e090)
at code/core/Machine.cpp:517
#5 0x00000000004e1f24 in Machine::start (this=0x79e090)
at code/core/Machine.cpp:462
#6 0x00000000005469a4 in main (argc=7, argv=0x7fffffffdf48)
at code/assembler/ray_main.cpp:29
(gdb) print candidate
$1 = (KmerCandidate *) 0x0
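The NULL candidate is consistent with the "Failing at address: 0x8" line in the earlier crash report: reading a member through a null pointer touches the address equal to that member's offset. As a rough sketch, assuming a layout in which the candidate holds a one-word k-mer key followed by the count (ToyKmer, ToyKmerCandidate and the field types are my guesses for illustration, not the real Ray definitions):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical layout, NOT copied from Ray: a key made of one 64-bit
// word, followed by a coverage count.
struct ToyKmer {
    uint64_t m_u64[1];
};

struct ToyKmerCandidate {
    ToyKmer m_lowerKey;
    uint32_t m_count;
};

// With this layout the count sits 8 bytes into the object.
static_assert(offsetof(ToyKmerCandidate, m_count) == 8,
              "m_count is expected right after the 8-byte key");

// Address that reading p->m_count would touch for an object starting at
// 'base', computed arithmetically so that nothing is dereferenced.
std::uintptr_t addressOfCount(std::uintptr_t base) {
    return base + offsetof(ToyKmerCandidate, m_count);
}

// Runtime restatement of the same fact: a null base gives address 0x8.
void checkNullBaseAddress() {
    assert(addressOfCount(0) == 0x8);
}
```

If the real KmerCandidate has a different layout, the offset will differ; the point is only that a small non-zero failing address usually means a field was read through a null pointer.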
So that code change has caused a crash in MessageProcessor,
specifically in the call_RAY_MPI_TAG_VERTICES_DATA function. But I'm
using DEBUG, which should activate ASSERT, which should check that
candidate != NULL with an assert. I eventually worked out that I need to
make another change to the Makefile to remove an erroneous trailing
space (cat -A is used here to make the change obvious):
$ git diff Makefile | cat -A
diff --git a/Makefile b/Makefile$
index d1aca41..8ca7842 100644$
--- a/Makefile$
+++ b/Makefile$
@@ -83,7 +83,7 @@ OPTIMIZE = y$
$
# profiling$
GPROF = n$
-DEBUG = n$
+DEBUG = y$
$
ifeq ($(GPROF),y)$
^IOPTIMIZE = n$
@@ -93,7 +93,7 @@ endif$
ifeq ($(DEBUG),y)$
^IOPTIMIZE = n$
^IFORCE_PACKING = n$
-^IASSERT = y $
+^IASSERT = y$
endif$
$
PEDANTIC = n$
So now I get the assert failing, which means the segfault doesn't get
reached:
$ mpirun -np 2 xterm -e gdb --args ../../../sebgit/ray/code/Ray \
    -write-seeds -k 10 \
    -p tests/phix/phix_5k_1.fasta tests/phix/phix_5k_2.fasta \
    | grep -A 8 'Number of contigs'
[from window for rank 1]
Ray: code/communication/MessageProcessor.cpp:462: void MessageProcessor::call_RAY_MPI_TAG_VERTICES_DATA(Message*): Assertion `candidate!=__null' failed.
Program received signal SIGABRT, Aborted.
0x00007ffff5f3a405 in raise (sig=<value optimized out>)
at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
in ../nptl/sysdeps/unix/sysv/linux/raise.c
(gdb) bt
#0 0x00007ffff5f3a405 in raise (sig=<value optimized out>)
at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x00007ffff5f3d680 in abort () at abort.c:92
#2 0x00007ffff5f335b1 in __assert_fail (
assertion=0x563296 "candidate!=__null", file=<value optimized out>,
line=462,
function=0x563f40 "void
MessageProcessor::call_RAY_MPI_TAG_VERTICES_DATA(Message*)") at assert.c:81
#3 0x00000000004c8d37 in MessageProcessor::call_RAY_MPI_TAG_VERTICES_DATA (
this=0x7c44c8, message=0x9de2c0)
at code/communication/MessageProcessor.cpp:462
#4 0x00000000004c6f8d in MessageProcessor::processMessage (this=0x7c44c8,
message=0x9de2c0) at code/communication/MessageProcessor.cpp:65
#5 0x00000000004e6872 in Machine::processMessages (this=0x7b2090)
at code/core/Machine.cpp:614
#6 0x00000000004e64e8 in Machine::runVanilla (this=0x7b2090)
at code/core/Machine.cpp:533
#7 0x00000000004e64c0 in Machine::run (this=0x7b2090)
at code/core/Machine.cpp:517
#8 0x00000000004e5f52 in Machine::start (this=0x7b2090)
at code/core/Machine.cpp:462
#9 0x0000000000550ac0 in main (argc=7, argv=0x7fffffffdf48)
at code/assembler/ray_main.cpp:29
So, it's segfaulting because it can't find a k-mer (I've done additional
tests to verify that, as expected, both the k-mer and its reverse
complement cannot be found), which presumably means that the k-mer that
it's looking for wasn't inserted into the graph.
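To make the "wasn't inserted" guess concrete, here is a toy model of how a lookup can miss when the rank used at insertion time and the rank used at lookup time are computed by different rules. Everything in it is hypothetical: ToyCluster, ownerForInsert and ownerForLookup are names I invented, and I have not confirmed that Ray computes the owner in two different ways.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical model of a distributed k-mer table: one std::set per rank.
struct ToyCluster {
    std::vector<std::set<uint64_t>> shards;

    explicit ToyCluster(int ranks) : shards(static_cast<std::size_t>(ranks)) {}

    void insert(uint64_t key, int owner) {
        shards.at(static_cast<std::size_t>(owner)).insert(key);
    }

    bool find(uint64_t key, int owner) const {
        return shards.at(static_cast<std::size_t>(owner)).count(key) != 0;
    }
};

// Two made-up ownership rules that agree only when there is a single rank.
int ownerForInsert(uint64_t key, int ranks) {
    return static_cast<int>(key % static_cast<uint64_t>(ranks));
}

int ownerForLookup(uint64_t key, int ranks) {
    return static_cast<int>((key + 1) % static_cast<uint64_t>(ranks));
}

// Inserts with one rule, looks up with the other, and reports whether
// the key was found on the rank that was asked.
bool roundTrip(uint64_t key, int ranks) {
    assert(ranks > 0);
    ToyCluster cluster(ranks);
    cluster.insert(key, ownerForInsert(key, ranks));
    return cluster.find(key, ownerForLookup(key, ranks));
}
```

In this model roundTrip(key, 1) always succeeds, because every rule maps to rank 0, while roundTrip(key, 2) asks the wrong rank and comes back empty. That matches the pattern of -np 1 working and -np 2 failing, but it is only an illustration of one possible mechanism.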
Just as a reminder, everything still works with a single processor:
$ mpirun -np 1 ../../../sebgit/ray/code/Ray -write-seeds -k 10 \
    -p tests/phix/phix_5k_1.fasta tests/phix/phix_5k_2.fasta \
    | grep -A 8 'Number of contigs'
Number of contigs: 1
Total length of contigs: 5382
Number of contigs >= 500 nt: 1
Total length of contigs >= 500 nt: 5382
Number of scaffolds: 1
Total length of scaffolds: 5382
Number of scaffolds >= 500 nt: 1
Total length of scaffolds >= 500: 5382
Rank 0 wrote RayOutput.Contigs.fasta
Rank 0 wrote RayOutput.Scaffolds.fasta
I just don't understand why this removal of a k-mer copy breaks code.
What's going on here? Any insight would be appreciated.
-- David