dedupe.sh/dedupe2.sh errors overlapping sequences
Hi Brian,
First, let me say that I really like your software; its speed and versatility always impress me, although I often have no idea what's going on under the hood.
Now, I'm trying to overlap sequences (form clusters) in a large (~1.GiB, uncompressed) FASTA file. When I run the following command, I get lots of messages on stderr, including the sequences themselves:
**dedupe2.sh in=contigs_01.fna.gz,contigs_02.fna.gz,contigs_03.fna.gz,contigs_04.fna.gz,contigs_05.fna.gz,contigs_06.fna.gz minclustersize=2 am=f ac=f fo=t cc=t rnc=f mcs=10 mo=300 minidentity=99 processclusters=t fixmultijoins=f pto=f mst=f pattern=clusters5/contigs_dedup+mh.cluster_%.fna.gz qin=33 dot=clusters5/graph.dot -Xmx500g -eoom**
**dedupe.sh --version**
BBMap version 37.90
**java -version**
openjdk version "1.8.0_121"
OpenJDK Runtime Environment (Zulu 8.20.0.5-linux64) (build 1.8.0_121-b15)
OpenJDK 64-Bit Server VM (Zulu 8.20.0.5-linux64) (build 25.121-b15, mixed mode)
**java parameters**: -Djava.library.path=/path/to/bbmap-37.90/jni/ -ea -Xmx500g -Xms500g -cp /path/to/bbmap-37.90/current/ jgi.Dedupe2
This is the kind of output:
>1
CCCTA...
>2
ATCCG...
at jgi.Dedupe2$Overlap.<init>(Dedupe2.java:3852)
at jgi.Dedupe2$Unit.makeOverlapReverse(Dedupe2.java:5096)
at jgi.Dedupe2$Unit.makeOverlap(Dedupe2.java:4685)
at jgi.Dedupe2$HashThread.findOverlaps(Dedupe2.java:3292)
at jgi.Dedupe2$HashThread.processRead(Dedupe2.java:3156)
at jgi.Dedupe2$HashThread.processReadOuter(Dedupe2.java:3034)
at jgi.Dedupe2$HashThread.run(Dedupe2.java:2969)
Exception in thread "Thread-62" java.lang.AssertionError:
type=FORWARD, len=316, subs=198, edits=0 (p_410, start1=127500, stop1=127815) (p_486, start2=0, stop2=315)
This output repeats for several sequence pairs labeled 1 and 2, and then there is another block at the end:
at jgi.Dedupe2$ClusterThread.canonicizeNeighbors(Dedupe2.java:2609)
at jgi.Dedupe2$ClusterThread.canonicizeClusterBreadthFirst(Dedupe2.java:2546)
at jgi.Dedupe2$ClusterThread.run(Dedupe2.java:1966)
Exception in thread "Thread-101" java.lang.AssertionError
at jgi.Dedupe2$ClusterThread.findMultiJoinsInCluster(Dedupe2.java:2160)
at jgi.Dedupe2$ClusterThread.run(Dedupe2.java:1934)
Exception in thread "Thread-122" java.lang.AssertionError
at jgi.Dedupe2$Overlap.flip(Dedupe2.java:3992)
at jgi.Dedupe2$ClusterThread.canonicize(Dedupe2.java:2744)
at jgi.Dedupe2$ClusterThread.canonicizeNeighbors(Dedupe2.java:2609)
at jgi.Dedupe2$ClusterThread.canonicizeClusterBreadthFirst(Dedupe2.java:2546)
at jgi.Dedupe2$ClusterThread.run(Dedupe2.java:1966)
Exception in thread "Thread-112" java.lang.AssertionError
at jgi.Dedupe2$Overlap.flip(Dedupe2.java:3992)
at jgi.Dedupe2$ClusterThread.canonicize(Dedupe2.java:2744)
at jgi.Dedupe2$ClusterThread.canonicizeNeighbors(Dedupe2.java:2609)
at jgi.Dedupe2$ClusterThread.canonicizeClusterBreadthFirst(Dedupe2.java:2546)
at jgi.Dedupe2$ClusterThread.run(Dedupe2.java:1966)
Exception in thread "Thread-134" java.lang.AssertionError
at jgi.Dedupe2$Overlap.flip(Dedupe2.java:3992)
at jgi.Dedupe2$ClusterThread.canonicize(Dedupe2.java:2744)
at jgi.Dedupe2$ClusterThread.canonicizeNeighbors(Dedupe2.java:2609)
at jgi.Dedupe2$ClusterThread.canonicizeClusterBreadthFirst(Dedupe2.java:2546)
at jgi.Dedupe2$ClusterThread.run(Dedupe2.java:1966)
Exception in thread "Thread-116" java.lang.AssertionError
at jgi.Dedupe2$Overlap.flip(Dedupe2.java:3992)
at jgi.Dedupe2$ClusterThread.canonicize(Dedupe2.java:2744)
at jgi.Dedupe2$ClusterThread.canonicizeNeighbors(Dedupe2.java:2609)
at jgi.Dedupe2$ClusterThread.canonicizeClusterBreadthFirst(Dedupe2.java:2546)
at jgi.Dedupe2$ClusterThread.run(Dedupe2.java:1966)
Exception in thread "Thread-138" java.lang.AssertionError
at jgi.Dedupe2$ClusterThread.canonicize(Dedupe2.java:2750)
at jgi.Dedupe2$ClusterThread.canonicizeNeighbors(Dedupe2.java:2609)
at jgi.Dedupe2$ClusterThread.canonicizeClusterBreadthFirst(Dedupe2.java:2546)
at jgi.Dedupe2$ClusterThread.run(Dedupe2.java:1966)
Exception in thread "Thread-106" java.lang.AssertionError
at jgi.Dedupe2$Overlap.flip(Dedupe2.java:3992)
at jgi.Dedupe2$ClusterThread.canonicize(Dedupe2.java:2744)
at jgi.Dedupe2$ClusterThread.canonicizeNeighbors(Dedupe2.java:2609)
at jgi.Dedupe2$ClusterThread.canonicizeClusterBreadthFirst(Dedupe2.java:2546)
at jgi.Dedupe2$ClusterThread.run(Dedupe2.java:1966)
Exception in thread "Thread-117" java.lang.AssertionError
at jgi.Dedupe2$Overlap.flip(Dedupe2.java:3992)
at jgi.Dedupe2$ClusterThread.canonicize(Dedupe2.java:2744)
at jgi.Dedupe2$ClusterThread.canonicizeNeighbors(Dedupe2.java:2609)
at jgi.Dedupe2$ClusterThread.canonicizeClusterBreadthFirst(Dedupe2.java:2546)
at jgi.Dedupe2$ClusterThread.run(Dedupe2.java:1966)
Flipped 7948 reads and 11537 overlaps.
Found 0 clusters (0 overlaps) with contradictory orientation cycles.
Found 45 clusters (1951 overlaps) with remaining cycles.
After processing clusters:
Clusters: 3930074 (0 of at least size 10)
Size Range    Clusters    Reads      Bases
1             3930045     3930045    3554307901
2             29          1028       3147151
Largest: 601
Finished processing. Time: 0.853 seconds.
Memory: max=514501m, free=387924m, used=126577m
Input: 3947466 reads 3668075300 bases.
Overlaps: 18834 reads (0.48%) 61778904 bases (1.68%) 85039 collisions.
Result: 3947466 reads (100.00%) 3668075300 bases (100.00%)
Printed output. Time: 11.295 seconds.
Memory: max=514501m, free=387408m, used=127093m
Time: 22.179 seconds.
Reads Processed: 3947k 177.98k reads/sec
Bases Processed: 3668m 165.39m bases/sec
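To get an overview of which assertions actually fire, the traces can be tallied by their top stack frame. What follows is only a rough sketch in Python and not part of BBTools: `tally_top_frames` and `sample` are names made up for illustration, and the sample lines are abridged from the traces above. It assumes each `Exception in thread` header is followed by its own frames, which may not hold when several threads write to stderr at the same time.

```python
import re
from collections import Counter

# Matches a Java stack frame such as
#   at jgi.Dedupe2$Overlap.flip(Dedupe2.java:3992)
FRAME = re.compile(r'^\s*at ([\w.$<>]+)\(([^:()]+):(\d+)\)')


def tally_top_frames(lines):
    """Count exceptions by the first stack frame seen after each header.

    Assumption: the first "at ..." line after an "Exception in thread"
    header belongs to that exception. Interleaved thread output can break this.
    """
    counts = Counter()
    awaiting_frame = False
    for line in lines:
        if line.startswith('Exception in thread "'):
            awaiting_frame = True
            continue
        match = FRAME.match(line)
        if match and awaiting_frame:
            method, source, lineno = match.groups()
            counts[f"{method} ({source}:{lineno})"] += 1
            awaiting_frame = False
    return counts


# Abridged from the traces in this post, for illustration only.
sample = """\
Exception in thread "Thread-101" java.lang.AssertionError
at jgi.Dedupe2$ClusterThread.findMultiJoinsInCluster(Dedupe2.java:2160)
at jgi.Dedupe2$ClusterThread.run(Dedupe2.java:1934)
Exception in thread "Thread-122" java.lang.AssertionError
at jgi.Dedupe2$Overlap.flip(Dedupe2.java:3992)
at jgi.Dedupe2$ClusterThread.canonicize(Dedupe2.java:2744)
Exception in thread "Thread-138" java.lang.AssertionError
at jgi.Dedupe2$ClusterThread.canonicize(Dedupe2.java:2750)
at jgi.Dedupe2$ClusterThread.canonicizeNeighbors(Dedupe2.java:2609)
""".splitlines()

for frame, count in tally_top_frames(sample).most_common():
    print(count, frame)
```

For a full run, the captured stderr could be passed in line by line in place of `sample`.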
Could you help me figure out where the problem is? If I can't get it to work, I will have to fall back on a much slower method such as nucmer.
Thanks!