#5 dedupe.sh/dedupe2.sh errors overlapping sequences

1.0
open
None
2018-02-13
2018-02-13
fungs
No

Hi Brian,

first let me say that I really like your software; its speed and versatility always impress me, although I often have no idea what's going on under the hood.

Now, I'm trying to overlap sequences (form clusters) in a large (~1.GiB, uncompressed) FASTA file. When I run the following command, I get lots of messages on stderr, including the sequences themselves:

**dedupe2.sh in=contigs_01.fna.gz,contigs_02.fna.gz,contigs_03.fna.gz,contigs_04.fna.gz,contigs_05.fna.gz,contigs_06.fna.gz minclustersize=2 am=f ac=f fo=t cc=t rnc=f mcs=10 mo=300 minidentity=99 processclusters=t fixmultijoins=f pto=f mst=f pattern=clusters5/contigs_dedup+mh.cluster_%.fna.gz qin=33 dot=clusters5/graph.dot -Xmx500g -eoom**

**dedupe.sh --version**
BBMap version 37.90

**java -version**
openjdk version "1.8.0_121"
OpenJDK Runtime Environment (Zulu 8.20.0.5-linux64) (build 1.8.0_121-b15)
OpenJDK 64-Bit Server VM (Zulu 8.20.0.5-linux64) (build 25.121-b15, mixed mode)

**java parameters**: -Djava.library.path=/path/to/bbmap-37.90/jni/ -ea -Xmx500g -Xms500g -cp /path/to/bbmap-37.90/current/ jgi.Dedupe2

This is the kind of output:

>1
CCCTA...
>2
ATCCG...

        at jgi.Dedupe2$Overlap.<init>(Dedupe2.java:3852)
        at jgi.Dedupe2$Unit.makeOverlapReverse(Dedupe2.java:5096)
        at jgi.Dedupe2$Unit.makeOverlap(Dedupe2.java:4685)
        at jgi.Dedupe2$HashThread.findOverlaps(Dedupe2.java:3292)
        at jgi.Dedupe2$HashThread.processRead(Dedupe2.java:3156)
        at jgi.Dedupe2$HashThread.processReadOuter(Dedupe2.java:3034)
        at jgi.Dedupe2$HashThread.run(Dedupe2.java:2969)
Exception in thread "Thread-62" java.lang.AssertionError: 
type=FORWARD, len=316, subs=198, edits=0 (p_410, start1=127500, stop1=127815) (p_486, start2=0, stop2=315)
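
To put numbers on one of these messages, here is a minimal Python sketch that parses the assertion line and compares the implied identity with the minidentity=99 I requested. I am assuming that len is the overlap length and subs the number of substitutions within it; the function name and the regex are mine and not part of BBMap.

```python
import re

# One assertion message copied from the stderr output above.
MESSAGE = ("type=FORWARD, len=316, subs=198, edits=0 "
           "(p_410, start1=127500, stop1=127815) (p_486, start2=0, stop2=315)")

def parse_overlap(message):
    """Return the key=value fields of an overlap message as a dict.

    Numeric values become ints; everything else stays a string.
    """
    fields = {}
    for key, value in re.findall(r"(\w+)=(\w+)", message):
        fields[key] = int(value) if value.isdigit() else value
    return fields

overlap = parse_overlap(MESSAGE)
# Assumption: identity = (len - subs) / len, expressed in percent.
identity = 100.0 * (overlap["len"] - overlap["subs"]) / overlap["len"]
print(f"overlap length {overlap['len']}, identity {identity:.1f}%")
```

Under that assumption, the overlap reported here is only about 37% identical, which looks inconsistent with minidentity=99.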

This sequence and stack trace output repeats for several sequence pairs labeled 1 and 2, followed by another block at the end:

        at jgi.Dedupe2$ClusterThread.canonicizeNeighbors(Dedupe2.java:2609)
        at jgi.Dedupe2$ClusterThread.canonicizeClusterBreadthFirst(Dedupe2.java:2546)
        at jgi.Dedupe2$ClusterThread.run(Dedupe2.java:1966)
Exception in thread "Thread-101" java.lang.AssertionError
        at jgi.Dedupe2$ClusterThread.findMultiJoinsInCluster(Dedupe2.java:2160)
        at jgi.Dedupe2$ClusterThread.run(Dedupe2.java:1934)
Exception in thread "Thread-122" java.lang.AssertionError
        at jgi.Dedupe2$Overlap.flip(Dedupe2.java:3992)
        at jgi.Dedupe2$ClusterThread.canonicize(Dedupe2.java:2744)
        at jgi.Dedupe2$ClusterThread.canonicizeNeighbors(Dedupe2.java:2609)
        at jgi.Dedupe2$ClusterThread.canonicizeClusterBreadthFirst(Dedupe2.java:2546)
        at jgi.Dedupe2$ClusterThread.run(Dedupe2.java:1966)
Exception in thread "Thread-112" java.lang.AssertionError
        at jgi.Dedupe2$Overlap.flip(Dedupe2.java:3992)
        at jgi.Dedupe2$ClusterThread.canonicize(Dedupe2.java:2744)
        at jgi.Dedupe2$ClusterThread.canonicizeNeighbors(Dedupe2.java:2609)
        at jgi.Dedupe2$ClusterThread.canonicizeClusterBreadthFirst(Dedupe2.java:2546)
        at jgi.Dedupe2$ClusterThread.run(Dedupe2.java:1966)
Exception in thread "Thread-134" java.lang.AssertionError
        at jgi.Dedupe2$Overlap.flip(Dedupe2.java:3992)
        at jgi.Dedupe2$ClusterThread.canonicize(Dedupe2.java:2744)
        at jgi.Dedupe2$ClusterThread.canonicizeNeighbors(Dedupe2.java:2609)
        at jgi.Dedupe2$ClusterThread.canonicizeClusterBreadthFirst(Dedupe2.java:2546)
        at jgi.Dedupe2$ClusterThread.run(Dedupe2.java:1966)
Exception in thread "Thread-116" java.lang.AssertionError
        at jgi.Dedupe2$Overlap.flip(Dedupe2.java:3992)
        at jgi.Dedupe2$ClusterThread.canonicize(Dedupe2.java:2744)
        at jgi.Dedupe2$ClusterThread.canonicizeNeighbors(Dedupe2.java:2609)
        at jgi.Dedupe2$ClusterThread.canonicizeClusterBreadthFirst(Dedupe2.java:2546)
        at jgi.Dedupe2$ClusterThread.run(Dedupe2.java:1966)
Exception in thread "Thread-138" java.lang.AssertionError
        at jgi.Dedupe2$ClusterThread.canonicize(Dedupe2.java:2750)
        at jgi.Dedupe2$ClusterThread.canonicizeNeighbors(Dedupe2.java:2609)
        at jgi.Dedupe2$ClusterThread.canonicizeClusterBreadthFirst(Dedupe2.java:2546)
        at jgi.Dedupe2$ClusterThread.run(Dedupe2.java:1966)
Exception in thread "Thread-106" java.lang.AssertionError
        at jgi.Dedupe2$Overlap.flip(Dedupe2.java:3992)
        at jgi.Dedupe2$ClusterThread.canonicize(Dedupe2.java:2744)
        at jgi.Dedupe2$ClusterThread.canonicizeNeighbors(Dedupe2.java:2609)
        at jgi.Dedupe2$ClusterThread.canonicizeClusterBreadthFirst(Dedupe2.java:2546)
        at jgi.Dedupe2$ClusterThread.run(Dedupe2.java:1966)
Exception in thread "Thread-117" java.lang.AssertionError
        at jgi.Dedupe2$Overlap.flip(Dedupe2.java:3992)
        at jgi.Dedupe2$ClusterThread.canonicize(Dedupe2.java:2744)
        at jgi.Dedupe2$ClusterThread.canonicizeNeighbors(Dedupe2.java:2609)
        at jgi.Dedupe2$ClusterThread.canonicizeClusterBreadthFirst(Dedupe2.java:2546)
        at jgi.Dedupe2$ClusterThread.run(Dedupe2.java:1966)
Flipped 7948 reads and 11537 overlaps.
Found 0 clusters (0 overlaps) with contradictory orientation cycles.
Found 45 clusters (1951 overlaps) with remaining cycles.

After processing clusters:
Clusters:         3930074 (0 of at least size 10)

Size Range        Clusters          Reads             Bases
1                 3930045           3930045           3554307901
2                 29                1028              3147151

Largest:          601
Finished processing.       Time: 0.853 seconds.
Memory: max=514501m, free=387924m, used=126577m

Input:                          3947466 reads           3668075300 bases.
Overlaps:                       18834 reads (0.48%)     61778904 bases (1.68%)          85039 collisions.
Result:                         3947466 reads (100.00%)         3668075300 bases (100.00%)

Printed output.            Time: 11.295 seconds.
Memory: max=514501m, free=387408m, used=127093m

Time:                           22.179 seconds.
Reads Processed:       3947k    177.98k reads/sec
Bases Processed:       3668m    165.39m bases/sec
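
As a sanity check on the summary, the sketch below adds up the size-range table and compares it with the reported totals. The numbers are copied from the output above; the assumption that the Reads column should account for every input read is mine.

```python
# (clusters, reads) per size range, copied from the table above.
size_ranges = {1: (3930045, 3930045), 2: (29, 1028)}
reported_clusters = 3930074   # "Clusters:" line
input_reads = 3947466         # "Input:" line

total_clusters = sum(clusters for clusters, _ in size_ranges.values())
total_reads = sum(reads for _, reads in size_ranges.values())

print("clusters match:", total_clusters == reported_clusters)
print("reads unaccounted for:", input_reads - total_reads)
```

The cluster count adds up, but 16393 input reads do not appear in any size range, which may be a side effect of the threads that died.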

Could you help me figure out where the problem is? If I can't get it to work, I will have to fall back to a much slower method such as nucmer.

Thanks!
