Overlap store failure
When overlaps are computed, they are written to intermediate files. To save space, these files are compressed with gzip at compression level 1. After all overlaps are computed, they are sorted and placed into a binary data store.
Two failures are common here: running out of disk space, and corrupted gzip files. A third failure can occur in assemblies with many fragments when runCA is used with default parameters.
Overlap stores are created using the overlapStore command.
Out of disk space
Corrupt gzip files / too many fragments
The error reported by runCA is the same for both a corrupt input file and too many fragments.
bucketizing /work/assembly/godzilla/0-overlaptrim-overlap/002/001025.ovb.gz
bucketizing /work/assembly/godzilla/0-overlaptrim-overlap/002/001851.ovb.gz
Too many bucket files when adding overlap:
287166353 280000507 f 271 386 0 116 1.740000
This might be a corrupt input file, or maybe you simply need to supply more memory with the runCA option ovlStoreMemory.
The 'gzip' command can be used to test whether the input file (001851.ovb.gz) is corrupt: gzip --test 001851.ovb.gz. If the file is corrupt, remove it and recompute its overlaps with sh overlap.sh 1851. It is advisable to test the remaining files for corruption as well.
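Testing the remaining files one at a time is tedious, so a small loop can help. The following is a minimal sketch, not part of runCA: the function name find_corrupt_ovb is hypothetical, and it assumes that all intermediate overlap files end in .ovb.gz and live somewhere under the directory you pass in. It uses gzip -t, the short form of gzip --test.

```sh
#!/bin/sh
# find_corrupt_ovb DIR
# Hypothetical helper: print the path of every *.ovb.gz file under DIR
# that fails gzip's integrity test. Prints nothing if all files pass.
find_corrupt_ovb() {
    dir="$1"
    find "$dir" -type f -name '*.ovb.gz' | sort | while IFS= read -r f; do
        # gzip -t exits non-zero when the file cannot be decompressed cleanly.
        if ! gzip -t "$f" 2>/dev/null; then
            printf '%s\n' "$f"
        fi
    done
}

# Example call (path taken from the log above):
#   find_corrupt_ovb /work/assembly/godzilla/0-overlaptrim-overlap
```

Each path printed is a file to remove and recompute with overlap.sh, as described above.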
When the file is corrupt, the values reported in the overlap usually make no sense: for example, very high error rates, or positions or hangs that exceed the fragment length. The values are documented on the overlapStore command page, and depend on the type of overlap being processed.
The first two numbers on the line are always the fragment IIDs. If both of these are less than the number of fragments in the assembly, the input file is probably intact, and it is likely that you need to increase the ovlStoreMemory parameter.
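The IID check can be scripted once you know the fragment count. This is a rough sketch under stated assumptions: check_overlap_iids is a hypothetical helper, you supply the number of fragments yourself, and IIDs are assumed to be numbered from 1, so an IID equal to the fragment count is treated as valid.

```sh
#!/bin/sh
# check_overlap_iids "OVERLAP LINE" NUM_FRAGMENTS
# Hypothetical helper: report whether the first two fields of the overlap
# line (the two fragment IIDs) fall within 1..NUM_FRAGMENTS.
check_overlap_iids() {
    line="$1"
    nfrags="$2"
    printf '%s\n' "$line" | awk -v n="$nfrags" '{
        a = $1 + 0; b = $2 + 0; n = n + 0    # force numeric comparison
        if (a >= 1 && a <= n && b >= 1 && b <= n)
            print "in-range"       # consider raising ovlStoreMemory
        else
            print "out-of-range"   # values look wrong; suspect a corrupt file
    }'
}

# Example call, using the overlap line from the log above and an
# illustrative fragment count:
#   check_overlap_iids '287166353 280000507 f 271 386 0 116 1.740000' 300000000
```

An in-range result does not prove the file is intact, so it is still worth running the gzip test on the file named in the log.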