I'm experiencing some weird behaviour in pregap4/gap4, which I think could be due to the Post-assembly Difference Clipping option in the Enter Assembly (into Gap4) module in Pregap4, which is selected by default. It happens in all versions from Staden 2002 to 1-7-0.
I always assemble with Pregap's default Phrap values (Min exact match = 12, Min SWAT score = 30). Many contigs in the resulting assembly contain reads that are mistakenly joined (sometimes by only a single nucleotide, and even that is an incorrect join!). The rest of the badly aligned joining sequence is masked out, even though it's of good quality.
When I de-select the same diff-clip option in pregap, the reads are still joined, but at least now you can see how badly joined they are, since they are un-masked. Either way, such joins need to be broken up manually, and that's not a fun task to do...;)
A picture says more than a thousand words, so please have a look at this screen-dump, where the top contig editor shows the diff-clip assembly and the bottom one the no-diff-clip assembly.
Very grateful for any help on this!
The link to the picture has now changed to
I get that too. With that and the inability to use 1.7 on Mac OS X, I have given up on Staden entirely.
I contacted Staden developer James Bonfield, who replied:
"It appears to be a gcphrap bug and not a Gap4 bug. Hence it'll fail
regardless what version of the Staden Package you use.
Phrap itself is probably working ok, just not gcphrap. If you use
Gap4's native assembler then it works fine (albeit not a particularly
Phrap also in "-new_ace" mode produces valid looking assemblies, but
the gcphrap code looks like it gets confused by quality clipping the
left-most reading in the contig as this is how far the sequence is out
of alignment. I must admit I'm unsure how phrap can clip a read end
when it doesn't overlap anything, but indeed it has done so.
The "phrap_extras" package to turn phrap into gcphrap was indeed
written by me, but unfortunately it's no longer supported either. Here
we use phrap2gap instead which is a much better alternative utilising
caf files instead of thousands of .exp files. Look at the left hand
side bar at http://www.sanger.ac.uk/Software/formats/CAF/ for source
code for phrap2gap, caftools and caf2gap."
After trying out these CAF tools I saw that those misassemblies had disappeared. However, some new, similar-looking ones appeared at other sites, although not as frequently. It would be great if the gcphrap bug were fixed though, so that pregap could be used as a whole without this work-around...
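For anyone who wants to check phrap's own output outside Pregap4 before converting it, here is a minimal Python sketch that builds a standalone phrap command line with the same parameters as above. It assumes that Pregap's "Min exact match" and "Min SWAT score" correspond to phrap's -minmatch and -minscore options, and it uses the -new_ace mode mentioned in the reply. The file name reads.fasta and both function names are just placeholders of mine, and the conversion step (phrap2gap / caf2gap) is deliberately left out because I don't want to guess at its options.

```python
import shutil
import subprocess


def build_phrap_command(reads_fasta, minmatch=12, minscore=30, new_ace=True):
    """Return the argument list for a standalone phrap run.

    Assumption: minmatch/minscore mirror Pregap4's "Min exact match" and
    "Min SWAT score" settings. reads_fasta is a hypothetical input file.
    """
    cmd = [
        "phrap",
        reads_fasta,
        "-minmatch", str(minmatch),
        "-minscore", str(minscore),
    ]
    if new_ace:
        # The mode reported in the thread to give valid-looking assemblies.
        cmd.append("-new_ace")
    return cmd


def run_phrap(reads_fasta, **kwargs):
    """Run phrap if it is installed; raise a clear error otherwise."""
    if shutil.which("phrap") is None:
        raise FileNotFoundError("phrap executable not found on PATH")
    return subprocess.run(build_phrap_command(reads_fasta, **kwargs), check=True)


if __name__ == "__main__":
    # Only print the command here, so the sketch is safe to run without phrap.
    print(" ".join(build_phrap_command("reads.fasta")))
```

Keeping the command construction separate from the actual run makes it easy to see exactly what would be executed before touching any data.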