Difference Clipping and misassemblies

elmaccco
2007-10-01
2013-04-18
  • elmaccco
    2007-10-01

    Hi!

    I'm experiencing some weird behaviour in pregap4/gap4, which I think could be due to the Post-assembly Difference Clipping option in the Enter Assembly (into Gap4) module in Pregap4, which is selected by default. It happens in every version from Staden 2002 to 1-7-0.
    I always assemble with Pregap's default Phrap values (Min exact match = 12, Min SWAT score = 30). Many contigs in the resulting assembly contain reads that are mistakenly joined (sometimes at only a single nucleotide, and even that join is incorrect!). The rest of the badly aligned joining sequence is masked out, even though it's of good quality.

    When I de-select the same diff-clip option in pregap, the reads are still joined, but at least now you can see how badly joined they are, since they are un-masked. Thus, such joins need to be manually broken up and that's not a fun task to do...;)

    A picture says more than a thousand words so please have a look at this screen-dump, where the top contig-editor is for the diff-clip assembly and the bottom for the no-diff-clip assembly.

    ftp://morpheus.ucc.ie/pub/missassembly.jpg

    Very grateful for any help on this!

    Marcus

     
    • elmaccco
      2007-10-02

      The link to the picture has now changed to
      http://bioinfo.ucc.ie/temp/missassembly.jpg
      Marcus

       
    • I get that too. Between that and the inability to use 1.7 on Mac OS X, I have given up on Staden entirely.

       
      • elmaccco
        2007-10-30

        I contacted Staden developer James Bonfield, who replied:

        "It appears to be a gcphrap bug and not a Gap4 bug. Hence it'll fail
        regardless what version of the Staden Package you use.

        Phrap itself is probably working ok, just not gcphrap. If you use
        Gap4's native assembler then it works fine (albeit not a particularly
        nice assembly).

        Phrap also in "-new_ace" mode produces valid looking assemblies, but
        the gcphrap code looks like it gets confused by quality clipping the
        left-most reading in the contig as this is how far the sequence is out
        of alignment. I must admit I'm unsure how phrap can clip a read end
        when it doesn't overlap anything, but indeed it has done so.

        The "phrap_extras" package to turn phrap into gcphrap was indeed
        written by me, but unfortunately it's no longer supported either. Here
        we use phrap2gap instead which is a much better alternative utilising
        caf files instead of thousands of .exp files. Look at the left hand
        side bar at http://www.sanger.ac.uk/Software/formats/CAF/ for source
        code for phrap2gap, caftools and caf2gap."

        After trying out these CAF tools I saw that those misassemblies had disappeared. However, some new, similar-looking ones appeared at other sites, although less frequently. It would be great if the gcphrap bug were fixed, though, so that pregap could be used as a whole without this workaround...

        Marcus