Creating databases and running GAP4 from command-line?

  • Jake

    Jake - 2014-01-24

    Hi, I am new to using the Staden package but have had success using the pregap4 and gap4 GUIs to assist in sequencing experiments. In fact, the package is so useful that I want to incorporate my pregap4/gap4 steps into a pipeline that is run from the command line.

    I have been able to use pregap4 with several parameters in order to mirror my GUI usage with identical results. My command is as follows: pregap4 -nowin -config pregap4.config -- sample.contigs

    However, now I want to do the same with gap4. Thus far I am only able to do something like: gap4 -maxseq 100000 -maxdb 80000 database_name.0

    This then opens the gap4 GUI and wants me to select the rest of my parameters from there, but this breaks my CLI-only pipelines. Can someone please assist me in creating the correct command to run gap4 with my necessary parameters? Here is what I want:

    • Create a new database with name: "Sample"
    • Choose consensus algorithm: "Base frequencies"
    • Conduct: "Normal shotgun assembly" with the following specifications:
      • Input reading filenames from file: "pregap.passed"
      • Save failures to: "fails_temp.txt"
    • Once the assembly is finished, I'd like to save the consensus assembly as "Sample_consensus"

    These parameters are trivial to input when using the GUI, but I can't figure out a way to do the same as a batch job on the command line. Any help is much appreciated!

  • James Bonfield

    James Bonfield - 2014-01-27


    Pregap4 has some rudimentary abilities to create databases and perform shotgun assembly, although to be frank Gap4's assembler isn't particularly good even on old style capillary data compared to, say, Phrap. (Where it won out was the ability to incrementally add new data to an existing on-going project.)

    To set the consensus algorithm, the easiest way is to modify your .gaprc file. There are three places it can live: ~/.gaprc, which sets defaults for all your work; a .gaprc in the current working directory, which changes gap4's defaults in just that directory; or $STADENROOT/share/staden/etc/gaprc, which changes the master defaults used by all users. The relevant defines are, with defaults:

    set_def CONSENSUS_MODE 2
    set_def CONSENSUS_CUTOFF 0.02
    set_def QUALITY_CUTOFF 0

    I believe consensus mode 0 is the base frequencies one. The easiest way to check, and also to configure gap4, is to start up gap4, go to Options -> Consensus algorithm, then edit the parameters and hit "OK Permanent". That will modify your ~/.gaprc file with the settings you just selected so they will become the defaults for new gap4 sessions.
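    If you would rather keep this inside the pipeline than click through the GUI, a minimal sketch along these lines could write a per-directory .gaprc before gap4 is started. PROJECT_DIR and set_gaprc_def are names made up for the example, and mode 0 is only my belief for "Base frequencies", so check it against what "OK Permanent" writes for you.

```shell
#!/bin/sh
# Sketch: write gap4 consensus defaults into a per-directory .gaprc.
# PROJECT_DIR and set_gaprc_def are hypothetical names for this example.
# ASSUMPTION: CONSENSUS_MODE 0 is "Base frequencies" (unverified).

PROJECT_DIR="${PROJECT_DIR:-.}"
GAPRC="$PROJECT_DIR/.gaprc"

# Drop any earlier set_def line for KEY, then append the new value,
# so re-running the script does not pile up duplicate entries.
set_gaprc_def() {
    key=$1
    value=$2
    touch "$GAPRC"
    grep -v "^set_def $key " "$GAPRC" > "$GAPRC.tmp" || true
    printf 'set_def %s %s\n' "$key" "$value" >> "$GAPRC.tmp"
    mv "$GAPRC.tmp" "$GAPRC"
}

set_gaprc_def CONSENSUS_MODE 0
set_gaprc_def CONSENSUS_CUTOFF 0.02
set_gaprc_def QUALITY_CUTOFF 0

# Show the resulting file for the pipeline log.
cat "$GAPRC"
```

    Writing to the working directory rather than ~/.gaprc keeps the change local to this one project, so other gap4 sessions keep their usual defaults.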

    Saving the consensus automatically from a script is trickier, as we don't have a pre-written tool for that. However, Gap4 (and gap5) do have their own internal scripting language based on Tcl. Partial documentation for gap4's scripting functions is listed at [link missing], with [link missing] being a relevant starting point.

    You may wish to copy and edit the $STADENROOT/share/staden/tcl/pregap4/modules/gap4_assemble.p4m file and look at the "run" procedure. You'll see:

    # Do the assembly
    set e [catch {set result [assemble_shotgun \
        -io $io \
        -files $tfiles \
        -output_mode 1 \
        -min_match $min_match \
        -max_pads $max_pads \
        -max_pmismatch $max_pmismatch \
        -enter_failures [expr 1-$enter_all]]} err]
    if {$e} {
        append report "ERR: Failed to assemble; \"$err\"\n"

    Shortly after that you could call something like get_consensus -io $io ... to save your consensus too. Hopefully it will be relatively obvious how to proceed.
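    To get a private copy of the module to edit, something like the following rough sketch should do. The function name copy_assemble_module and the destination filename are invented for the example, and the 'proc run' pattern is an assumption about how the procedure is declared in the file, so adjust it if grep finds nothing.

```shell
#!/bin/sh
# Sketch: copy the stock gap4_assemble.p4m module and locate its "run" procedure.
# copy_assemble_module and the default destination name are hypothetical.
copy_assemble_module() {
    dest=${1:-my_gap4_assemble.p4m}
    if [ -z "${STADENROOT:-}" ]; then
        echo "STADENROOT is not set" >&2
        return 1
    fi
    src="$STADENROOT/share/staden/tcl/pregap4/modules/gap4_assemble.p4m"
    if [ ! -f "$src" ]; then
        echo "module not found: $src" >&2
        return 1
    fi
    cp "$src" "$dest" || return 1
    # Print the line number where the "run" procedure starts, so the
    # assemble_shotgun call is easy to find in the copy.
    grep -n 'proc run' "$dest" || echo "no 'proc run' line found in $dest" >&2
}

# Only attempt the copy when a Staden installation is visible.
if [ -n "${STADENROOT:-}" ]; then
    copy_assemble_module my_gap4_assemble.p4m || true
fi
```

    Editing a copy leaves the installed module untouched, so the stock pregap4 behaviour is still there to compare against.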


    PS. We're doing no new development on Gap4 now as all our new work is in Gap5. Gap4 is probably still the best for small scale capillary projects, but it may be worth investigating gap5 too.

