editing regions of sequences

  • coraltech

    coraltech - 2010-11-22

    I was directed to the Windows 1.70 version of the Staden package for quick and easy identification and selection of specific bp regions in a sequence. I have a list of several hundred sequences to get through and have been using Microsoft Word's "Word Count" feature to identify and select the regions, e.g. bases 530-1078 in a sequence of ~2000 bases, but I know there must be a better way of going about this task. Can you tell me which Staden program is best for this purpose, and how to input my list of several hundred sequences? Thank you!

  • James Bonfield

    James Bonfield - 2010-11-24

    It sounds like the task you're doing isn't really sequence assembly but more a job of extracting sections of individual sequences. This isn't really the sort of thing that Gap4 was designed around, and Spin (now sadly not really supported) isn't so good at managing large lists of sequences either.

    Perhaps you'll find something more suited in the EMBOSS package: http://emboss.sourceforge.net/


  • Nobody/Anonymous

    I think you were misdirected somewhat - as James said, I think Gap4 is a bit overkill for your task, and actually not that well suited.

    There are many ways (fewer on Windows) to do what you want. But as you want to repeat the process hundreds of times, you need to script (automate) it somehow. The EMBOSS "extractseq" tool can do this:

    % extractseq -sequence longone.fasta -regions 530-1078 -outseq shortone.fasta

    It can also extract several regions in one run and put them into separate entries in the output file:

    % extractseq -sequence longone.fasta -separate Y -regions 530-1078,660-703,101-234,300-1501 -outseq shortones.fasta
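
    To repeat the single-region command over several hundred sequences, one option is a small driver script. The following is only a sketch in Python under stated assumptions: the list is a tab-separated file with one row per sequence (FASTA file name, start, end), each sequence sits in its own file, and the output naming scheme is made up for illustration. The function names are hypothetical; only the extractseq options already shown above (-sequence, -regions, -outseq) are used.

```python
import csv
import subprocess
from pathlib import Path


def build_extractseq_cmd(seq_file, start, end, out_file):
    """Return the extractseq argument list for one region (1-based, inclusive)."""
    if start < 1 or end < start:
        raise ValueError(f"bad region {start}-{end} for {seq_file}")
    return [
        "extractseq",
        "-sequence", str(seq_file),
        "-regions", f"{start}-{end}",
        "-outseq", str(out_file),
    ]


def read_jobs(table_path):
    """Yield (seq_file, start, end, out_file) from a tab-separated table.

    Assumed layout (not from the thread): file name, start, end per row.
    Blank rows and rows starting with '#' are skipped.
    """
    with open(table_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if not row or row[0].startswith("#"):
                continue
            seq_file, start, end = row[0], int(row[1]), int(row[2])
            # Hypothetical naming scheme: <input stem>_<start>-<end>.fasta
            out_file = f"{Path(seq_file).stem}_{start}-{end}.fasta"
            yield seq_file, start, end, out_file


def run_all(table_path, dry_run=True):
    """Build one command per row; print them, or run them if dry_run is False."""
    commands = [build_extractseq_cmd(*job) for job in read_jobs(table_path)]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # Requires EMBOSS to be installed and extractseq to be on the PATH.
            subprocess.run(cmd, check=True)
    return commands
```

    Calling run_all("regions.tsv") with the default dry_run=True only prints the commands, so the list can be checked before anything is written; run_all("regions.tsv", dry_run=False) actually invokes extractseq once per row. The file name regions.tsv is a placeholder.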

  • Torsten Seemann

    Torsten Seemann - 2010-11-24

    Whoops, SourceForge logged me out; post 3 above was me :)

