editing regions of sequences

coraltech
2010-11-22
2013-04-18
  • coraltech
    2010-11-22

    I was directed to the Windows 1.70 version of the Staden Package for quick and easy identification and selection of specific bp regions in a sequence. I have a list of several hundred sequences to get through and have been using Microsoft Word's "Word Count" feature to identify and select the regions, e.g. bases 530-1078 in a sequence of ~2000 bases, but I know there must be a better way of going about this task. Can you tell me which Staden program is best for this purpose, and how to input my list of several hundred sequences? Thank you!

     
  • James Bonfield
    2010-11-24

    It sounds like the task you're doing isn't really sequence assembly but more a job of extracting sections of individual sequences. This isn't really the sort of thing that Gap4 was designed for, and Spin (now sadly not really supported) isn't so good at managing large lists of sequences either.

    Perhaps you'll find something more suited in the EMBOSS package: http://emboss.sourceforge.net/

    James

     
  • I think you were misdirected somewhat. As James said, Gap4 is a bit overkill for your task, and actually not that well suited to it.

    There are many ways to do what you want (fewer on Windows). But as you want to repeat the process hundreds of times, you need to script (automate) it somehow. The EMBOSS "extractseq" tool can do the extraction:

    % extractseq -sequence longone.fasta -regions 530-1078 -outseq shortone.fasta

    It can even extract many regions in one run and write each one as a separate entry in the output file:

    % extractseq -sequence longone.fasta -separate Y -regions 530-1078,660-703,101-234,300-1501 -outseq shortones.fasta

     
  • Whoops, SourceForge logged me out; post 3 above was me :)
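
For anyone without EMBOSS installed, the "script it" advice above could also be followed with a few lines of plain Python. The sketch below is only an illustration under some assumptions: the sequences are in FASTA format, regions are 1-based and inclusive (so 530-1078 means bases 530 through 1078), and each sequence has its own region. The function names (read_fasta, extract_region, extract_all) and the demo data are hypothetical and are not part of Staden or EMBOSS.

```python
def read_fasta(text):
    """Parse FASTA text into a dict mapping sequence ID to sequence string."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The ID is the first word after '>'; the rest is a description.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {seq_id: "".join(chunks) for seq_id, chunks in records.items()}


def extract_region(sequence, start, end):
    """Return bases start..end, counted from 1 and inclusive at both ends."""
    if start < 1 or end > len(sequence) or start > end:
        raise ValueError(f"region {start}-{end} is outside 1-{len(sequence)}")
    # Python slices are 0-based and end-exclusive, hence start - 1.
    return sequence[start - 1:end]


def extract_all(fasta_text, regions):
    """Build FASTA output with one entry per (seq_id, start, end) tuple."""
    sequences = read_fasta(fasta_text)
    entries = []
    for seq_id, start, end in regions:
        sub = extract_region(sequences[seq_id], start, end)
        entries.append(f">{seq_id}_{start}-{end}\n{sub}")
    return "\n".join(entries) + "\n"


if __name__ == "__main__":
    # Tiny made-up input; real use would read a FASTA file from disk instead.
    example = ">seqA demo\nACGTACGTAC\n>seqB\nTTTTGGGGCC\n"
    wanted = [("seqA", 3, 6), ("seqB", 5, 8)]
    print(extract_all(example, wanted), end="")
```

In practice the list of (ID, start, end) tuples would probably be read from a spreadsheet export or a tab-delimited file rather than typed in by hand, and the result written to a new FASTA file.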