Get specific sequences from a multi-FASTA file.



selectseq is a command-line utility for extracting biological sequences from a multi-FASTA file. Given a list of identifiers, it returns only the matching subset of sequences, or the complement (i.e., the sequences NOT in the list). It can also return only the Nth sequence in the file.



  • collect only some sequences out of a large multi-FASTA file
  • get sequence number N only, regardless of ID
  • complement mode: return all sequences that are NOT in the list of IDs
  • matching mode: the ID in the sequence file does not need to match the listed ID completely (useful when one has GenBank files but lists of accession numbers)
  • sequence names are provided one per line in a text file (the first word on each line is used)
  • a > symbol at the beginning of an ID in the list is ignored (useful when the list is built from FASTA identifiers)
  • if only one sequence is needed, its ID can be given directly to the -l option (no file needed)
  • add a suffix to IDs before searching (useful when IDs come from proteins that have _1 in the ID, but genes do not)
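
The selection behaviour listed above can be illustrated with a short Python sketch. This is not selectseq's own code: the function names (parse_fasta, read_id_list, select, nth) are hypothetical, and the matching mode is assumed here to be a simple substring test, which the description does not confirm.

```python
from typing import Iterable, Iterator, Optional, Set, Tuple

Record = Tuple[str, str]  # (header without the leading '>', sequence)


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from the lines of a multi-FASTA file."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def read_id_list(lines: Iterable[str], suffix: str = "") -> Set[str]:
    """Take the first word of each line, drop a leading '>', append a suffix."""
    ids = set()
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            line = line[1:]
        words = line.split()
        if words:
            ids.add(words[0] + suffix)
    return ids


def select(records: Iterable[Record], ids: Set[str],
           complement: bool = False, partial: bool = False) -> Iterator[Record]:
    """Keep records whose ID is in ids, or the others if complement is set.

    With partial=True a listed ID only has to occur inside the record ID
    (an assumed interpretation of the matching mode).
    """
    for header, seq in records:
        words = header.split()
        seq_id = words[0] if words else ""
        if partial:
            hit = any(wanted in seq_id for wanted in ids)
        else:
            hit = seq_id in ids
        if hit != complement:
            yield header, seq


def nth(records: Iterable[Record], n: int) -> Optional[Record]:
    """Return the n-th record (1-based) regardless of its ID, or None."""
    for index, record in enumerate(records, start=1):
        if index == n:
            return record
    return None
```

In this sketch the suffix is appended to the listed IDs before comparison, so a list entry gene2 with the suffix _1 would match a sequence whose ID is gene2_1.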



