A command-line utility for manipulating biological sequences in a multi-FASTA file. Given a list of identifiers, it extracts only that subset of the sequences, or its complement (i.e., the sequences NOT in the list). It can also extract sequence number N only.
- collect only some sequences out of a large multi-FASTA file
- get sequence number N only, regardless of ID
- complement mode: return all sequences that are NOT in the list of IDs
- "matching" mode: choose which part (between | characters) of the ID should match
- sequence names provided one per line in a text file (first word in line used, or whatever is given to the -k option)
- the > symbol is ignored if it is present at the beginning of IDs in the list (useful if using FASTA identifiers)
- if only one sequence is needed, its ID can be given directly to the -l option (no file needed)
- add a suffix to IDs before searching (useful when IDs come from proteins that have _1 in the ID, but genes do not)
- compressed sequence database files (-s) are supported
- quiet mode: output only important warnings and errors
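
The selection behaviour listed above can be illustrated with a short Python sketch. This is not selectseq's actual code: the function names (`read_fasta`, `read_id_list`, `record_key`, `select`, `nth`) and the 0-based `field` index are hypothetical, chosen only to show how subset, complement, "matching", suffix, and sequence-number selection could work on a multi-FASTA file.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a multi-FASTA text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def read_id_list(lines, suffix=""):
    """Collect the first word of each line, dropping a leading '>'.

    An optional suffix (assumption: plain string concatenation) is
    appended to every ID before it is used for searching.
    """
    ids = set()
    for line in lines:
        text = line.strip()
        if text.startswith(">"):
            text = text[1:]
        words = text.split()
        if words:
            ids.add(words[0] + suffix)
    return ids


def record_key(header, field=None):
    """Return the part of a header that is compared against the ID list.

    With field=None the first word of the header is used; otherwise the
    first word is split on '|' and the (0-based, hypothetical) field is used.
    """
    words = header.split()
    name = words[0] if words else ""
    if field is None:
        return name
    parts = name.split("|")
    return parts[field] if field < len(parts) else ""


def select(records, ids, complement=False, field=None):
    """Yield records whose key is in ids, or NOT in ids if complement=True."""
    for header, seq in records:
        hit = record_key(header, field) in ids
        if hit != complement:
            yield header, seq


def nth(records, n):
    """Return sequence number n (1-based) regardless of its ID, or None."""
    for i, record in enumerate(records, start=1):
        if i == n:
            return record
    return None


# Made-up example data, for illustration only.
demo = """>sp|P12345|GENE_A first protein
MKT
AAL
>sp|Q67890|GENE_B second protein
MSS
>plain_1 third
GGG
"""
records = list(read_fasta(io.StringIO(demo)))
wanted = read_id_list([">P12345 some comment", "", "missing"])
for header, seq in select(records, wanted, field=1):
    print(f">{header}\n{seq}")
```

Holding the IDs in a set keeps each lookup constant-time, so a large multi-FASTA file can be filtered in a single pass without loading all sequences into memory.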