reformatfasta 0.7
=================
This Perl program is a simple command-line utility to:
- reformat a multi-sequence FASTA file to have a certain number of bases per line;
- get rid of empty lines;
- convert from DNA to RNA (T to U), optional;
- convert all to uppercase, optional;
- sort records, based on either NCBI taxon or the sequence identifier, optional;
- "clean" sequences by removing spaces, X (or x), * and - characters, optional.
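The first two items (dropping empty lines and re-wrapping each sequence to a fixed width) can be illustrated with the minimal Python sketch below. It is not the program's own Perl code. It assumes that a record starts at a line beginning with ">" and that every other non-empty line belongs to the preceding record. The names parse_fasta and rewrap are hypothetical.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines.

    Assumption: empty lines are skipped, and sequence lines that appear
    before the first header are ignored.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # drop empty lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def rewrap(records, width=60):
    """Return FASTA text with every sequence split into lines of at most `width` characters."""
    out = []
    for header, seq in records:
        out.append(header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in out)


if __name__ == "__main__":
    example = [">seq1 demo", "ACGTACGT", "", "ACG", ">seq2", "TTTT"]
    print(rewrap(parse_fasta(example), width=5), end="")
```

With width=5, the 11-base seq1 in the example is written as three lines (ACGTA, CGTAC, G). Choosing a width at least as long as the longest sequence puts each sequence on a single line.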
Installation:
None, really; the program is ready to use. Place it in a directory listed in your PATH and it can be run
from anywhere on your system (on UNIX-like systems, make sure the file is executable, e.g. with
chmod +x reformatfasta). Otherwise, run it from the directory where you downloaded it by prefixing its
name with "./", for example: ./reformatfasta -h
Usage: reformatfasta [-i input_file] [-o output_file] [-s integer -u -c -d -b blast_output_table]
Program to:
- reformat a multi-sequence FASTA file to have a certain number of bases per line;
- get rid of empty lines;
- convert from DNA to RNA (T to U), optional;
- convert all to uppercase, optional;
- sort records, based on either NCBI taxon or the sequence identifier, optional;
- when NCBI taxon is the same, sort by E-value in BLAST table, optional;
- filter sequences by either minimum or maximum size, or both;
- "clean" sequences by removing spaces, X (or x), * and - characters, optional.
* Input and output files are optional; the program can also be used as a traditional UNIX
filter, reading standard input and writing standard output (e.g. via the shell redirection symbols < and >).
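The filter usage described above, together with the "clean" step and the minimum/maximum size filter, can be sketched in Python as follows. This is a hypothetical illustration, not the program's implementation. It assumes that "clean" removes spaces, X, x, * and - (as listed above) and that the size limits are applied after cleaning; the actual order of operations is not documented here. The names clean, filter_fasta and run_filter are hypothetical.

```python
import re
import sys

# Assumed "clean" set, taken from the feature list: spaces, X, x, '*' and '-'.
CLEAN_RE = re.compile(r"[ Xx*-]")


def clean(seq):
    """Remove every character matched by CLEAN_RE."""
    return CLEAN_RE.sub("", seq)


def filter_fasta(text, min_len=None, max_len=None, do_clean=True):
    """Yield (header, seq) pairs, optionally cleaned, then filtered by length."""
    # Split only at '>' characters that start a line; a '>' inside a header is kept.
    for chunk in re.split(r"^>", text, flags=re.MULTILINE)[1:]:
        header, _, body = chunk.partition("\n")
        # Join the sequence lines; blank lines contribute nothing.
        seq = "".join(line.strip() for line in body.splitlines())
        if do_clean:
            seq = clean(seq)
        if min_len is not None and len(seq) < min_len:
            continue  # shorter than the minimum size
        if max_len is not None and len(seq) > max_len:
            continue  # longer than the maximum size
        yield header.rstrip(), seq


def run_filter(instream=sys.stdin, outstream=sys.stdout):
    """Read FASTA text from instream and write the filtered records to outstream."""
    for header, seq in filter_fasta(instream.read()):
        outstream.write(f">{header}\n{seq}\n")
```

Calling run_filter() from a script's entry point would let it be used in the same way as the program, e.g. python3 sketch.py < in.fasta > out.fasta (the script name is hypothetical). Records are written one sequence per line; combining this with a re-wrapping step gives a fixed line width.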
Options:
-i Input FASTA file;
-o Output file (reformatted sequences);
-f n Remove sequences shorter than "n" (default: no);
-x n Remove sequences longer than "n" (default: no);
-m Keep line lengths (and blank lines) as found in the input (default: reformat);
-j Put the record's sequence all in one line (default: no);
-d Sort records by sequence identifier (default: no sorting);
-n Sort records by NCBI taxon name, if available in header -- for records
without that information, sorting is done by sequence identifier instead
(default: no sorting);
-b f Secondary sort by E-value, using BLAST results file "f" in table format, i.e. -m 8 or -outfmt 6
(default: no);
-p Append the E-value used by the -b sort to the FASTA definition line, after a tab (default: no);
-k Convert all to uppercase, only for nucleic acids (default: no);
-s n Number of bases/amino acids per line (default: 60);
-c "Clean" sequences (default: do not "clean");
-u Convert U to T (default: keep as is);
-h Display this help message and exit;
-v Display the program version and exit.
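The -n and -b sorting options can be sketched in Python as below. Several details here are assumptions not confirmed by this document: the taxon is taken to be the last [bracketed] text in the definition line (as in many NCBI protein titles), the sequence identifier is the first word after ">", and FASTA records are matched to BLAST rows by the query id (column 1), keeping the smallest E-value (column 11 of the default 12-column tabular output). The names best_evalues and sort_records are hypothetical.

```python
import csv
import re

# Default 12-column tabular BLAST output (-m 8 / -outfmt 6):
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
QUERY_COL, EVALUE_COL = 0, 10

# Assumption: the taxon is the last [bracketed] text of the definition line.
TAXON_RE = re.compile(r"\[([^\[\]]+)\]\s*$")


def best_evalues(blast_lines):
    """Map each query id to the smallest E-value found for it in the table."""
    best = {}
    for row in csv.reader(blast_lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        qid, evalue = row[QUERY_COL], float(row[EVALUE_COL])
        if qid not in best or evalue < best[qid]:
            best[qid] = evalue
    return best


def sort_records(records, evalues):
    """Sort (header, seq) pairs by taxon (identifier if absent), then by E-value."""

    def key(record):
        header = record[0]
        words = header.lstrip(">").split()
        seq_id = words[0] if words else ""
        match = TAXON_RE.search(header)
        primary = match.group(1) if match else seq_id
        # Records without a BLAST hit sort after those with one in the same group.
        return (primary, evalues.get(seq_id, float("inf")))

    return sorted(records, key=key)


if __name__ == "__main__":
    hits = best_evalues(["q1\ts1\t99.0\t100\t1\t0\t1\t100\t1\t100\t1e-40\t180"])
    records = [(">q1 hypothetical protein [Homo sapiens]", "MA"), (">q0 no taxon here", "MC")]
    for header, _ in sort_records(records, hits):
        print(header)
```

Because records without a taxon fall back to their identifier as the primary key, taxon names and identifiers share one sort order, which mirrors the fallback described for -n.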
Copyright J.M.P. Alves 2006-2012 (alvesjmp@yahoo.com)
This software is licensed under the GNU General Public License v. 3.
Please see http://www.fsf.org/licensing/licenses/gpl.html for details.