reformatfasta 0.7
=================

This Perl program is a simple command-line utility to:
 - reformat a multi-sequence FASTA file to have a certain number of bases per line;
 - get rid of empty lines;
 - convert from DNA to RNA (T to U), optional;
 - convert all to uppercase, optional;
 - sort records, based on either NCBI taxon or the sequence identifier, optional;
 - "clean" sequences by removing spaces, X (or x), * and - characters, optional.

Installation:

None, really. The program is ready to use. Just place it in a directory on your path and it should be visible anywhere in your system. Otherwise, just run it in the directory where you downloaded it, by running the program preceded by a "./", like this, for example:

    ./reformatfasta -h

Usage:

    reformatfasta -i input_file -o output_file [-s integer -u -c -d -b blast_output_table]

Program to:
 - reformat a multi-sequence FASTA file to have a certain number of bases per line;
 - get rid of empty lines;
 - convert from DNA to RNA (T to U), optional;
 - convert all to uppercase, optional;
 - sort records, based on either NCBI taxon or the sequence identifier, optional;
 - when NCBI taxon is the same, sort by E-value in BLAST table, optional;
 - filter sequences by either minimum or maximum size, or both;
 - "clean" sequences by removing spaces, X (or x), * and - characters, optional.

* Input and output files are optional; the program can be used as a traditional UNIX filter, using the redirection symbols (< and >) of standard input and output.
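
To make the reformatting steps listed above concrete, here is a minimal Python sketch of re-wrapping, blank-line removal, cleaning, uppercasing and T-to-U conversion. It is not the program's Perl code: the function name reformat_fasta and its parameters are hypothetical, and converting lowercase t to u is an assumption, since the text above only mentions "T to U".

```python
import re


def reformat_fasta(text, width=60, clean=False, upper=False, to_rna=False):
    """Re-wrap FASTA records to `width` residues per line, dropping blank lines.

    Hypothetical helper for illustration only; not part of reformatfasta.
    """
    records = []  # list of (header line, list of sequence chunks)
    for line in text.splitlines():
        line = line.rstrip()
        if not line:
            continue  # empty lines are discarded
        if line.startswith(">"):
            records.append((line, []))
        elif records:
            records[-1][1].append(line)

    out = []
    for header, chunks in records:
        seq = "".join(chunks)
        if clean:
            # Remove spaces, X (or x), * and - characters.
            seq = re.sub(r"[ Xx*\-]", "", seq)
        if upper:
            seq = seq.upper()
        if to_rna:
            # Assumption: lowercase t is converted as well as uppercase T.
            seq = seq.replace("T", "U").replace("t", "u")
        out.append(header)
        # Slice the sequence into lines of at most `width` residues.
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in out)
```

Calling reformat_fasta(text) with no other arguments only re-wraps the sequences at 60 residues per line and drops empty lines; the remaining steps are opt-in, mirroring the "optional" items in the list above.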
Options:
 -i      Input FASTA file;
 -o      Output file (reformatted sequences);
 -f n    Remove sequences shorter than "n" (default: no);
 -x n    Remove sequences longer than "n" (default: no);
 -m      Maintain sequence lengths per line (and blank lines) as found (default: reformat);
 -j      Put the record's sequence all in one line (default: no);
 -d      Sort records by sequence identifier (default: no sorting);
 -n      Sort records by NCBI taxon name, if available in header -- for records
         without that information, sorting is done by sequence identifier instead
         (default: no sorting);
 -b      Secondary sort by E-value in BLAST results in table format, i.e. -m 8
         or -outfmt 6 (default: no);
 -p      Appends E-value of -b sort to the FASTA definition line, after a tab (default: no);
 -k      Convert all to uppercase, only for nucleic acids (default: no);
 -s n    Number of bases/amino acids per line (default: 60);
 -c      "Clean" sequences (default: do not "clean");
 -u      Convert U to T (default: keep as is);
 -h      Displays this help message and exits;
 -v      Displays program version and exits.

Copyright J.M.P. Alves 2006-2012 (alvesjmp@yahoo.com)

This software is licensed under the GNU General Public License v. 3. Please see http://www.fsf.org/licensing/licenses/gpl.html for details.
Source: readme.txt, updated 2012-11-28