reformatfasta 0.7
=================

This Perl program is a simple command-line utility to:
- reformat a multi-sequence FASTA file to have a certain number of bases per line;
- get rid of empty lines;
- convert from DNA to RNA (T to U), optional;
- convert all to uppercase, optional;
- sort records, based on either NCBI taxon or the sequence identifier, optional;
- "clean" sequences by removing spaces, X (or x), * and - characters, optional.
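
The core reformatting step (fixed line width, no empty lines) can be sketched as follows. This is a minimal Python illustration, not the program's actual Perl code; the function names parse_fasta and rewrap are hypothetical, and the width of 60 mirrors the default documented for -s.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from FASTA lines, skipping empty lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # empty lines are dropped
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def rewrap(lines, width=60):
    """Return FASTA lines with every sequence wrapped at `width` characters."""
    out = []
    for header, seq in parse_fasta(lines):
        out.append(">" + header)
        for start in range(0, len(seq), width):
            out.append(seq[start:start + width])
    return out
```

Calling rewrap() on the lines of a file and joining the result with newlines gives the reformatted text; lines that appear before the first ">" header are ignored in this sketch.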

Installation:

None, really. The program is ready to use. Place it in a directory on your PATH and it can be
run from anywhere on your system. Otherwise, run it from the directory where you downloaded it
by prefixing the program name with "./", for example: ./reformatfasta -h


Usage: reformatfasta -i input_file -o output_file [-s integer -u -c -d -b blast_output_table]

Program to:
- reformat a multi-sequence FASTA file to have a certain number of bases per 
  line;
- get rid of empty lines;
- convert from DNA to RNA (T to U), optional;
- convert all to uppercase, optional;
- sort records, based on either NCBI taxon or the sequence identifier, optional;
- when NCBI taxon is the same, sort by E-value in BLAST table, optional;
- filter sequences by either minimum or maximum size, or both;
- "clean" sequences by removing spaces, X (or x), * and - characters, optional. 
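
The "clean" and size-filter steps above could look roughly like this. It is a hedged Python sketch rather than the program's own code: clean and keep_by_length are hypothetical names, records are assumed to be (header, sequence) pairs, and a sequence whose length equals a limit is kept, which is how "shorter than" and "longer than" are read here.

```python
import re

# Characters listed in the feature description: space, X or x, * and -
_UNWANTED = re.compile(r"[ Xx*-]")


def clean(seq):
    """Remove spaces, X/x, * and - characters from a sequence string."""
    return _UNWANTED.sub("", seq)


def keep_by_length(records, min_len=None, max_len=None):
    """Yield only the records whose sequence length is within the limits."""
    for header, seq in records:
        if min_len is not None and len(seq) < min_len:
            continue  # shorter than the minimum: removed
        if max_len is not None and len(seq) > max_len:
            continue  # longer than the maximum: removed
        yield header, seq
```

Either limit can be left as None to filter by minimum size only, maximum size only, or not at all.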

* Input and output files are optional; the program can be used as a traditional UNIX
  filter, using the redirection symbols (< and >) of standard input and output.
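
As an illustration of the filter behaviour described above, a tool can fall back to standard input and output when no file names are given. The Python sketch below shows one way to do that; open_or_std and copy_non_empty are hypothetical helpers, not part of reformatfasta, and the only processing shown is dropping empty lines.

```python
import sys
from contextlib import contextmanager


@contextmanager
def open_or_std(path, mode):
    """Yield an open file for `path`, or stdin/stdout when `path` is None."""
    if path is None:
        # No file name given: behave like a UNIX filter
        yield sys.stdin if "r" in mode else sys.stdout
    else:
        with open(path, mode) as handle:
            yield handle


def copy_non_empty(in_path=None, out_path=None):
    """Copy input to output, dropping empty lines along the way."""
    with open_or_std(in_path, "r") as src, open_or_std(out_path, "w") as dst:
        for line in src:
            if line.strip():
                dst.write(line)
```

With both arguments left as None, the function reads whatever is redirected with < and writes to whatever is redirected with >.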

Options:
-i 	 Input FASTA file;
-o 	 Output file (reformatted sequences);
-f n	 Remove sequences shorter than "n" (default: no);
-x n	 Remove sequences longer than "n" (default: no);
-m 	 Maintain sequence lengths per line (and blank lines) as found (default: reformat);
-j 	 Put the record's sequence all in one line (default: no);
-d 	 Sort records by sequence identifier (default: no sorting);
-n 	 Sort records by NCBI taxon name, if available in header -- for records
   	 without that information, sorting is done by sequence identifier instead
   	 (default: no sorting);
-b 	 Secondary sort by E-value in BLAST results in table format, i.e. -m 8 or -outfmt 6
   	 (default: no);
-p 	 Append the E-value used by the -b sort to the FASTA definition line, after a tab (default: no);
-k 	 Convert all to uppercase, only for nucleic acids (default: no);
-s n	 Number of bases/amino acids per line (default: 60);
-c 	 "Clean" sequences (default: do not "clean");
-u 	 Convert U to T (default: keep as is);
-h 	 Displays this help message and exits;
-v 	 Displays program version and exits.
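
The sorting options (-n with the identifier fallback, plus the -b secondary sort) can be pictured with the Python sketch below. It rests on several assumptions that this readme does not confirm: the taxon is taken from a trailing "[Genus species]" in the definition line, the sequence identifier is the first word of that line, FASTA identifiers are matched against the second column (subject id) of the BLAST table, and taxon names and fallback identifiers are ordered together as plain strings. The E-value is read from the 11th column, as in the standard 12-column tabular output. All function names are hypothetical.

```python
import re

# Assumption: the taxon appears in square brackets at the end of the header
_TAXON = re.compile(r"\[([^\[\]]+)\]\s*$")


def identifier(header):
    """First whitespace-delimited word of the definition line."""
    return header.split(None, 1)[0] if header.strip() else ""


def taxon(header):
    """Taxon name from a trailing [bracketed] field, or None if absent."""
    match = _TAXON.search(header)
    return match.group(1) if match else None


def best_evalues(blast_lines, id_column=1):
    """Map sequence id -> lowest E-value found in BLAST tabular output."""
    best = {}
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) < 12:
            continue  # skip comment lines and incomplete rows
        seq_id, evalue = fields[id_column], float(fields[10])
        if seq_id not in best or evalue < best[seq_id]:
            best[seq_id] = evalue
    return best


def sort_records(records, evalues=None):
    """Sort by taxon (identifier if no taxon), then E-value, then identifier."""
    evalues = evalues or {}

    def key(record):
        header, _seq = record
        ident = identifier(header)
        return (taxon(header) or ident, evalues.get(ident, float("inf")), ident)

    return sorted(records, key=key)
```

Records missing from the BLAST table get an infinite E-value in this sketch, so they sort after the records of the same taxon that do have a hit.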

Copyright J.M.P. Alves 2006-2012 (alvesjmp@yahoo.com)
This software is licensed under the GNU General Public License v. 3.
Please see http://www.fsf.org/licensing/licenses/gpl.html for details.
Source: readme.txt, updated 2012-11-28