*** mfsizes v. 1.8.4 - 2013-06-22 - by J
Program to report, for each sequence in a FASTA file, its size, number of Ns and GC%
(or Met proportion and total, if protein), and, for the whole set of sequences, the
total and weighted average size and the N50, N90 and N95. Optionally, calculates the
GC content for each codon position separately.
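
As an illustration of the N50, N90 and N95 statistics mentioned above, here is a
minimal Python sketch. It is not taken from the mfsizes source; it assumes the common
definition (the size L such that sequences of size >= L hold at least x% of all
residues), and the function name nx is hypothetical.

```python
def nx(lengths, x):
    """Return Nx under the common definition (an assumption, not mfsizes' code):
    the size L such that sequences of size >= L cover at least x% of the total."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the largest sequence down, accumulating sizes.
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * x:
            return length
    return min(lengths)  # only reached if x > 100


sizes = [100, 80, 60, 40, 20]
print(nx(sizes, 50), nx(sizes, 90), nx(sizes, 95))  # prints: 80 40 20
```
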
* Normal usage: mfsizes -i filename [-o outfile] [-p] [-c] [-q] [-e] [-n]
-i Input FASTA file, mandatory;
-o Output file (optional);
-p Input sequences are protein (default: DNA);
-c Calculate GC separately for each codon position (default: no);
-e Ignore "empty" sequences (default: no);
-q Quiet output, see below (default: detailed output);
-n Do NOT sort sequences by decreasing size (default: sort);
-v Prints program version and exits;
-h Prints this help message and exits.
- Order of options is irrelevant, and switches can be bundled (e.g. -cq);
- If the output file is not specified (-o option), the program will generate an
outfile with the same name as the input file, but with the .sizes extension added;
- If your sequences are protein, use the -p switch, otherwise meaningless results will
be produced;
- For "quiet" output (no general statistics, no comments, just each sequence's ID,
size and GC or Met content), use the -q switch;
- Use -e to ignore "empty" sequences (definition lines after which there is no
sequence) in the output -- these sequences are never used to calculate the averages,
even if not ignored, so it is safe to use the default behavior;
- The -c switch turns on the separate calculation of GC for each codon position (the
output will also have three GC compositions for each sequence, one per codon
position). It is ignored if you use -p, and might give meaningless results (as well
as some warnings or even errors) if the sequences are not protein coding;
- The program will list each sequence's name followed by its size and percent GC
(or %Met and total number of Met). At the end of the file, it reports the average
size, standard deviation and weighted average GC (or Met).
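
The per-sequence GC, the per-codon-position GC (-c) and the weighted average GC
described above can be sketched in Python as follows. This is an illustrative sketch,
not the mfsizes implementation. It assumes that Ns and other ambiguous bases are left
out of the GC denominator, that codons start at the first base of each sequence, and
that the average is weighted by sequence size; all function names are hypothetical.

```python
def gc_percent(seq):
    """Percent G+C among A, C, G and T; other symbols (e.g. N) are skipped.
    Skipping Ns is an assumption of this sketch."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / acgt if acgt else 0.0


def gc_by_codon_position(seq):
    """GC percent at codon positions 1, 2 and 3, assuming the reading frame
    starts at the first base."""
    return [gc_percent(seq[offset::3]) for offset in range(3)]


def weighted_average_gc(seqs):
    """Average of the per-sequence GC percentages, weighted by sequence size."""
    total = sum(len(s) for s in seqs)
    if total == 0:
        return 0.0
    return sum(gc_percent(s) * len(s) for s in seqs) / total


seqs = ["ATGGCC", "ATATATATATAT"]
print(gc_by_codon_position(seqs[0]))  # prints: [50.0, 50.0, 100.0]
print(round(weighted_average_gc(seqs), 2))  # prints: 22.22
```

Weighting by size keeps one short, GC-rich sequence from dominating the average, as
it would in a plain mean of the per-sequence percentages.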
Copyright J.M.P. Alves 2003-2013 (email@example.com)
This software is licensed under the GNU General Public License v. 3.
Please see http://www.fsf.org/licensing/licenses/gpl.html for details.