mfsizes is a simple command-line utility that calculates the sizes of biological sequences (DNA or protein) in a FASTA or multi-FASTA file. It reports the average sequence size, GC (or methionine) content, N50, N90, N95, the number of N's, and total bases, and can also report GC content by codon on request.
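For readers unfamiliar with the N50, N90, and N95 statistics, the sketch below uses one common definition: sort the sequences from longest to shortest and return the length at which the running total first reaches the given percentage of all bases. The function name `nxx` is hypothetical, and the exact convention used by mfsizes is not stated here, so this is an illustration and not the tool's implementation.

```python
def nxx(lengths, percent):
    """Return the Nxx value for a list of sequence lengths.

    Assumed definition: the length of the sequence at which the running
    total (longest first) first reaches `percent` % of all bases.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * percent:
            return length


# Made-up example lengths, 300 bases in total.
lengths = [80, 70, 50, 40, 30, 20, 10]
print("N50:", nxx(lengths, 50))  # N50: 70
print("N90:", nxx(lengths, 90))  # N90: 30
print("N95:", nxx(lengths, 95))  # N95: 20
```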
Features
- sequence sizes (DNA or protein)
- GC content, in percentage (for each sequence and overall weighted average)
- methionine content, in absolute number and percentage (protein only)
- codon GC content (DNA only)
- multi FASTA input files
- reports average sequence size, total nucleotides, N50, N90, and N95
- by default, report shows sequence names sorted in descending size order
- report is tab-delimited text with results from one FASTA entry per line
- gzip-compressed input supported
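The features above can be sketched in a few lines of Python. This is a minimal illustration, not the mfsizes source: the function names (`open_fasta`, `read_fasta`, `gc_percent`, `overall_gc_percent`, `size_report`) and the column order (name, size, GC %) are assumptions, and GC content is assumed to be computed over the full sequence length, N's included.

```python
import gzip
import io


def open_fasta(path):
    """Open a FASTA file as text; files ending in .gz are decompressed."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA-formatted text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def gc_percent(seq):
    """GC content of one sequence, as a percentage of its full length."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def overall_gc_percent(records):
    """Length-weighted GC percentage across all (name, sequence) records."""
    total = sum(len(seq) for _, seq in records)
    if total == 0:
        return 0.0
    gc = sum(s.upper().count("G") + s.upper().count("C") for _, s in records)
    return 100.0 * gc / total


def size_report(records):
    """Return tab-delimited lines, one per entry, largest sequence first."""
    rows = [(name, len(seq), gc_percent(seq)) for name, seq in records]
    rows.sort(key=lambda row: row[1], reverse=True)
    return [f"{name}\t{size}\t{gc:.2f}" for name, size, gc in rows]


# In-memory example; a real file would be opened with open_fasta(path).
example = io.StringIO(">seq1 first\nACGT\nACGT\n>seq2\nGGGCCCGGGCCC\n>seq3\nAT\n")
records = list(read_fasta(example))
for line in size_report(records):
    print(line)
print(f"overall GC: {overall_gc_percent(records):.2f}%")
```

The overall figure weights each sequence by its length (total G and C bases divided by total bases), which is why it can differ from a plain mean of the per-sequence percentages.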
Categories
Bio-Informatics
License
GNU General Public License version 3.0 (GPLv3)