filtering.py (sourceforge.net)
Quality filtering:
This script excludes three kinds of records from the final output file: records shorter than a user-specified length, empty records (no sequence), and sequences that do not use the DNA alphabet (ambiguous or unambiguous). The records written to the output file are counted, and that count is reported in the output.
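The filtering rules described above can be sketched roughly as follows. This is an illustrative sketch using only the standard library, not the code of filtering.py itself: the names `read_fasta`, `filter_records`, `DNA_ALPHABET` and `min_length` are hypothetical, and the alphabet is assumed to be the IUPAC nucleotide codes (A, C, G, T plus the ambiguity codes, including N).

```python
import io

# Assumed alphabet: IUPAC nucleotide codes (unambiguous bases plus ambiguity codes).
DNA_ALPHABET = set("ACGTRYSWKMBDHVN")


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_records(records, min_length):
    """Keep records that are non-empty, at least min_length long, and DNA-only."""
    kept = []
    for header, seq in records:
        if not seq:
            continue  # empty record (no sequence)
        if len(seq) < min_length:
            continue  # shorter than the user-specified length
        if not set(seq.upper()) <= DNA_ALPHABET:
            continue  # contains characters outside the DNA alphabet
        kept.append((header, seq))
    return kept


# Small in-memory example: one long DNA record, one short one,
# one protein-like record, and one empty record.
example = io.StringIO(
    ">long_dna\n" + "ACGT" * 30 + "\n"
    ">short_dna\nACGT\n"
    ">protein\n" + "MKLVEF" * 20 + "\n"
    ">empty\n"
)
kept = filter_records(read_fasta(example), min_length=100)
print(f"{len(kept)} record(s) kept")
```

In this sketch, input that contains no `>` header lines yields no records at all, so a non-FASTA text file would produce an empty output with a count of zero. Whether the actual script handles that case the same way is not stated here.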
Test files associated with this script:
[testfile1_filtering.fa] tests the DNA alphabet check with records consisting of the protein alphabet and "N"s
[testfile2_filtering.txt] is a text file in a non-FASTA format
[testfile3_filtering.fa] contains one record of >100 bp and another of <100 bp