Sequence Cleaner - Browse Files at SourceForge.net


Name	Modified	Size
README	2014-06-27	1.3 kB
sequence_cleaner.py	2014-06-27	1.9 kB
Totals: 2 Items		3.2 kB

####### Description:

Analyzing poor data takes CPU time, and interpreting the results from poor data takes people time, so it is always important to preprocess the data first.

The script is called Sequence_cleaner. The idea is to remove duplicate sequences, remove sequences that are too short (the user defines the minimum length), and remove sequences that have too many unknown nucleotides (N) (the user defines the percentage of N allowed). At the end, the user can choose whether to write the result to an output file or print it.
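
The three filtering steps described above can be sketched in plain Python. This is only a minimal illustration under stated assumptions, not the actual contents of sequence_cleaner.py: the function names parse_fasta and clean_sequences are hypothetical, only the standard library is used, sequences are compared case-insensitively, the first occurrence of a duplicate is the one kept, and the percentage of N is taken as the number of N characters divided by the sequence length.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def clean_sequences(records, min_length=0, max_percent_n=100.0):
    """Drop short, N-rich, and duplicate sequences (hypothetical helper)."""
    seen = set()
    kept = []
    for header, seq in records:
        seq = seq.upper()  # assumption: 'acgt' and 'ACGT' count as duplicates
        # Assumption: empty sequences are always dropped (avoids dividing by zero).
        if not seq or len(seq) < min_length:
            continue
        percent_n = 100.0 * seq.count("N") / len(seq)
        if percent_n > max_percent_n:
            continue
        if seq in seen:
            continue  # keep only the first occurrence of each sequence
        seen.add(seq)
        kept.append((header, seq))
    return kept
```

With the default values (minimum length 0 and 100 % of N allowed), only the duplicate check has any effect in this sketch, which matches the behaviour described in the Usage section.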

####### Usage:
From the command line, run python sequence_cleaner.py INPUT-(1st) MIN_LENGTH-(2nd) MAX_%_N-(3rd). There are 3 basic parameters:

        #1st: your FASTA file
        #2nd: the minimum sequence length (default value 0, meaning no minimum length is enforced)
        #3rd: the maximum % of N allowed (default value 100, meaning all sequences with 'N' will be in your output;
              set the value to 0 if you want no sequences with "N" in your output)

        For example: python sequence_cleaner.py Aip_coral.fasta 10 10

FYI: if you omit the 2nd and 3rd parameters, only the duplicate sequences are removed.
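
The way the three positional parameters and their defaults fit together can be illustrated as follows. This is a hedged sketch, not the script's real argument handling: parse_args is a hypothetical helper, and the usage text it prints is made up for the example. Only the default values (0 and 100) come from the description above.

```python
def parse_args(argv):
    """Return (fasta_path, min_length, max_percent_n) from a list of arguments.

    argv is the argument list without the script name, e.g. sys.argv[1:].
    """
    if not 1 <= len(argv) <= 3:
        raise SystemExit(
            "usage: python sequence_cleaner.py INPUT [MIN_LENGTH] [MAX_PERCENT_N]"
        )
    fasta_path = argv[0]
    # Defaults taken from the README: no minimum length, any amount of N allowed.
    min_length = int(argv[1]) if len(argv) > 1 else 0
    max_percent_n = float(argv[2]) if len(argv) > 2 else 100.0
    return fasta_path, min_length, max_percent_n


# The example command from the README, expressed as an argument list.
example = parse_args(["Aip_coral.fasta", "10", "10"])
```

Calling parse_args(["Aip_coral.fasta"]) falls back to both defaults, which corresponds to the duplicates-only behaviour.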


Questions, Suggestions or Improvements

Send an email to genivaldo.gueiros@gmail.com

Source: README, updated 2014-06-27
