Name | Modified | Size | Downloads / Week |
---|---|---|---|
README.md | 2014-08-27 | 2.5 kB | |
reciprocal_blast.zip | 2014-08-25 | 788.0 kB | |
Totals: 2 Items | | 790.5 kB | 0 |
Reciprocal Blast for Windows
Complete set of scripts and applications to perform reciprocal Blast via BLAST+ utilities.
Requirements
The following must be installed on the system:
- Microsoft C++ redistributable 2010
- Python 3+ (32 bit version)
- BioPython (available to download via pip)
- NCBI BLAST+ package
- xlsxwriter (available to download via pip), used to parse the procedure output
- Python launcher py.exe available in the PATH environment variable
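Before running anything, it can help to confirm that the Python side of this list is in place. The following is only a minimal sketch: the function name `check_requirements` is hypothetical, and it assumes that BioPython and xlsxwriter are importable as `Bio` and `xlsxwriter`. It does not check the C++ redistributable or the BLAST+ executables.

```python
import importlib.util
import shutil

# Assumptions: the Python launcher is called "py", and the two packages
# from the requirements list are importable as "Bio" and "xlsxwriter".
EXECUTABLES = ["py"]
MODULES = ["Bio", "xlsxwriter"]


def check_requirements(executables=EXECUTABLES, modules=MODULES):
    """Return a dict mapping each requirement name to True if it was found."""
    status = {}
    for exe in executables:
        # shutil.which searches the PATH environment variable.
        status[exe] = shutil.which(exe) is not None
    for mod in modules:
        # find_spec returns None when a top-level module is not installed.
        status[mod] = importlib.util.find_spec(mod) is not None
    return status


if __name__ == "__main__":
    for name, found in check_requirements().items():
        print(f"{name:12s} {'ok' if found else 'MISSING'}")
```

Anything reported as MISSING should be installed before continuing.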
Installation
Install the requirements and copy the repository to your hard drive. Then copy the executables from the BLAST+ package to the tools/blast directory.
Usage
Performing reciprocal Blast
1. Put sequences in .fasta or .fasta.gz format in the project_data directory. Files must be placed in the correct folders, following the structure of the project_data -Example directory. Only the .fasta and .fasta.gz file extensions are accepted.
2. Run GO.bat and enter the new workspace name. This script manages the entire procedure. To run the procedure manually instead, continue from step 3.
3. Run the batch script process-data.bat. This masks your protein and nucleotide sequences and copies them to the input_data directory.
4. Copy workspace_template and rename the copy as you please. Keep the copy in the same directory as workspace_template.
5. Open your new workspace and run the following scripts in order:
   - make_workspace.bat
   - run_blast.bat
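Since only .fasta and .fasta.gz files are accepted, a quick check of the project_data directory before starting can save a failed run. The sketch below is not part of the repository. The function name `find_bad_extensions` is hypothetical, and the comparison is case-sensitive, which is an assumption about how the batch scripts treat file names.

```python
from pathlib import Path

# The two extensions the procedure accepts, as stated above.
ACCEPTED_SUFFIXES = (".fasta", ".fasta.gz")


def find_bad_extensions(data_dir):
    """Return files under data_dir whose names do not end in an accepted suffix."""
    root = Path(data_dir)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and not p.name.endswith(ACCEPTED_SUFFIXES)
    )
```

Calling `find_bad_extensions("project_data")` from the repository root returns the list of files to rename or remove. An empty list means every file has an accepted extension.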
Tuning data
After creating the workspace with make_workspace.bat, you can edit the local sequence source files as you wish, since they are only a working copy. To rebuild the database from the updated source sequences, run make_blastdb.bat.
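The contents of make_blastdb.bat are not shown here, so the following is only a sketch of what rebuilding a single database by hand could look like. It uses the standard `makeblastdb` options `-in`, `-dbtype` and `-out` from BLAST+. The function names and all paths are hypothetical.

```python
import subprocess


def makeblastdb_command(fasta_path, dbtype, out_name, exe="makeblastdb"):
    """Build the argument list for a makeblastdb call.

    dbtype must be "nucl" (nucleotide) or "prot" (protein).
    """
    if dbtype not in ("nucl", "prot"):
        raise ValueError(f"dbtype must be 'nucl' or 'prot', got {dbtype!r}")
    return [exe, "-in", str(fasta_path), "-dbtype", dbtype, "-out", str(out_name)]


def build_database(fasta_path, dbtype, out_name, exe="makeblastdb"):
    """Run makeblastdb and raise CalledProcessError if it exits with an error."""
    subprocess.run(makeblastdb_command(fasta_path, dbtype, out_name, exe), check=True)
```

If `makeblastdb` is not on PATH, pass the full path to the executable as `exe`, for example the copy placed in tools/blast during installation.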
Obtaining output
All results are kept in the results directory, in both HTML and table formats. To convert the output into more readable Excel tables, use the Python utility at:
tools/mytools/processing/generate_tables.py
Its dependencies are in the gen_lib directory. Copy the files from the processing folder to the workspace and run from the command line: py generate_tables.py results/species-name
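When the results directory holds several species, the conversion command can be repeated for each one. The sketch below assumes that every subdirectory of results corresponds to one species-name. That layout is inferred from the command above, not confirmed, and the function names are hypothetical.

```python
import subprocess
from pathlib import Path


def table_commands(results_dir="results", launcher="py"):
    """Build one generate_tables.py command per subdirectory of results_dir."""
    root = Path(results_dir)
    if not root.is_dir():
        return []
    return [
        [launcher, "generate_tables.py", (root / sub.name).as_posix()]
        for sub in sorted(root.iterdir())
        if sub.is_dir()
    ]


def generate_all(results_dir="results", launcher="py"):
    """Run the conversion for every species directory, stopping on the first error."""
    for cmd in table_commands(results_dir, launcher):
        subprocess.run(cmd, check=True)
```

Run `generate_all()` from inside the workspace, after copying the processing files there, so that generate_tables.py and the results directory are both in the current directory.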
Contact
This set of scripts was created for a science project. If you would like to see more features, such as:
- better documentation
- user-friendly interface
- no-nonsense fasta cutter
- automatic download of dependencies

email me at m.nowot@gmail.com