The aim of this project is to create an automated pipeline for the taxonomic assignment of DNA sequences from environmental samples. In this study, we focus on DNA markers amplified from benthic sediment samples. A series of custom Python scripts cleaned the DNA sequences in three steps: removing short sequences, removing primer pairs, and reversing sequences to the correct orientation. The clean marker sequences were then clustered into operational taxonomic units (OTUs) and matched against the GenBank database. All sequences and associated data were stored in a BioSQL relational database, which was then queried to retrieve the taxonomy assignment for each cluster. Below is an illustration of the pipeline.
This is the link to the project directory containing the original data, results, Sarah's notebook, Minttu's notebook, Python code, test files, and paper.
This is the link to the Python code.
Here are links to the documentation, test files, and Python scripts written for this project:
Wiki: Blast.py
Wiki: abitofasta.py
Wiki: filtering.py
Wiki: findprimer.py
Wiki: loadbioSQL
Wiki: regex6.py
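The sequence-cleaning steps described in the overview (short sequence removal, primer pair removal, and reversal to the correct orientation) might look roughly like the sketch below. This is not the code from the project scripts listed above: the function names, the `min_length` threshold, and the use of exact primer matching (no degenerate bases or mismatches) are all assumptions made for illustration. It also assumes the reverse primer is given 5' to 3', so that it appears as its reverse complement in a forward-oriented read.

```python
# Illustrative sketch only: names, threshold, and exact matching are assumptions,
# not taken from the project's own scripts.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def clean_read(seq, fwd_primer, rev_primer, min_length=100):
    """Return the marker region in forward orientation, or None if rejected.

    A read is rejected if it is shorter than min_length or if the primer
    pair cannot be found in either orientation.
    """
    seq = seq.upper()

    # Step 1: short sequence removal.
    if len(seq) < min_length:
        return None

    # In a forward-oriented read the reverse primer shows up reverse-complemented.
    rev_primer_rc = reverse_complement(rev_primer.upper())
    fwd_primer = fwd_primer.upper()

    # Steps 2 and 3: try the read as given, then its reverse complement;
    # whichever orientation contains both primers is treated as correct.
    for candidate in (seq, reverse_complement(seq)):
        start = candidate.find(fwd_primer)
        if start == -1:
            continue
        insert_start = start + len(fwd_primer)
        end = candidate.find(rev_primer_rc, insert_start)
        if end == -1:
            continue
        # Keep only the region between the two primers.
        return candidate[insert_start:end]

    return None
```

A read that arrives in the reverse orientation yields the same trimmed marker as its forward counterpart, so downstream clustering sees every sequence in one consistent orientation. Real amplicon data usually needs tolerant primer matching, which this sketch deliberately leaves out.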