
Tutorial

Ramprasad Neethiraj

MESPA Tutorial

This walkthrough shows how to use MESPA to build gene-models. Working through it will get you started with MESPA and let you validate your installation.

1. Requirements:

See the Manual.

2. Walkthrough

1. Example data download

Please download the MESPA script and the example datasets (mespa_walkthrough12aug17.zip) from here.

This archive contains the following files:
AsmQC.pl
mespa_17aug15.py
mespa.cfg
protein_example.fa
read1.fq
read2.fq

2. Dataset

We will assemble a genome from a pool-seq dataset from Drosophila melanogaster (Fabian ref) and use this assembly to build gene-models. For convenience, we will use only a small subset of the reads. The subset (read1.fq and read2.fq) was obtained by extracting all reads that map to gene-containing contigs in an initial assembly that we generated in-house.

3. Data preparation

In the terminal/command prompt, create a new directory for the tutorial, navigate to it, and uncompress the downloaded archive there.

mkdir mespa_testrun
cd mespa_testrun/
mv /folder_containing_archive/mespa_walkthrough12aug17.zip .
unzip mespa_walkthrough12aug17.zip
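Before assembling, it can help to confirm that the two paired read files hold the same number of reads. The following is a minimal sketch, not part of MESPA: the helper name count_reads is hypothetical, and it assumes the usual four-lines-per-record FASTQ layout (no wrapped sequence lines).

```shell
# Count reads in a FASTQ file, assuming 4 lines per record.
count_reads() {
    lines=$(wc -l < "$1")
    echo $((lines / 4))
}

# Compare the two mate files only if both are present in the current directory.
if [ -f read1.fq ] && [ -f read2.fq ]; then
    n1=$(count_reads read1.fq)
    n2=$(count_reads read2.fq)
    echo "read1.fq: $n1 reads, read2.fq: $n2 reads"
    [ "$n1" -eq "$n2" ] || echo "WARNING: mate files differ in read count" >&2
fi
```

If the two counts differ, the archive may not have been extracted completely.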

4. Generating Genome Assembly

You may use any genome assembler you like; here, for demonstration purposes, we use velvet v1.2.10 to generate the assembly.

(This tutorial assumes that all necessary software is available on the system path.)
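One way to check that assumption is to ask the shell whether each command can be found. This is a sketch only: check_tools is a hypothetical helper, and the list covers just the commands used in this walkthrough, not MESPA's own dependencies (see the Manual for those).

```shell
# Report which of the given commands cannot be found on PATH.
# Returns non-zero if at least one is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

check_tools velveth velvetg python unzip ||
    echo "Install the missing tools or add them to PATH before continuing." >&2
```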

From the terminal, run:

velveth Assem_31 31 -fastq -shortPaired -separate read1.fq read2.fq

Upon successful completion of velveth, your working directory will contain the folder Assem_31, with several files inside it. These files are the input for velvetg. Now run:

velvetg Assem_31 -cov_cutoff auto -read_trkg yes -min_contig_lgth 200 -exp_cov auto

Once velvetg is done, you should have the file contigs.fa generated in the folder Assem_31. This file, as the name implies, contains your genomic contigs, to be used with MESPA.
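A quick look at contigs.fa can confirm that the assembly produced something usable before it is passed to MESPA. The sketch below is an illustration, not a MESPA or velvet feature; fasta_summary is a hypothetical helper that counts header lines and sequence characters in any FASTA file.

```shell
# Print the number of sequences and the total number of bases in a FASTA file.
fasta_summary() {
    awk '/^>/ { n++; next }
         { gsub(/[ \t\r]/, ""); bases += length($0) }
         END { printf "%d sequences, %d bases\n", n, bases }' "$1"
}

# Summarise the velvet output only if it exists.
if [ -f Assem_31/contigs.fa ]; then
    fasta_summary Assem_31/contigs.fa
fi
```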

5. Running MESPA

In the config file change the paths and settings to suit your local computer.

Once this is done, run MESPA from the command line by typing:

python mespa_17aug15.py -a Assem_31/contigs.fa -p protein_example.fa -c 4 -s mespa.cfg

If the run was successful, the directory where you ran MESPA will contain the files assembly_edited.fa, scaffolds.mfa, only_genemodels.fa, remaining_contigs.fa, scaffolds.gff and output_summary.txt, along with the folder mespa_temp, which holds the intermediate files.
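The presence of these outputs can be checked in one pass. This is a minimal sketch using the file names listed above; check_outputs is a hypothetical helper and only tests that each item exists, not that its contents are valid.

```shell
# Report expected MESPA outputs that are missing from a directory.
# Returns non-zero if anything is missing.
check_outputs() {
    dir=$1
    status=0
    for f in assembly_edited.fa scaffolds.mfa only_genemodels.fa \
             remaining_contigs.fa scaffolds.gff output_summary.txt; do
        if [ ! -e "$dir/$f" ]; then
            echo "missing file: $f"
            status=1
        fi
    done
    if [ ! -d "$dir/mespa_temp" ]; then
        echo "missing folder: mespa_temp"
        status=1
    fi
    return "$status"
}

check_outputs . || echo "Some expected MESPA outputs were not found." >&2
```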

NOTE: For a detailed description of the files, please refer to the section MESPA output above.
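For a first look at the annotation, the features in scaffolds.gff can be tallied by type. This sketch assumes the file follows the standard tab-delimited GFF layout with the feature type in column 3; that layout is an assumption here, not something this tutorial confirms, and gff_feature_counts is a hypothetical helper.

```shell
# Count GFF features by type (column 3), skipping comment lines.
# Output: one "type<TAB>count" line per feature type, sorted by type.
gff_feature_counts() {
    awk -F'\t' '!/^#/ && NF >= 3 { c[$3]++ }
                END { for (t in c) print t "\t" c[t] }' "$1" | sort
}

# Summarise the MESPA annotation only if it exists.
if [ -f scaffolds.gff ]; then
    gff_feature_counts scaffolds.gff
fi
```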

