MethyMer: Quick Start Guide

Design of combinations of specific primers for bisulfite sequencing of complete CpG islands

Short Description

MethyMer is a Python-based tool for selecting primers that amplify complete CpG islands. Choosing suitable primers in these regions is difficult because of their low sequence complexity, poly-N stretches, and high CG content. Moreover, bisulfite treatment effectively reduces the 4-letter alphabet (ATGC) to a 3-letter one (ATG, except for methylated cytosines), which further reduces sequence complexity and increases the potential for mispriming.
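To illustrate the alphabet reduction, here is a minimal sketch of in silico bisulfite conversion of one strand. For simplicity it assumes that every cytosine in a CpG context is methylated (and therefore stays C) and that every other cytosine is unmethylated (and is read as T). The function `bisulfite_convert` is a hypothetical helper, not part of MethyMer.

```python
def bisulfite_convert(seq: str, protect_cpg: bool = True) -> str:
    """Simulate bisulfite conversion of a single DNA strand (5'->3').

    Unmethylated cytosines are read as T after conversion. When protect_cpg
    is True, every C directly followed by G is assumed to be methylated and
    stays C; all other cytosines are assumed to be unmethylated.
    """
    seq = seq.upper()
    converted = []
    for i, base in enumerate(seq):
        if base == "C":
            is_cpg = i + 1 < len(seq) and seq[i + 1] == "G"
            converted.append("C" if protect_cpg and is_cpg else "T")
        else:
            converted.append(base)
    return "".join(converted)


print(bisulfite_convert("ACGTCCAG"))                     # ACGTTTAG
print(bisulfite_convert("ACGTCCAG", protect_cpg=False))  # ATGTTTAG
```

With no methylated cytosines the converted strand uses only A, T and G, which is why primer candidates in bisulfite-converted CpG islands have such low complexity.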

Main features:

  • selection of an optimal combination of PCR primer pairs with minimal overlap for amplifying large genomic loci, e.g. CpG islands

  • a flexible, tunable scoring system that balances characteristics such as nucleotide composition, thermodynamic features, presence of CpG sites in primers, and other parameters

  • primer specificity test based on Bowtie alignment

  • integrated ENCODE genome annotation data (promoter/enhancer/insulator, etc.)

  • integrated CpG methylation data for 20 tumor types derived from The Cancer Genome Atlas (TCGA)

  • integrated data on associations between RNA-Seq expression and CpG methylation in TCGA samples

  • visualization: standard plots (CG content, score plot, specificity), ENCODE genome annotation, TCGA data

Prerequisites

MethyMer requires Python 3.5 or later with the following packages:

  • BioPython
  • NumPy
  • SciPy
  • Python Imaging Library (PIL), installed via the Pillow package
  • openpyxl

To install these dependencies, run:

pip install numpy openpyxl biopython Pillow scipy

Alternatively, install SciPy from GitHub:

pip install git+http://github.com/scipy/scipy/

If both Python 2 and Python 3 are installed and you want to install the packages for Python 3, run:

python3 -m pip install numpy openpyxl biopython Pillow
python3 -m pip install git+http://github.com/scipy/scipy/

Alternatively, install the Anaconda3 Python distribution, which includes all the required packages and many other useful ones.
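After installation, you can confirm that the interpreter sees every dependency with a small check script such as the sketch below. The import names (`Bio` for BioPython, `PIL` for Pillow) are the standard ones for those packages. The script itself, and a file name like `check_deps.py`, are hypothetical and not part of MethyMer.

```python
import importlib.util
import sys

# Import name -> pip package name for the dependencies listed above.
REQUIRED = {
    "Bio": "biopython",
    "numpy": "numpy",
    "scipy": "scipy",
    "PIL": "Pillow",
    "openpyxl": "openpyxl",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose top-level module cannot be found."""
    return [pip_name for module, pip_name in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    if sys.version_info < (3, 5):
        sys.exit("MethyMer requires Python 3.5 or later")
    missing = missing_packages()
    if missing:
        print("Missing packages: " + " ".join(missing))
    else:
        print("All dependencies found")
```

Run it with the same interpreter you will use for MethyMer (for example `python3 check_deps.py`), so that the check reflects that interpreter's installed packages.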

Startup

MethyMer stores all parameters in the Excel workbook called PrimerParameters.xlsx.

1. Adjust the scoring parameters or keep the defaults

Each scoring component is described by three parameters:

  • Desired value: values that meet this requirement incur no penalty.
  • Limit value: if this requirement is violated, the primer or pair is discarded completely.
  • Penalty: if a value is within the limit but does not meet the desired threshold, a penalty is calculated as follows.

For example, suppose Primer desired max. length = 26, Primer limit max. length = 33, and Primer max. length penalty = 2, while the actual primer length is 29. This penalty component is then calculated as:

penalty = (29-26) x 2 = 6

All other parameters are penalized in the same way.
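The rule above can be written as a short function. This is a sketch, not MethyMer's code. It assumes that the penalty grows linearly with the distance beyond the desired value (as in the worked example), that a value exactly at the limit is still accepted, and that "min." parameters mirror "max." ones. `component_penalty` is a hypothetical name. The README does not describe how the individual components are combined into a total score, so the sketch covers a single component only.

```python
from typing import Optional


def component_penalty(value: float, desired: float, limit: float,
                      weight: float, upper_bound: bool = True) -> Optional[float]:
    """Penalty for one scoring component (hypothetical helper).

    upper_bound=True models "max." parameters (desired <= limit);
    upper_bound=False models "min." parameters (limit <= desired).
    Returns None if the limit is violated (candidate discarded),
    0.0 if the desired value is met, otherwise deviation * weight.
    """
    if upper_bound:
        if value > limit:
            return None
        deviation = value - desired
    else:
        if value < limit:
            return None
        deviation = desired - value
    return float(max(deviation, 0) * weight)


# Worked example from the text: desired max 26, limit max 33, weight 2, length 29
print(component_penalty(29, desired=26, limit=33, weight=2))  # 6.0
```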

2. Pay special attention to the amplicon length parameter

This is a critical parameter that strongly depends on your bisulfite conversion protocol, since bisulfite treatment fragments DNA. Typically, the maximum acceptable amplicon length varies from 200 to 500 bp, and amplicons that are too long will cause PCR failure. Set the desired and limit amplicon lengths accordingly.
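To see how this setting drives the design, a rough estimate of the number of amplicons needed to tile a locus follows from simple arithmetic. If n amplicons of length A overlap their neighbours by o bp, they span n(A - o) + o bp, so a locus of length L needs at least ceil((L - o) / (A - o)) of them. The sketch below ignores primer placement constraints and the bases occupied by the primers themselves. The overlap value and the function name `min_amplicons` are hypothetical.

```python
import math


def min_amplicons(locus_len: int, amplicon_len: int, overlap: int = 0) -> int:
    """Lower bound on the number of amplicons needed to cover a locus.

    n amplicons of length amplicon_len, each overlapping the next by
    `overlap` bp, span n * (amplicon_len - overlap) + overlap bp.
    """
    if amplicon_len <= overlap:
        raise ValueError("amplicon length must exceed the overlap")
    if locus_len <= amplicon_len:
        return 1
    return math.ceil((locus_len - overlap) / (amplicon_len - overlap))


for max_len in (200, 300, 500):
    print(max_len, min_amplicons(2500, max_len, overlap=20))
```

For a 2.5 kb locus and a 20 bp overlap, this gives at least 14, 9 and 6 amplicons for maximum lengths of 200, 300 and 500 bp. Shorter amplicons therefore mean more primer pairs that must all work together.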

3. Download the genome sequence file

The hg19 human genome assembly is preferred over hg38, because the integrated ENCODE genome annotation and TCGA CpG methylation data are available only for hg19. The latest Ensembl release for hg19 (GRCh37) is 75; it can be downloaded as follows:

wget ftp://ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz
gunzip Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz

Then enter the genome file name in the Genome File Name field of PrimerParameters.xlsx.
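Before filling in the field, it can help to check which sequence names the FASTA file actually contains. Ensembl files name chromosomes "1", "2", ..., "X", "MT", whereas UCSC uses "chr1", "chrM", and so on. The README does not say which convention MethyMer expects, so treat this stdlib-only sketch as a sanity check rather than a requirement. The function names are hypothetical. Summarizing a full genome reads the whole file once, which takes a while.

```python
from typing import Dict, Iterable


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map each FASTA record name (first word of the header) to its length."""
    lengths = {}  # type: Dict[str, int]
    name = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def fasta_summary(path: str) -> Dict[str, int]:
    """Read a FASTA file from disk and summarize its records."""
    with open(path) as handle:
        return fasta_lengths(handle)


# Hypothetical usage on the file downloaded above:
# for name, length in fasta_summary("Homo_sapiens.GRCh37.75.dna.primary_assembly.fa").items():
#     print(name, length)
```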

4. Specify genomic regions of interest

Open the "target regions" sheet of the PrimerParameters.xlsx workbook and specify your genomic regions. You can look them up in the UCSC Genome Browser. Don't forget to check the genome assembly version (hg19/hg38).
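Positions copied from the UCSC Genome Browser have the form chr1:11,873-14,409 (1-based, inclusive). The Ensembl genome file downloaded above, by contrast, names chromosomes without the "chr" prefix. The README does not describe the layout of the "target regions" sheet, so the following sketch only shows how such a string might be split into chromosome, start and end. `parse_ucsc_position` is a hypothetical helper.

```python
import re
from typing import Tuple

# UCSC position strings look like "chr1:11,873-14,409" (1-based, inclusive).
_POSITION = re.compile(r"^\s*(?:chr)?([A-Za-z0-9_.]+):([\d,]+)-([\d,]+)\s*$",
                       re.IGNORECASE)


def parse_ucsc_position(text: str, ensembl_names: bool = True) -> Tuple[str, int, int]:
    """Split a UCSC-style position into (chromosome, start, end).

    With ensembl_names=True the "chr" prefix is dropped ("chr7" -> "7") to
    match Ensembl FASTA headers. Note that chrM/MT is NOT translated.
    """
    match = _POSITION.match(text)
    if match is None:
        raise ValueError("not a UCSC position string: %r" % text)
    chrom = match.group(1)
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if start > end:
        raise ValueError("start must not exceed end: %r" % text)
    if not ensembl_names:
        chrom = "chr" + chrom
    return chrom, start, end
```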

5. Run MethyMer

Go to the program folder and run:

python3 MethyMer.py

MethyMer reads the parameters from the PrimerParameters.xlsx workbook and starts picking primers. The workflow is split into three phases: picking primers and pairs, the specificity test, and identifying the optimal combination of pairs. After each phase, MethyMer saves its results to disk (in the ./MethyMer_DB folder by default). On the next run, the analysis resumes from the last incomplete phase.
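This resume behaviour is a standard checkpointing pattern. The sketch below illustrates the idea with a generic dict-like store. The phase names, the store, and `run_phases` itself are assumptions; they do not describe how MethyMer actually lays out ./MethyMer_DB.

```python
from typing import Any, Callable, MutableMapping, Sequence, Tuple

Phase = Tuple[str, Callable[[Any], Any]]


def run_phases(phases: Sequence[Phase], data: Any,
               store: MutableMapping[str, Any]) -> Any:
    """Run phases in order, skipping any whose result is already stored.

    Each phase receives the previous phase's result. A finished phase's
    output is written to `store` immediately, so an interrupted run can
    pick up again at the first phase with no stored result.
    """
    for name, func in phases:
        if name in store:
            data = store[name]      # completed in an earlier run: reuse it
        else:
            data = func(data)
            store[name] = data      # checkpoint before starting the next phase
    return data
```

For on-disk persistence, the standard library's `shelve` module provides a dict-like store. For example, `with shelve.open("MethyMer_DB/checkpoints") as db: run_phases(phases, region, db)` would work, provided the directory already exists. Here the three phases would correspond to pair picking, the specificity test and the combination search.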

A single HTML report is generated for all target loci, with a name such as "Run_2017-08-13__UTC.09-07-35.html".
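If you run MethyMer repeatedly and want a script to locate the newest report, you can recover the timestamp from the file name. The pattern below is inferred from the single example name above (a UTC date and time) and is an assumption, as are the helper names.

```python
from datetime import datetime, timezone
from typing import Iterable, Optional

# Inferred from the example name "Run_2017-08-13__UTC.09-07-35.html".
REPORT_NAME_FORMAT = "Run_%Y-%m-%d__UTC.%H-%M-%S.html"


def report_time(filename: str) -> datetime:
    """Parse the UTC timestamp encoded in a report file name."""
    parsed = datetime.strptime(filename, REPORT_NAME_FORMAT)
    return parsed.replace(tzinfo=timezone.utc)


def latest_report(filenames: Iterable[str]) -> Optional[str]:
    """Return the newest report name among filenames, or None if there is none."""
    reports = []
    for name in filenames:
        try:
            reports.append((report_time(name), name))
        except ValueError:
            continue  # not a report file
    return max(reports)[1] if reports else None
```

A typical call would be `latest_report(os.listdir("."))` in the program folder.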

Typically, analysis of one locus (~2-3 kb) without the specificity test takes about 30 to 60 minutes. The specificity test may extend the analysis to 1-3 hours.

Source: README.md, updated 2017-08-15